1. 08 Sep 2005, 4 commits
  2. 05 Sep 2005, 28 commits
    • [PATCH] uml: SYSEMU: slight cleanup and speedup · 640aa46e
      Committed by Paolo 'Blaisorblade' Giarrusso
      As a follow-up to "UML Support - Ptrace: adds the host SYSEMU support, for
      UML and general usage" (i.e.  uml-support-* in current mm).
      
      Avoid unconditionally jumping to work_pending and copying code; just reuse
      the already existing resume_userspace path.
      
      One interesting note from Charles P. Wright suggested that the API could be
      improved, with no downsides for UML (except that it will have to support
      yet another host API, since dropping support for the current API is not
      reasonable from the users' point of view).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Charles P. Wright <cwright@cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      640aa46e
    • [PATCH] SYSEMU: fix sysaudit / singlestep interaction · ab1c23c2
      Committed by Bodo Stroesser
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      This is simply an adjustment for "Ptrace - i386: fix Syscall Audit interaction
      with singlestep" to work on top of the SYSEMU patches, too.  I have some doubts
      about this patch: I wonder why we need to alter ptrace_disable() in that way.
      
      I left the patch this way because it has been extensively tested, but I don't
      understand the reason.
      
      The current PTRACE_DETACH handling simply clears child->ptrace; this is not
      enough, because entry.S only looks at the thread flags.  do_syscall_trace does
      check current->ptrace, but depending on that is not good, at least for
      performance, so the clearing must be done elsewhere.  For instance, it is done
      on PTRACE_CONT, but doing PTRACE_DETACH without PTRACE_CONT is possible (and
      happens when gdb crashes and one kills it manually).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ab1c23c2
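
      A hedged sketch of the point behind touching ptrace_disable(): TIF_SYSCALL_EMU
      must not survive PTRACE_DETACH, because entry.S consults only the thread flags.
      Flag and helper names below are assumptions, not an excerpt from the patch.

      /* sketch: drop per-thread tracing state when the tracer detaches */
      void ptrace_disable(struct task_struct *child)
      {
              /* stop any pending single-stepping of the child */
              clear_tsk_thread_flag(child, TIF_SINGLESTEP);
              /* and make sure a stale SYSEMU request cannot leak into entry.S,
               * which tests the thread flags rather than child->ptrace */
              clear_tsk_thread_flag(child, TIF_SYSCALL_EMU);
      }
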
    • [PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386 · 1b38f006
      Committed by Bodo Stroesser
      This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which
      can be used by UML to singlestep a process: it will receive SINGLESTEP
      interceptions for normal instructions and syscalls, but syscall execution will
      be skipped just like with PTRACE_SYSEMU.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1b38f006
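
      From the tracer's side the new request is used like PTRACE_SINGLESTEP, except
      that intercepted syscalls are skipped by the kernel. A hedged sketch follows;
      the value 32 matches the i386 definition in this series, define it only if your
      headers lack it.

      #include <sys/types.h>
      #include <sys/ptrace.h>
      #include <sys/wait.h>

      #ifndef PTRACE_SYSEMU_SINGLESTEP
      #define PTRACE_SYSEMU_SINGLESTEP 32   /* i386 value in this series */
      #endif

      /* resume 'pid' for one step; both ordinary instructions and (skipped)
       * syscalls come back as SIGTRAP stops */
      static int step_emulated(pid_t pid)
      {
              int status;

              if (ptrace(PTRACE_SYSEMU_SINGLESTEP, pid, 0, 0) < 0)
                      return -1;
              waitpid(pid, &status, 0);
              return status;
      }
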
    • [PATCH] Uml support: reorganize PTRACE_SYSEMU support · c8c86cec
      Committed by Bodo Stroesser
      With this patch, we change the way we handle switching from PTRACE_SYSEMU to
      PTRACE_{SINGLESTEP,SYSCALL}, freeing TIF_SYSCALL_EMU from double use in
      preparation for the PTRACE_SYSEMU_SINGLESTEP extension, without changing the
      behavior of the host kernel.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c8c86cec
    • [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage · ed75e8d5
      Committed by Laurent Vivier
            Jeff Dike <jdike@addtoit.com>,
            Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
            Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      
      Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
      except that the kernel does not execute the requested syscall; this is useful
      to improve performance for virtual environments, like UML, which want to run
      the syscall on their own.
      
      In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
      and on exit, and each time you also have two context switches; with SYSEMU you
      avoid the 2nd stop and so save two context switches per syscall.
      
      Also, some architectures don't have support in the host for changing the
      syscall number via ptrace(), which is currently needed to skip syscall
      execution (UML turns any syscall into getpid() to avoid it being executed on
      the host).  Fixing that is hard, while SYSEMU is easier to implement.
      
      * This version of the patch includes some suggestions from Jeff Dike to avoid
        adding any instructions to the syscall fast path, plus some other little
        changes of my own to make it work even when the syscall is executed with
        SYSENTER (but I'm unsure about them). It has been widely tested for quite a
        long time.
      
      * Various fixes were included to handle the switches between the
        various states, i.e. when, for instance, a syscall entry is traced with one of
        PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
        Basically, this is done by remembering which one of them was used even after
        the call to ptrace_notify().
      
      * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
        to make do_syscall_trace() notice that the current syscall was started with
        SYSEMU on entry, so that no notification ought to be done in the exit path;
        this is a bit of a hack, so this problem is solved in another way in next
        patches.
      
      * Also, the effects of the patch:
      "Ptrace - i386: fix Syscall Audit interaction with singlestep"
      are cancelled; they are restored back in the last patch of this series.
      
      Detailed descriptions of the patches doing this kind of processing follow (but
      I've already summed everything up).
      
      * Fix behaviour when changing interception kind #1.
      
        In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
        only after doing the debugger notification; but the debugger might have
        changed the status of this flag because he continued execution with
        PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
        status before calling ptrace_notify().
      
      * Fix behaviour when changing interception kind #2:
        avoid intercepting syscall on return when using SYSCALL again.
      
        A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
        crashes.
      
        The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
        prevents the syscall handler from being called, but does not prevent
        do_syscall_trace() from being called after this for syscall completion
        interception.
      
        The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
        remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
        the flag is unused in the depicted situation.
      
      * Fix behaviour when changing interception kind #3:
        avoid intercepting syscall on return when using SINGLESTEP.
      
        When testing 2.6.9 with the skas3.v6 patch and my latest patch applied, I
        had problems with singlestepping on UML in SKAS with SYSEMU.  It looped,
        receiving SIGTRAPs without moving forward; the EIP of the traced process
        was the same for all SIGTRAPs.
      
      What's missing is to handle switching from PTRACE_SYSCALL_EMU to
      PTRACE_SINGLESTEP in a way very similar to what is done for the change from
      PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.
      
      I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
      notified and then wakes up the process; the syscall is executed (or skipped,
      when do_syscall_trace() returns 0, i.e.  when using PTRACE_SYSEMU), and
      do_syscall_trace() is called again.  Since we are on the return path of a
      SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
      we must still avoid notifying the parent of the syscall exit.  Now, this
      behaviour is extended even to resuming with PTRACE_SINGLESTEP.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ed75e8d5
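
      A minimal user-space sketch of the intended use, assuming i386 and the request
      value 31 introduced by this series (this is not an excerpt from UML): the
      tracer gets a single stop per syscall, reads the attempted syscall number, and
      would emulate it itself while the kernel skips it.

      #include <stdio.h>
      #include <signal.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/ptrace.h>
      #include <sys/user.h>
      #include <sys/wait.h>

      #ifndef PTRACE_SYSEMU
      #define PTRACE_SYSEMU 31              /* i386 value in this series */
      #endif

      int main(void)
      {
              pid_t child = fork();
              int status, i;

              if (child == 0) {
                      ptrace(PTRACE_TRACEME, 0, 0, 0);
                      kill(getpid(), SIGSTOP);      /* wait for the tracer */
                      getpid();                     /* a syscall to intercept */
                      _exit(0);
              }
              waitpid(child, &status, 0);           /* initial SIGSTOP */
              for (i = 0; i < 4; i++) {
                      struct user_regs_struct regs;

                      /* one stop per syscall; the kernel will NOT run it */
                      ptrace(PTRACE_SYSEMU, child, 0, 0);
                      waitpid(child, &status, 0);
                      if (WIFEXITED(status))
                              break;
                      ptrace(PTRACE_GETREGS, child, 0, &regs);
                      printf("would emulate syscall %ld\n", (long)regs.orig_eax);
                      /* a real monitor (UML) performs the syscall on the child's
                       * behalf and writes the result into %eax at this point */
              }
              kill(child, SIGKILL);                 /* nothing was emulated */
              return 0;
      }
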
    • [PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestep · 94c80b25
      Committed by Bodo Stroesser
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      Avoid giving two traps for singlestep instead of one, when syscall auditing is
      enabled.
      
      In fact no singlestep trap is sent on syscall entry, only on syscall exit, as
      can be seen in entry.S:
      
      # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<<
              testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
              jnz syscall_trace_entry
      	...
      syscall_trace_entry:
      	...
      	call do_syscall_trace
      
      But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so
      the tracer will get one more trap on the syscall entry path, which it
      shouldn't.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      94c80b25
    • [PATCH] add suspend/resume for timer · c3c433e4
      Committed by Shaohua Li
      The timers lack .suspend/.resume methods.  Because of this, jiffies gets a
      big compensation after an S3 resume, and then the softlockup watchdog reports
      an oops.  This occurred with HPET enabled, but it is also possible with other
      timers.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c3c433e4
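
      A hedged sketch of what giving the timer .suspend/.resume methods looks like
      with the 2.6-era sysdev interface; the function and class names here are
      illustrative, not copied from the patch.

      /* sketch: hook the i386 timer into the system-device suspend path */
      static int timer_suspend(struct sys_device *dev, pm_message_t state)
      {
              /* record the current time so jiffies can be resynchronized */
              return 0;
      }

      static int timer_resume(struct sys_device *dev)
      {
              /* reprogram the timer hardware and resynchronize jiffies here,
               * so the softlockup watchdog does not see a huge jump */
              return 0;
      }

      static struct sysdev_class timer_sysclass = {
              set_kset_name("timer"),
              .suspend = timer_suspend,
              .resume  = timer_resume,
      };

      static struct sys_device device_timer = {
              .id  = 0,
              .cls = &timer_sysclass,
      };

      static int __init time_init_device(void)
      {
              int error = sysdev_class_register(&timer_sysclass);
              if (!error)
                      error = sysdev_register(&device_timer);
              return error;
      }
      device_initcall(time_init_device);
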
    • [PATCH] swsusp: fix remaining u32 vs. pm_message_t confusion · 829ca9a3
      Committed by Pavel Machek
      Fix remaining bits of u32 vs.  pm_message confusion.  Should not break
      anything.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      829ca9a3
    • [PATCH] ISA DMA suspend for i386 · 795312e7
      Committed by Pierre Ossman
      Reset the ISA DMA controller into a known state after a suspend.  Primary
      concern was reenabling the cascading DMA channel (4).
      Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      795312e7
    • [PATCH] unify x86/x86-64 semaphore code · 52fdd089
      Committed by Benjamin LaHaise
      This patch moves the common code in x86 and x86-64's semaphore.c into a
      single file in lib/semaphore-sleepers.c.  The arch specific asm stubs are
      left in the arch tree (in semaphore.c for i386 and in the asm for x86-64).
      There should be no changes in code/functionality with this patch.
      Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      52fdd089
    • [PATCH] i386 boottime for_each_cpu broken · 4ad8d383
      Committed by Zwane Mwaikambo
      for_each_cpu walks through all processors in cpu_possible_map, which is
      defined as cpu_callout_map on i386 and isn't initialised until all
      processors have been booted. This breaks things which do for_each_cpu
      iterations early during boot. So, define cpu_possible_map as a bitmap with
      NR_CPUS bits populated. This was triggered by a patch I'm working on which
      does alloc_percpu before bringing up secondary processors.
      
      From: Alexander Nyberg <alexn@telia.com>
      
      i386-boottime-for_each_cpu-broken.patch
      i386-boottime-for_each_cpu-broken-fix.patch
      
      The SMP version of __alloc_percpu checks the cpu_possible_map before
      allocating memory for a certain cpu.  With the above patches the BSP cpuid
      is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
      machines (as soon as someone tries to dereference something allocated via
      __alloc_percpu, which in fact is never allocated since the cpu is not set
      in cpu_possible_map).
      Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      Signed-off-by: Alexander Nyberg <alexn@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4ad8d383
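
      Concretely, "a bitmap with NR_CPUS bits populated" means defining the map
      statically instead of aliasing cpu_callout_map; a hedged sketch using the
      2.6-era cpumask API (illustrative only).

      /* sketch: a possible map that is valid before any secondary CPU boots */
      cpumask_t cpu_possible_map = CPU_MASK_ALL;    /* all NR_CPUS bits set */

      static void __init early_percpu_walk(void)
      {
              int cpu;

              /* safe in early boot: the map no longer depends on callout state */
              for_each_cpu(cpu)
                      printk(KERN_DEBUG "cpu %d is possible\n", cpu);
      }
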
    • [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Committed by Zachary Amsden
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anywhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
      
      Note that I omitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure, rather it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d7271b14
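
      A plausible implementation of the accessor is little more than a typed memcpy;
      the sketch below just illustrates the prototype given above and is not
      necessarily the exact code in the patch.

      /* sketch: copy 'count' pgd entries, so that page-table writes go through
       * one accessor a hypervisor-aware kernel can hook */
      static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
      {
              memcpy(dst, src, count * sizeof(pgd_t));
      }
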
    • [PATCH] x86 NMI: better support for debuggers · 748f2edb
      Committed by George Anzinger
      This patch adds a notification in die_nmi that the system is about to
      be taken down.  If the notification is handled with a NOTIFY_STOP return, the
      system is given a new lease on life.
      
      We also change the nmi watchdog to carry on if die_nmi returns.
      
      This gives debug code a chance to a) catch watchdog timeouts and b) possibly
      allow the system to continue, realizing that the timeout may be due to
      debugger activities such as single stepping, which is usually done with
      "other" cpus held.
      
      Signed-off-by: George Anzinger <george@mvista.com>
      Cc: Keith Owens <kaos@ocs.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      748f2edb
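
      From the debugger's side the hook is consumed through the die-notifier chain;
      a hedged sketch follows, and the exact die_val used for the watchdog case is
      an assumption here.

      #include <linux/notifier.h>
      #include <asm/kdebug.h>

      /* sketch: swallow an NMI-watchdog event and keep the system alive by
       * returning NOTIFY_STOP from the die chain */
      static int my_nmi_notify(struct notifier_block *self,
                               unsigned long val, void *data)
      {
              if (val == DIE_NMIWATCHDOG) {          /* assumed die_val name */
                      struct die_args *args = data;  /* regs, trap number, ... */

                      /* decide the timeout is benign, e.g. another CPU is held
                       * while the debugger single-steps */
                      if (args->regs)
                              return NOTIFY_STOP;    /* "new lease on life" */
              }
              return NOTIFY_DONE;
      }

      static struct notifier_block my_nmi_nb = {
              .notifier_call = my_nmi_notify,
      };

      /* in init code: register_die_notifier(&my_nmi_nb); */
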
    • [PATCH] x86: introduce a write accessor for updating the current LDT · f2f30ebc
      Committed by Zachary Amsden
      Introduce a write accessor for updating the current LDT.  This is required
      for hypervisors like Xen that do not allow LDT pages to be directly
      written.
      
      Testing - here's a fun little LDT test that can be trivially modified to
      test limits as well.
      
      /*
       * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
       * This is licensed under the GPL.
       */

      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/types.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
      #include <asm/ldt.h>

      /* glibc does not export a modify_ldt() wrapper; go through syscall(2) */
      static int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }

      int main(void)
      {
              struct user_desc desc;
              char *code;
              unsigned long long tsc;

              /* two pages of anonymous, executable memory for the far-called stub */
              code = mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              memset(&desc, 0, sizeof(desc));
              desc.entry_number = 0;
              desc.base_addr = (unsigned long)code;
              desc.limit = 1;                 /* with limit_in_pages => 8K limit */
              desc.seg_32bit = 1;
              desc.contents = MODIFY_LDT_CONTENTS_CODE;
              desc.read_exec_only = 0;
              desc.limit_in_pages = 1;
              desc.seg_not_present = 0;
              desc.useable = 1;
              if (modify_ldt(1, &desc, sizeof(desc)) != 0)
                      perror("modify_ldt");
              printf("code base is 0x%08x\n", (unsigned)code);
              code[0x0ffe] = 0x0f;  /* rdtsc */
              code[0x0fff] = 0x31;
              code[0x1000] = 0xcb;  /* lret */
              /* selector 0x0007 = LDT entry 0, RPL 3; far call to offset 0xffe */
              __asm__ __volatile__("lcall $7,$0xffe" : "=A" (tsc));
              printf("TSC is 0x%016llx\n", tsc);
              return 0;
      }
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2f30ebc
    • [PATCH] x86: make IOPL explicit · a5201129
      Committed by Zachary Amsden
      The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
      explicit in C code is more clear.  This pushf/popf pair was added as a
      bugfix for leaking IOPL to unprivileged processes when using
      sysenter/sysexit based system calls (sysexit does not restore flags).
      
      When requesting an IOPL change in sys_iopl(), it is just as easy to change
      the current flags and the flags in the stack image (in case an IRET is
      required), but there is no reason to force an IRET if we came in from the
      SYSENTER path.
      
      This change is the minimal solution for supporting a paravirtualized Linux
      kernel that allows user processes to run with I/O privilege.  Other
      solutions require radical rewrites of part of the low level fault / system
      call handling code, or do not fully support sysenter based system calls.
      
      Unfortunately, this added one field to the thread_struct.  But as a bonus,
      on P4, the fastest time measured for switch_to() went from 312 to 260
      cycles, a win of about 17% in the fast case through this performance
      critical path.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a5201129
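
      A hedged sketch of what "making IOPL explicit in C" amounts to: only the IOPL
      field of the live EFLAGS is rewritten, and only when the two tasks differ.
      Treat the helper below as illustrative rather than the exact patch contents.

      /* sketch: splice a task's IOPL into the current EFLAGS.IOPL field */
      static inline void set_iopl_mask(unsigned mask)
      {
              unsigned int reg;

              __asm__ __volatile__("pushfl\n\t"
                                   "popl %0\n\t"
                                   "andl %1, %0\n\t"   /* clear the old IOPL */
                                   "orl %2, %0\n\t"    /* merge the new IOPL */
                                   "pushl %0\n\t"
                                   "popfl"
                                   : "=&r" (reg)
                                   : "i" (~X86_EFLAGS_IOPL), "r" (mask));
      }

      /* in __switch_to(), roughly:
       *   if (unlikely(prev->iopl != next->iopl))
       *           set_iopl_mask(next->iopl);
       */
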
    • [PATCH] x86: privilege cleanup · 0998e422
      Committed by Zachary Amsden
      Privilege checking cleanup.  Originally, these diffs were much greater, but
      recent cleanups in Linux have already done much of the cleanup.  I added
      some explanatory comments in places where the reasoning behind certain
      tests is rather subtle.
      
      Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
      reason is, there are only two call chains - one via die_if_kernel() and one
      via do_page_fault(), both entering from die().  Both of these paths already
      ensure that a kernel mode failure has happened.  Also, the original check
      here, if (user_mode(regs)), was insufficient anyway, since it would not
      rule out BUG faults from V8086 mode execution.
      
      Saving the %ss segment in show_regs() rather than assuming a fixed value
      also gives better information about the current kernel state in the
      register dump.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0998e422
    • [PATCH] x86: more asm cleanups · f2ab4461
      Committed by Zachary Amsden
      Some more assembler cleanups I noticed along the way.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2ab4461
    • [PATCH] i386: load_tls() fix · e7a2ff59
      Committed by Zachary Amsden
      Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid
      creating non-reversible segments.  This could conceivably cause a bug if the
      kernel ever needed to save and restore fs/gs from the NMI handler.  It
      currently does not, but this is the safest approach to avoiding fs/gs
      corruption.  SMIs are safe, since SMI saves the descriptor hidden state.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e7a2ff59
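
      The ordering constraint is easiest to see in code; a hedged sketch of the
      relevant lines of __switch_to() after this change (surrounding code omitted).

      /* sketch: save prev's user segment selectors while the GDT still holds
       * prev's TLS descriptors, so the saved values stay reversible ... */
      savesegment(fs, prev->fs);
      savesegment(gs, prev->gs);

      /* ... and only then overwrite the GDT TLS slots for the next task */
      load_TLS(next, cpu);
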
    • [PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management · 4d37e7e3
      Committed by Zachary Amsden
      i386 inline assembler cleanup.
      
      This change encapsulates descriptor and task register management.  Also,
      it is possible to improve assembler generation in two cases; savesegment
      may store the value in a register instead of a memory location, which
      allows GCC to optimize stack variables into registers, and MOV MEM, SEG
      is always a 16-bit write to memory, making the casting in math-emu
      unnecessary.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4d37e7e3
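
      For reference, the accessor style being described is tiny; a hedged sketch of
      a savesegment-style macro (the in-tree constraints may differ).

      /* sketch: read a segment register into a C lvalue; allowing a register
       * destination lets GCC keep 'value' out of memory entirely */
      #define savesegment(seg, value) \
              __asm__ __volatile__("mov %%" #seg ",%0" : "=rm" (value))
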
    • [PATCH] i386: cleanup serialize msr · 245067d1
      Committed by Zachary Amsden
      i386 arch cleanup.  Introduce the serialize macro to serialize processor
      state.  Why the microcode update needs it I am not quite sure, since wrmsr()
      is already a serializing instruction, but it is a microcode update, so I will
      keep the semantics the same, since this could be a timing workaround.  As far
      as I can tell, this has always been there since the original microcode update
      source.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      245067d1
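
      Concretely, "serializing processor state" just means executing an
      architecturally serializing instruction; a hedged sketch (the in-tree helper
      may be written differently).

      /* sketch: CPUID is a serializing instruction, so this forces completion
       * of all previously issued instructions and memory operations */
      static inline void serialize_cpu(void)
      {
              unsigned int eax = 0, ebx, ecx, edx;

              __asm__ __volatile__("cpuid"
                                   : "=a" (eax), "=b" (ebx),
                                     "=c" (ecx), "=d" (edx)
                                   : "0" (eax)
                                   : "memory");
      }
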
    • [PATCH] i386: inline asm cleanup · 4bb0d3ec
      Committed by Zachary Amsden
      i386 Inline asm cleanup.  Use cr/dr accessor functions.
      
      Also, a potential bugfix: some CR accessors really should be volatile.
      Reads from CR0 (numeric state may change in an exception handler), writes to
      CR4 (flipping CR4.TSD) and reads from CR2 (page fault) prevent instruction
      re-ordering.  I did not add memory clobber to CR3 / CR4 / CR0 updates, as it
      was not there to begin with, and in no case should kernel memory be clobbered,
      except when doing a TLB flush, which already has memory clobber.
      
      I noticed that page invalidation does not have a memory clobber.  I can't find
      a bug as a result, but there is definitely a potential for a bug here:
      
      #define __flush_tlb_single(addr) \
      	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4bb0d3ec
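
      A hedged sketch of the accessor style being introduced, including the volatile
      cases called out above; illustrative, not an excerpt from the patch.

      /* sketch: i386 control-register accessors */
      static inline unsigned long read_cr2(void)
      {
              unsigned long val;

              /* volatile: CR2 changes behind the compiler's back on every page
               * fault, so reads must not be cached or reordered away */
              __asm__ __volatile__("movl %%cr2,%0" : "=r" (val));
              return val;
      }

      static inline void write_cr4(unsigned long val)
      {
              /* e.g. flipping CR4.TSD; volatile keeps the ordering */
              __asm__ __volatile__("movl %0,%%cr4" : : "r" (val));
      }
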
    • [PATCH] i386: clean up vDSO alignment padding · 2a0694d1
      Committed by Roland McGrath
      This makes the vDSO use nops for all its padding around instructions,
      rather than sometimes zeros, and nop-pads the end of the area containing
      instructions to a 32-byte cache line, to keep text and data in separate
      lines.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2a0694d1
    • [PATCH] x86: automatically enable bigsmp when we have more than 8 CPUs · 911a62d4
      Committed by Venkatesh Pallipadi
      i386 generic subarchitecture requires explicit dmi strings or command line
      to enable bigsmp mode.  The patch below removes that restriction, and uses
      bigsmp as soon as it finds more than 8 logical CPUs, Intel processors and
      xAPIC support.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      911a62d4
    • [PATCH] kdump: Save parameter segment in protected mode (x86) · 484b90c4
      Committed by Vivek Goyal
      o With the introduction of kexec as a boot-loader, the assumption that the
        parameter segment will always be loaded at a lower address than the kernel
        and will be addressable by early boot page tables is no longer valid. In the
        kexec-on-panic case the parameter segment might well be loaded beyond the
        kernel image and might not be addressable by early boot page tables.
      o This case might be hit in the scenario where the user has reserved a chunk of
        memory for the second kernel, for example 16MB to 64MB, and has also built
        the second kernel for physical memory location 16MB. In this case kexec has
        no choice but to load the parameter segment at a higher address than the new
        kernel image, at a safe location where the new kernel does not stomp on it.
      o The problem should automatically go away once a relocatable kernel for i386
        is in place and kexec can determine the location of the new kernel at run
        time and load the parameter segment at a lower address than the kernel image.
        But until then this patch can go in (assuming it does not break something
        else).
      o This patch moves up the boot parameter saving code. Boot parameters are now
        copied out in protected mode before page tables are initialized. This ensures
        that the parameter segment is always addressable irrespective of its physical
        location.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      484b90c4
    • [PATCH] vm86: Honor TF bit when emulating an instruction · 5fd75ebb
      Committed by Petr Tesarik
      If the virtual 86 machine reaches an instruction which raises a General
      Protection Fault (such as CLI or STI), the instruction is emulated (in
      handle_vm86_fault).  However, the emulation ignored the TF bit, so the
      hardware debug interrupt was not invoked after such an emulated instruction
      (and the DOS debugger missed it).
      
      This patch fixes the problem by emulating the hardware debug interrupt as
      the last action before control is returned to the VM86 program.
      Signed-off-by: Petr Tesarik <kernel@tesarici.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5fd75ebb
    • [PATCH] x86: fix EFI memory map parsing · 7ae65fd3
      Committed by Matt Tolentino
      The memory descriptors that comprise the EFI memory map are not set in
      stone; the size could change in the future.  This uses the memory
      descriptor size obtained from EFI to iterate over the memory map entries
      during boot.  This enables the removal of an x86 specific pad (and ifdef)
      in the EFI header.  I also couldn't stomach the broken up nature of the
      function to put EFI runtime calls into virtual mode any longer so I fixed
      that up a bit as well.
      
      For reference, this patch only impacts x86.
      Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7ae65fd3
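
      The key change is walking the map with the firmware-reported descriptor size
      instead of sizeof(); a hedged sketch of the iteration pattern, with structure
      and field names assumed rather than quoted from the patch.

      /* sketch: the stride is desc_size, which may be larger than
       * sizeof(efi_memory_desc_t) on future firmware */
      void *p;

      for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
              efi_memory_desc_t *md = p;

              /* use md->type, md->phys_addr, md->num_pages ... */
      }
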
    • [PATCH] hpet: use read_timer_tsc only when CPU has TSC · 4116c527
      Committed by Venkatesh Pallipadi
      Only use read_timer_tsc when the CPU has a TSC.  Thanks to Andrea for
      pointing this out.  This should not be an issue on any platform, as all recent
      systems that have HPET also have CPUs that support TSC.  The patch is still
      required for correctness.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4116c527
  3. 30 Aug 2005, 1 commit
    • [PATCH] convert signal handling of NODEFER to act like other Unix boxes. · 69be8f18
      Committed by Steven Rostedt
      It has been reported that the way Linux handles NODEFER for signals is
      not consistent with the way other Unix boxes handle it.  I've written a
      program to test the behavior of how this flag affects signals and had
      several reports from people who ran this on various Unix boxes,
      confirming that Linux seems to be unique in the way this is handled.
      
      The way NODEFER affects signals on other Unix boxes is as follows:
      
      1) If NODEFER is set, other signals in sa_mask are still blocked.
      
      2) If NODEFER is set and the signal is in sa_mask, then the signal is
      still blocked. (Note: this is the behavior of all tested but Linux _and_
      NetBSD 2.0 *).
      
      The way NODEFER affects signals on Linux:
      
      1) If NODEFER is set, other signals are _not_ blocked regardless of
      sa_mask (Even NetBSD doesn't do this).
      
      2) If NODEFER is set and the signal is in sa_mask, then the signal being
      handled is not blocked.
      
      The patch converts signal handling in all current Linux architectures to
      the way most Unix boxes work.
      
      Unix boxes that were tested:  DU4, AIX 5.2, Irix 6.5, NetBSD 2.0, SFU
      3.5 on WinXP, AIX 5.3, Mac OSX, and of course Linux 2.6.13-rcX.
      
      * NetBSD was the only other Unix to behave like Linux on point #2. The
      main concern is point #1, where even NetBSD does not behave like Linux.
      So with this patch, we leave NetBSD as the lonely one that behaves
      differently here with #2.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      69be8f18
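
      Expressed as code, the converted semantics look roughly like the following
      kernel-style sketch of the handle_signal() path (not a verbatim diff): sa_mask
      is always applied, and SA_NODEFER only exempts the signal being delivered.

      /* sketch: block sa_mask unconditionally; leave only the delivered signal
       * itself unblocked when SA_NODEFER is set */
      spin_lock_irq(&current->sighand->siglock);
      sigorsets(&current->blocked, &current->blocked, &ka->sa.sa_mask);
      if (!(ka->sa.sa_flags & SA_NODEFER))
              sigaddset(&current->blocked, sig);
      recalc_sigpending();
      spin_unlock_irq(&current->sighand->siglock);
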
  4. 24 Aug 2005, 1 commit
    • [PATCH] i386: fix incorrect FP signal code · b1daec30
      Committed by Chuck Ebbert
      i386 floating-point exception handling has a bug that can cause error
      code 0 to be sent instead of the proper code during signal delivery.
      
      This is caused by unconditionally checking the IS and c1 bits from the
      FPU status word when they are not always relevant.  The IS bit tells
      whether an exception is a stack fault and is only relevant when the
      exception is IE (invalid operation.) The C1 bit determines whether a
      stack fault is overflow or underflow and is only relevant when IS and IE
      are set.
      Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b1daec30
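
      In terms of status-word bits, the rule above can be sketched as follows; the
      masks are the architectural FPU status-word layout, the surrounding signal
      code is simplified, and the helper name is made up for illustration.

      /* sketch: derive the FP si_code from the FPU status word (swd).
       * IE = bit 0, stack fault (IS) = bit 6, C1 = bit 9; FPE_* constants
       * come from the signal headers. */
      static int fp_si_code(unsigned short swd)
      {
              if (swd & 0x0001) {                     /* IE: invalid operation */
                      if (swd & 0x0040)               /* IS: stack fault       */
                              return (swd & 0x0200)   /* C1 picks the kind     */
                                      ? FPE_FLTOVF : FPE_FLTUND;
                      return FPE_FLTINV;
              }
              if (swd & 0x0004)                       /* ZE: divide by zero    */
                      return FPE_FLTDIV;
              if (swd & 0x0008)                       /* OE: overflow          */
                      return FPE_FLTOVF;
              if (swd & 0x0010)                       /* UE: underflow         */
                      return FPE_FLTUND;
              if (swd & 0x0020)                       /* PE: inexact result    */
                      return FPE_FLTRES;
              return 0;                               /* otherwise: code 0     */
      }
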
  5. 20 Aug 2005, 1 commit
    • [PATCH] Mobile Pentium 4 HT and the NMI · cd3716ab
      Committed by Steven Rostedt
      I'm trying to get the nmi working with my laptop (IBM ThinkPad G41) and after
      debugging it a while, I found that the nmi code doesn't want to set it up for
      this particular CPU.
      
      Here I have:
      
      $ cat /proc/cpuinfo
      processor       : 0
      vendor_id       : GenuineIntel
      cpu family      : 15
      model           : 4
      model name      : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
      stepping        : 1
      cpu MHz         : 3320.084
      cache size      : 1024 KB
      physical id     : 0
      siblings        : 2
      core id         : 0
      cpu cores       : 1
      fdiv_bug        : no
      hlt_bug         : no
      f00f_bug        : no
      coma_bug        : no
      fpu             : yes
      fpu_exception   : yes
      cpuid level     : 3
      wp              : yes
      flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
      mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
      monitor ds_cpl est tm2 cid xtpr
      bogomips        : 6642.39
      
      processor       : 1
      vendor_id       : GenuineIntel
      cpu family      : 15
      model           : 4
      model name      : Mobile Intel(R) Pentium(R) 4 CPU 3.33GHz
      stepping        : 1
      cpu MHz         : 3320.084
      cache size      : 1024 KB
      physical id     : 0
      siblings        : 2
      core id         : 0
      cpu cores       : 1
      fdiv_bug        : no
      hlt_bug         : no
      f00f_bug        : no
      coma_bug        : no
      fpu             : yes
      fpu_exception   : yes
      cpuid level     : 3
      wp              : yes
      flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
      mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe pni
      monitor ds_cpl est tm2 cid xtpr
      bogomips        : 6637.46
      
      And the following code shows:
      
      $ cat linux-2.6.13-rc6/arch/i386/kernel/nmi.c
      
      [...]
      
      void setup_apic_nmi_watchdog (void)
      {
              switch (boot_cpu_data.x86_vendor) {
              case X86_VENDOR_AMD:
                      if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15)
                              return;
                      setup_k7_watchdog();
                      break;
              case X86_VENDOR_INTEL:
                       switch (boot_cpu_data.x86) {
                      case 6:
                              if (boot_cpu_data.x86_model > 0xd)
                                      return;
      
                              setup_p6_watchdog();
                              break;
                      case 15:
                              if (boot_cpu_data.x86_model > 0x3)
                                      return;
      
      Here I get boot_cpu_data.x86_model == 0x4.  So I decided to change it and
      reboot.  I now seem to have a working NMI.  So, unless there's something known
      to be bad about this processor and the NMI, I'm submitting the following
      patch.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      Acked-by: Mikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cd3716ab
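
      The patch itself presumably just relaxes the family-15 model cut-off so that
      model 4 (this Mobile Pentium 4) is accepted; a hedged sketch of the changed
      check, following the pattern of the excerpt above.

                      case 15:
                              if (boot_cpu_data.x86_model > 0x4)
                                      return;

                              setup_p4_watchdog();
                              break;
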
  6. 19 Aug 2005, 1 commit
  7. 02 Aug 2005, 3 commits
    • [PATCH] transmeta: CONFIG_PROC_FS=n build fix · 39bbb07d
      Committed by Andrew Morton
      Fix bug found by Grant Coady <lkml@dodo.com.au>'s autobuild setup.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      39bbb07d
    • [PATCH] disable address space randomization by default on Transmeta CPUs · cdf32eaa
      Committed by Eric Lammerts
      We know that the randomisation slows down some workloads on Transmeta CPUs
      by quite large amounts.  We think it's because the CPU needs to recode the
      same x86 instructions when they pop up at a different virtual address after
      a fork+exec.
      
      So disable randomization by default on those CPUs.
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      cdf32eaa
    • [PATCH] remove sys_set_zone_reclaim() · 6cb54819
      Committed by Ingo Molnar
      This removes sys_set_zone_reclaim() for now.  While I'm sure Martin is
      trying to solve a real problem, we must not hard-code an incomplete and
      insufficient approach into a syscall, because syscalls are pretty much
      for eternity.  I am quite strongly convinced that this syscall must not
      hit v2.6.13 in its current form.
      
      Firstly, the syscall lacks basic syscall design: e.g. it allows the
      global setting of VM policy for unprivileged users. (!) [ Imagine an
      Oracle installation and a SAP installation on the same NUMA box fighting
      over the 'optimal' setting for this flag. What will they do? Will they
      try to set the flag to their own preferred value every second or so? ]
      
      Secondly, it was added based on a single datapoint from Martin:
      
       http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2
      
      where Martin characterizes the numbers the following way:
      
       ' Run-to-run variability for "make -j" is huge, so these numbers aren't
         terribly useful except to see that with reclaim the benchmark still
         finishes in a reasonable amount of time. '
      
      in other words: the fundamental problem has likely not been solved, only
      a tendency in the right direction has been observed, and a
      handful of numbers were picked out of a set of hugely variable results,
      without showing the variability data. How much variance is there
      run-to-run?
      
      I'd really suggest to first walk the walk and see what's needed to get
      stable & predictable kernel compilation numbers on that NUMA box, before
      adding random syscalls to tune a particular aspect of the VM ... which
      approach might not even matter once the whole picture has been analyzed
      and understood!
      
      The third, most important point is that the syscall exposes VM tuning
      internals in a completely unstructured way. What sense does it make to
      have a _GLOBAL_ per-node setting for 'should we go to another node for
      reclaim'? If anything, it might make sense to do this per-app, via numalib or
      so.
      
      The change is minimalistic in that it doesn't remove the syscall and the
      underlying infrastructure changes, only the user-visible changes.  We
      could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, à la
      /proc/sys/vm/swappiness, but even that looks quite counterproductive
      when the generic approach is that we are trying to reduce the number of
      external factors in the VM balance picture.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      6cb54819
  8. 30 Jul 2005, 1 commit