1. 08 Sep 2005: 5 commits
  2. 05 Sep 2005: 35 commits
    • [PATCH] uml: SYSEMU: slight cleanup and speedup · 640aa46e
      Authored by Paolo 'Blaisorblade' Giarrusso
      As a follow-up to "UML Support - Ptrace: adds the host SYSEMU support, for
      UML and general usage" (i.e.  uml-support-* in current mm).
      
      Avoid unconditionally jumping to work_pending and copying code; just reuse
      the already existing resume_userspace path.
      
      One interesting note: Charles P. Wright suggested that the API could be
      improved, with no downsides for UML (except that it will have to support
      yet another host API, since dropping support for the current API, for UML,
      is not reasonable from the users' point of view).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Charles P. Wright <cwright@cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      640aa46e
    • [PATCH] SYSEMU: fix sysaudit / singlestep interaction · ab1c23c2
      Authored by Bodo Stroesser
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      This is simply an adjustment to make "Ptrace - i386: fix Syscall Audit
      interaction with singlestep" work on top of the SYSEMU patches, too.  I have
      some doubts about this patch: I wonder why we need to alter ptrace_disable()
      in that way.
      
      I left the patch this way because it has been extensively tested, but I
      don't understand the reason.
      
      The current PTRACE_DETACH handling simply clears child->ptrace; that alone
      is not enough, because entry.S looks only at the thread flags.
      do_syscall_trace() does check current->ptrace, but I don't think depending
      on that is good, at least for performance, so I think the flag clearing is
      done elsewhere.  It is done on PTRACE_CONT, for instance, but PTRACE_DETACH
      without a prior PTRACE_CONT is possible (and happens when gdb crashes and
      one kills it manually).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ab1c23c2
    • [PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386 · 1b38f006
      Authored by Bodo Stroesser
      This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which
      can be used by UML to singlestep a process: it will receive SINGLESTEP
      interceptions for normal instructions and syscalls, but syscall execution will
      be skipped just like with PTRACE_SYSEMU.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1b38f006
    • [PATCH] Uml support: reorganize PTRACE_SYSEMU support · c8c86cec
      Authored by Bodo Stroesser
      With this patch, we change the way we handle switching from PTRACE_SYSEMU to
      PTRACE_{SINGLESTEP,SYSCALL}, to free TIF_SYSCALL_EMU from double use, as a
      preparation for the PTRACE_SYSEMU_SINGLESTEP extension, without changing the
      behavior of the host kernel.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c8c86cec
    • [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage · ed75e8d5
      Authored by Laurent Vivier
            Jeff Dike <jdike@addtoit.com>,
            Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
            Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      
      Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
      except that the kernel does not execute the requested syscall; this is useful
      to improve performance for virtual environments, like UML, which want to run
      the syscall on their own.
      
      In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
      and on exit, and each time you also have two context switches; with SYSEMU you
      avoid the 2nd stop and so save two context switches per syscall.
      
      Also, some architectures don't have support in the host for changing the
      syscall number via ptrace(), which is currently needed to skip syscall
      execution (UML turns any syscall into getpid() to avoid it being executed on
      the host).  Fixing that is hard, while SYSEMU is easier to implement.
      
      * This version of the patch includes some suggestions of Jeff Dike to avoid
        adding any instructions to the syscall fast path, plus some other little
        changes, by myself, to make it work even when the syscall is executed with
        SYSENTER (but I'm unsure about them). It has been widely tested for quite a
        lot of time.
      
      * Various fixes were included to handle the switches between the various
        states, i.e. when, for instance, a syscall entry is traced with one of
        PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
        Basically, this is done by remembering which one of them was used even after
        the call to ptrace_notify().
      
      * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
        to make do_syscall_trace() notice that the current syscall was started with
        SYSEMU on entry, so that no notification ought to be done in the exit path;
        this is a bit of a hack, so this problem is solved in another way in the
        next patches.
      
      * Also, the effects of the patch:
      "Ptrace - i386: fix Syscall Audit interaction with singlestep"
      are cancelled; they are restored back in the last patch of this series.
      
      Detailed descriptions of the patches doing this kind of processing follow (but
      I've already summed everything up).
      
      * Fix behaviour when changing interception kind #1.
      
        In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
        only after doing the debugger notification; but the debugger might have
        changed the status of this flag by continuing execution with
        PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
        status before calling ptrace_notify().
      
      * Fix behaviour when changing interception kind #2:
        avoid intercepting syscall on return when using SYSCALL again.
      
        A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
        crashes.
      
        The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
        inhibits the syscall handler from being called, but does not prevent
        do_syscall_trace() from being called after this for syscall-completion
        interception.
      
        The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
        remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
        the flag is unused in the depicted situation.
      
      * Fix behaviour when changing interception kind #3:
        avoid intercepting syscall on return when using SINGLESTEP.
      
        When testing 2.6.9 with the skas3.v6 patch and my latest patch, I had
        problems with singlestepping on UML in SKAS with SYSEMU.  It looped,
        receiving SIGTRAPs without moving forward.  The EIP of the traced process
        was the same for all SIGTRAPs.
      
      What's missing is to handle switching from PTRACE_SYSEMU to
      PTRACE_SINGLESTEP in a way very similar to what is done for the change from
      PTRACE_SYSEMU to PTRACE_SYSCALL.
      
      I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger
      is notified and then wakes up the process; the syscall is executed (or
      skipped, when do_syscall_trace() returns 0, i.e. when using PTRACE_SYSEMU),
      and do_syscall_trace() is called again.  Since we are on the return path of
      a SYSEMU'd syscall, if the wake-up is performed through ptrace(PTRACE_SYSCALL),
      we must still avoid notifying the parent of the syscall exit.  Now, this
      behaviour is extended even to resuming with PTRACE_SINGLESTEP.
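      As a rough illustration of the intended host API, here is a minimal userspace tracer sketch (not from the patch itself; the helper name and the stop-counting logic are mine, and it assumes a Linux host whose kernel and architecture support PTRACE_SYSEMU):

      ```c
      #include <signal.h>
      #include <stdio.h>
      #include <sys/ptrace.h>
      #include <sys/syscall.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <unistd.h>

      #ifndef PTRACE_SYSEMU
      #define PTRACE_SYSEMU 31   /* value used by the i386 patch */
      #endif

      /* Fork a child that issues syscalls forever; count how many SYSEMU
       * stops the tracer sees before killing it.  Returns -1 if the host
       * does not support PTRACE_SYSEMU (or ptrace at all). */
      static int count_sysemu_stops(int wanted)
      {
              pid_t pid = fork();
              int status, stops = 0;

              if (pid < 0)
                      return -1;
              if (pid == 0) {
                      if (ptrace(PTRACE_TRACEME, 0, 0, 0) != 0)
                              _exit(111);
                      raise(SIGSTOP);              /* let the parent attach */
                      for (;;)
                              syscall(SYS_getpid); /* each entry should stop us once */
              }
              if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
                      return -1;                   /* TRACEME failed in the child */
              while (stops < wanted) {
                      if (ptrace(PTRACE_SYSEMU, pid, 0, 0) != 0)
                              break;               /* SYSEMU not supported here */
                      if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status)
                          || WSTOPSIG(status) != SIGTRAP)
                              break;
                      stops++;                     /* one stop per syscall, not two */
              }
              kill(pid, SIGKILL);
              waitpid(pid, &status, 0);
              return stops ? stops : -1;
      }

      int main(void)
      {
              printf("sysemu stops: %d\n", count_sysemu_stops(5));
              return 0;
      }
      ```

      With plain PTRACE_SYSCALL the same loop would need two continuations per syscall (entry and exit); with PTRACE_SYSEMU each syscall yields a single entry stop and is never executed by the host.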
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      ed75e8d5
    • [PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestep · 94c80b25
      Authored by Bodo Stroesser
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      Avoid giving two traps for singlestep instead of one, when syscall auditing is
      enabled.
      
      In fact no singlestep trap is sent on syscall entry, only on syscall exit, as
      can be seen in entry.S:
      
      # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<<
              testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
              jnz syscall_trace_entry
      	...
      syscall_trace_entry:
      	...
      	call do_syscall_trace
      
      But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so
      the tracer will get one more trap on the syscall entry path, which it
      shouldn't.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      94c80b25
    • [PATCH] add suspend/resume for timer · c3c433e4
      Authored by Shaohua Li
      The timers lack .suspend/.resume methods.  Because of this, jiffies gets a
      large compensation after an S3 resume, and then the softlockup watchdog
      reports an oops.  This occurred with HPET enabled, but it's also possible
      with other timers.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c3c433e4
    • [PATCH] swsusp: fix remaining u32 vs. pm_message_t confusion · 829ca9a3
      Authored by Pavel Machek
      Fix the remaining bits of u32 vs. pm_message_t confusion.  Should not break
      anything.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      829ca9a3
    • [PATCH] ISA DMA suspend for i386 · 795312e7
      Authored by Pierre Ossman
      Reset the ISA DMA controller into a known state after a suspend.  The primary
      concern was re-enabling the cascading DMA channel (4).
      Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      795312e7
    • [PATCH] unify x86/x86-64 semaphore code · 52fdd089
      Authored by Benjamin LaHaise
      This patch moves the common code in x86 and x86-64's semaphore.c into a
      single file in lib/semaphore-sleepers.c.  The arch specific asm stubs are
      left in the arch tree (in semaphore.c for i386 and in the asm for x86-64).
      There should be no changes in code/functionality with this patch.
      Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      52fdd089
    • [PATCH] i386 boottime for_each_cpu broken · 4ad8d383
      Authored by Zwane Mwaikambo
      for_each_cpu walks through all processors in cpu_possible_map, which is
      defined as cpu_callout_map on i386 and isn't initialised until all
      processors have been booted.  This breaks things which do for_each_cpu
      iterations early during boot.  So, define cpu_possible_map as a bitmap with
      NR_CPUS bits populated.  This was triggered by a patch I'm working on which
      does alloc_percpu before bringing up secondary processors.
      
      From: Alexander Nyberg <alexn@telia.com>
      
      i386-boottime-for_each_cpu-broken.patch
      i386-boottime-for_each_cpu-broken-fix.patch
      
      The SMP version of __alloc_percpu checks the cpu_possible_map before
      allocating memory for a certain cpu.  With the above patches the BSP cpuid
      is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
      machines (as soon as someone tries to dereference something allocated via
      __alloc_percpu, which in fact is never allocated since the cpu is not set
      in cpu_possible_map).
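      For illustration, iterating a possible-map bitmap can be sketched like this (a toy userspace model, not kernel code; the names only mirror the kernel's for_each_cpu idiom):

      ```c
      #include <stdio.h>

      #define NR_CPUS 8

      /* Toy possible-map: bit n set means CPU n may come online. */
      static unsigned long cpu_possible_map = 0x0bUL; /* CPUs 0, 1, 3 */

      static int cpu_isset(int cpu, unsigned long map) { return (map >> cpu) & 1; }

      #define for_each_cpu(cpu) \
              for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++) \
                      if (cpu_isset((cpu), cpu_possible_map))

      /* Count the possible CPUs, as an early-boot for_each_cpu user would. */
      static int count_possible(void)
      {
              int cpu, n = 0;
              for_each_cpu(cpu)
                      n++;
              return n;
      }

      int main(void)
      {
              printf("possible cpus: %d\n", count_possible());
              return 0;
      }
      ```

      If bit 0 (the boot CPU) were never set, __alloc_percpu would skip allocating for it, which is exactly the uniprocessor breakage described above.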
      Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      Signed-off-by: Alexander Nyberg <alexn@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4ad8d383
    • [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Authored by Zachary Amsden
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to a pgd range anywhere on a pgd page
         src - ""
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
      
      Note that I omitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure; rather, it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
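      A plausible implementation is just a typed wrapper over memcpy (a sketch under that assumption; the kernel's actual pgd_t layout is arch-specific):

      ```c
      #include <assert.h>
      #include <string.h>

      /* Toy stand-in for the kernel's opaque pgd entry type. */
      typedef struct { unsigned long pgd; } pgd_t;

      /* Copy `count` pgd entries from src to dst.  dst and src may sit on the
       * same page, but the ranges must not overlap or cross a page boundary. */
      static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count)
      {
              memcpy(dst, src, count * sizeof(pgd_t));
      }

      int main(void)
      {
              pgd_t src[4] = {{1}, {2}, {3}, {4}};
              pgd_t dst[4] = {{0}};

              clone_pgd_range(dst, src, 4);
              assert(dst[0].pgd == 1 && dst[3].pgd == 4);
              return 0;
      }
      ```

      The point of the accessor is that a hypervisor port can later replace the memcpy body with explicit page-table update hooks without touching any caller.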
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d7271b14
    • [PATCH] x86 NMI: better support for debuggers · 748f2edb
      Authored by George Anzinger
      This patch adds a notification, via the die_nmi notifier chain, that the
      system is about to be taken down.  If the notification is handled with a
      NOTIFY_STOP return, the system is given a new lease on life.
      
      We also change the nmi watchdog to carry on if die_nmi returns.
      
      This gives debug code a chance to a) catch watchdog timeouts and b) possibly
      allow the system to continue, since the timeout may be due to debugger
      activities such as single-stepping, which is usually done with the "other"
      CPUs held.
      
      Signed-off-by: George Anzinger <george@mvista.com>
      Cc: Keith Owens <kaos@ocs.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      748f2edb
    • [PATCH] x86: introduce a write accessor for updating the current LDT · f2f30ebc
      Authored by Zachary Amsden
      Introduce a write accessor for updating the current LDT.  This is required
      for hypervisors like Xen that do not allow LDT pages to be directly
      written.
      
      Testing - here's a fun little LDT test that can be trivially modified to
      test limits as well.
      
      /*
       * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
       * This is licensed under the GPL.
       */
      
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
      #include <unistd.h>
      #include <asm/ldt.h>
      
      /* glibc exports no modify_ldt() wrapper; go through syscall(2). */
      static int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }
      
      int main(void)
      {
              struct user_desc desc;
              char *code;
              unsigned long long tsc;
      
              code = mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              memset(&desc, 0, sizeof(desc));
              desc.entry_number = 0;
              desc.base_addr = (unsigned long)code;
              desc.limit = 1;
              desc.seg_32bit = 1;
              desc.contents = MODIFY_LDT_CONTENTS_CODE;
              desc.read_exec_only = 0;
              desc.limit_in_pages = 1;
              desc.seg_not_present = 0;
              desc.useable = 1;
              if (modify_ldt(1, &desc, sizeof(desc)) != 0)
                      perror("modify_ldt");
              printf("code base is %p\n", code);
              code[0x0ffe] = 0x0f;  /* rdtsc */
              code[0x0fff] = 0x31;
              code[0x1000] = 0xcb;  /* lret */
              /* far call through LDT selector 7 (index 0, TI=1, RPL=3) */
              __asm__ __volatile__("lcall $7,$0xffe" : "=A" (tsc));
              printf("TSC is 0x%016llx\n", tsc);
              return 0;
      }
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2f30ebc
    • [PATCH] x86: remove redundant TSS clearing · e9f86e35
      Authored by Zachary Amsden
      When reviewing GDT updates, I found the code:
      
      	set_tss_desc(cpu,t);	/* This just modifies memory; ... */
              per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
      
      This second line is unnecessary, since set_tss_desc() has already cleared
      the busy bit.
      
      Commented disassembly, line 1:
      
      c028b8bd:       8b 0c 86                mov    (%esi,%eax,4),%ecx
      c028b8c0:       01 cb                   add    %ecx,%ebx
      c028b8c2:       8d 0c 39                lea    (%ecx,%edi,1),%ecx
      
        => %ecx = per_cpu(cpu_gdt_table, cpu)
      
      c028b8c5:       8d 91 80 00 00 00       lea    0x80(%ecx),%edx
      
        => %edx = &per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS]
      
      c028b8cb:       66 c7 42 00 73 20       movw   $0x2073,0x0(%edx)
      c028b8d1:       66 89 5a 02             mov    %bx,0x2(%edx)
      c028b8d5:       c1 cb 10                ror    $0x10,%ebx
      c028b8d8:       88 5a 04                mov    %bl,0x4(%edx)
      c028b8db:       c6 42 05 89             movb   $0x89,0x5(%edx)
      
        => ((char *)%edx)[5] = 0x89
        (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] = 0x89
      
      c028b8df:       c6 42 06 00             movb   $0x0,0x6(%edx)
      c028b8e3:       88 7a 07                mov    %bh,0x7(%edx)
      c028b8e6:       c1 cb 10                ror    $0x10,%ebx
      
        => other bits
      
      Commented disassembly, line 2:
      
      c028b8e9:       8b 14 86                mov    (%esi,%eax,4),%edx
      c028b8ec:       8d 04 3a                lea    (%edx,%edi,1),%eax
      
        => %eax = per_cpu(cpu_gdt_table, cpu)
      
      c028b8ef:       81 a0 84 00 00 00 ff    andl   $0xfffffdff,0x84(%eax)
      
        => per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
        (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] &= 0xfd
      
      Note that (0x89 & ~0xfd) == 0; i.e, set_tss_desc(cpu,t) has already stored
      the type field in the GDT with the busy bit clear.
      
      Eliminating redundant and obscure code is always a good thing; in fact, I
      pointed out this same optimization many moons ago in arch/i386/setup.c,
      back when it used to be called that.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e9f86e35
    • [PATCH] x86: make IOPL explicit · a5201129
      Authored by Zachary Amsden
      The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
      explicit in C code is more clear.  This pushf/popf pair was added as a
      bugfix for leaking IOPL to unprivileged processes when using
      sysenter/sysexit based system calls (sysexit does not restore flags).
      
      When requesting an IOPL change in sys_iopl(), it is just as easy to change
      the current flags and the flags in the stack image (in case an IRET is
      required), but there is no reason to force an IRET if we came in from the
      SYSENTER path.
      
      This change is the minimal solution for supporting a paravirtualized Linux
      kernel that allows user processes to run with I/O privilege.  Other
      solutions require radical rewrites of part of the low level fault / system
      call handling code, or do not fully support sysenter based system calls.
      
      Unfortunately, this added one field to the thread_struct.  But as a bonus,
      on P4, the fastest time measured for switch_to() went from 312 to 260
      cycles, a win of about 17% in the fast case through this performance
      critical path.
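      For reference, IOPL lives in bits 12-13 of EFLAGS, so updating the flag image reduces to a mask-and-shift (an illustrative sketch; the helper name is mine, not the patch's):

      ```c
      #include <stdio.h>

      #define X86_EFLAGS_IOPL 0x00003000UL  /* bits 12-13 of EFLAGS */

      /* Return the flags image with its IOPL field replaced by `level` (0-3). */
      static unsigned long set_iopl(unsigned long eflags, unsigned int level)
      {
              return (eflags & ~X86_EFLAGS_IOPL) | ((unsigned long)level << 12);
      }

      int main(void)
      {
              /* Grant IOPL 3 in a flags image that previously had IOPL 0. */
              unsigned long flags = set_iopl(0x00000202UL, 3);
              printf("eflags = 0x%05lx\n", flags);  /* 0x03202 */
              return 0;
      }
      ```

      Applying this to both the live flags and the saved stack image is what lets sys_iopl avoid forcing an IRET on the SYSENTER path.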
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a5201129
    • [PATCH] x86: privilege cleanup · 0998e422
      Authored by Zachary Amsden
      Privilege checking cleanup.  Originally, these diffs were much greater, but
      recent cleanups in Linux have already done much of the cleanup.  I added
      some explanatory comments in places where the reasoning behind certain
      tests is rather subtle.
      
      Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
      reason is, there are only two call chains - one via die_if_kernel() and one
      via do_page_fault(), both entering through die().  Both of these paths
      already ensure that a kernel-mode failure has happened.  Also, the original
      check here, if (user_mode(regs)), was insufficient anyway, since it would
      not rule out BUG faults from V8086-mode execution.
      
      Saving the %ss segment in show_regs() rather than assuming a fixed value
      also gives better information about the current kernel state in the
      register dump.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      0998e422
    • [PATCH] x86: more asm cleanups · f2ab4461
      Authored by Zachary Amsden
      Some more assembler cleanups I noticed along the way.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f2ab4461
    • [PATCH] i386: use set_pte macros in a couple places where they were missing · c9b02a24
      Authored by Zachary Amsden
      Also, setting PDPEs in PAE mode does not require atomic operations, since the
      PDPEs are cached by the processor, and only reloaded on an explicit or
      implicit reload of CR3.
      
      Since the four PDPEs must always be present in an active root, and the kernel
      PDPE is never updated, we are safe even from SMIs and interrupts / NMIs using
      task gates (which reload CR3).  Actually, much of this is moot, since the user
      PDPEs are never updated either, and the only usage of task gates is by the
      doublefault handler.  It appears the only place PGDs get updated in PAE mode
      is in init_low_mappings() / zap_low_mapping() for initial page table creation
      and recovery from ACPI sleep state, and these sites are safe by inspection.
      Getting rid of the cmpxchg8b saves code space and 720 cycles in pgd_alloc on
      P4.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c9b02a24
    • [PATCH] i386: load_tls() fix · e7a2ff59
      Authored by Zachary Amsden
      Subtle fix: load_TLS has been moved after saving %fs and %gs segments to avoid
      creating non-reversible segments.  This could conceivably cause a bug if the
      kernel ever needed to save and restore fs/gs from the NMI handler.  It
      currently does not, but this is the safest approach to avoiding fs/gs
      corruption.  SMIs are safe, since SMI saves the descriptor hidden state.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e7a2ff59
    • [PATCH] i386: inline assembler: cleanup and encapsulate descriptor and task register management · 4d37e7e3
      Authored by Zachary Amsden
      i386 inline assembler cleanup.
      
      This change encapsulates descriptor and task register management.  Also,
      it is possible to improve assembler generation in two cases; savesegment
      may store the value in a register instead of a memory location, which
      allows GCC to optimize stack variables into registers, and MOV MEM, SEG
      is always a 16-bit write to memory, making the casting in math-emu
      unnecessary.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4d37e7e3
    • [PATCH] i386: cleanup serialize msr · 245067d1
      Authored by Zachary Amsden
      i386 arch cleanup.  Introduce the serialize macro to serialize processor
      state.  Why the microcode update needs it I am not quite sure, since wrmsr
      is already a serializing instruction, but it is a microcode update, so I
      will keep the semantics the same, since this could be a timing workaround.
      As far as I can tell, this has always been there, since the original
      microcode update source.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      245067d1
    • [PATCH] i386: inline asm cleanup · 4bb0d3ec
      Authored by Zachary Amsden
      i386 inline asm cleanup.  Use cr/dr accessor functions.
      
      There is also a potential bugfix: some CR accessors really should be
      volatile.  Reads from CR0 (numeric state may change in an exception
      handler), writes to CR4 (flipping CR4.TSD) and reads from CR2 (page fault)
      must prevent instruction re-ordering.  I did not add a memory clobber to the
      CR3 / CR4 / CR0 updates, as it was not there to begin with, and in no case
      should kernel memory be clobbered, except when doing a TLB flush, which
      already has a memory clobber.
      
      I noticed that page invalidation does not have a memory clobber.  I can't find
      a bug as a result, but there is definitely a potential for a bug here:
      
      #define __flush_tlb_single(addr) \
      	__asm__ __volatile__("invlpg %0": :"m" (*(char *) addr))
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4bb0d3ec
    • [PATCH] i386: clean up vDSO alignment padding · 2a0694d1
      Authored by Roland McGrath
      This makes the vDSO use nops for all its padding around instructions,
      rather than sometimes zeros, and nop-pads the end of the area containing
      instructions to a 32-byte cache line, to keep text and data in separate
      lines.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2a0694d1
    • [PATCH] ES7000 platform update (i386) · 56f1d5d5
      Authored by Natalie.Protasevich@unisys.com
      This is a subarch update for ES7000.  I've modified the platform check code
      and removed unnecessary OEM table parsing for newer systems that don't use
      OEM information during boot.  Parsing the table in fact causes problems,
      and the platform doesn't get recognized.  The patch only affects the ES7000
      subarch.
      
      Signed-off-by: <Natalie.Protasevich@unisys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      56f1d5d5
    • [PATCH] x86: automatically enable bigsmp when we have more than 8 CPUs · 911a62d4
      Authored by Venkatesh Pallipadi
      The i386 generic subarchitecture requires explicit DMI strings or a
      command-line option to enable bigsmp mode.  The patch below removes that
      restriction, and uses bigsmp as soon as it finds more than 8 logical CPUs,
      an Intel processor and xAPIC support.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      911a62d4
    • [PATCH] kdump: Save parameter segment in protected mode (x86) · 484b90c4
      Authored by Vivek Goyal
      o With introduction of kexec as boot-loader, the assumption that parameter
        segment will always be loaded at lower address than kernel and will be
        addressable by early bootup page tables is no longer valid. In kexec on
        panic case parameter segment might well be loaded beyond kernel image and
        might not be addressable by early boot page tables.
      o This case might hit in the scenario where user has reserved a chunk of
        memory for second kernel, for example 16MB to 64MB, and has also built
        second kernel for physical memory location 16MB. In this case kexec has no
        choice but to load the parameter segment at a higher address than new kernel
        image at safe location where new kernel does not stomp it.
      o Though the problem should automatically go away once the relocatable
        kernel for i386 is in place and kexec can determine the location of the
        new kernel at run time (and load the parameter segment at a lower address
        than the kernel image), till then this patch can go in (assuming it does
        not break something else).
      o This patch moves up the boot parameter saving code. Now boot parameters
        are copied out in protected mode before page tables are initialized. This
        will ensure that parameter segment is always addressable irrespective of
        its physical location.
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      484b90c4
    • [PATCH] vm86: Honor TF bit when emulating an instruction · 5fd75ebb
      Authored by Petr Tesarik
      If the virtual 86 machine reaches an instruction which raises a General
      Protection Fault (such as CLI or STI), the instruction is emulated (in
      handle_vm86_fault).  However, the emulation ignored the TF bit, so the
      hardware debug interrupt was not invoked after such an emulated instruction
      (and the DOS debugger missed it).
      
      This patch fixes the problem by emulating the hardware debug interrupt as
      the last action before control is returned to the VM86 program.
      Signed-off-by: Petr Tesarik <kernel@tesarici.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5fd75ebb
    • M
      [PATCH] x86: fix EFI memory map parsing · 7ae65fd3
      Authored by Matt Tolentino
      The memory descriptors that comprise the EFI memory map are not fixed in
      stone; their size could change in the future.  This patch uses the memory
      descriptor size obtained from EFI to iterate over the memory map entries
      during boot, which enables the removal of an x86-specific pad (and ifdef)
      in the EFI header.  I also couldn't stomach the broken-up nature of the
      function that puts EFI runtime calls into virtual mode any longer, so I
      fixed that up a bit as well.
      
      For reference, this patch only impacts x86.
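      The iteration change can be sketched in plain userspace C. The struct
      and function names here (efi_desc, count_pages) are illustrative, not
      the kernel's efi_memory_desc_t; the key point is stepping through the
      packed map by the firmware-reported descriptor size instead of
      sizeof(struct), so future firmware that appends fields still parses.

      ```c
      #include <assert.h>
      #include <stdint.h>
      #include <string.h>

      /* Simplified stand-in for an EFI memory descriptor; field names are
       * illustrative, not the exact kernel/firmware layout. */
      struct efi_desc {
          uint32_t type;
          uint64_t phys_addr;
          uint64_t num_pages;
      };

      /* Walk a packed map using the descriptor size reported at runtime,
       * which may exceed sizeof(struct efi_desc) if firmware pads entries. */
      static uint64_t count_pages(const void *map, unsigned long map_size,
                                  unsigned long desc_size)
      {
          uint64_t total = 0;
          const char *p;

          for (p = map; p < (const char *)map + map_size; p += desc_size) {
              struct efi_desc d;
              memcpy(&d, p, sizeof(d));   /* entry may be padded/unaligned */
              total += d.num_pages;
          }
          return total;
      }

      int main(void)
      {
          /* Pretend firmware pads each entry to 32 bytes. */
          unsigned long desc_size = 32;
          unsigned char map[64] = {0};
          struct efi_desc a = { 7, 0x1000, 16 }, b = { 7, 0x20000, 8 };

          memcpy(map, &a, sizeof(a));
          memcpy(map + desc_size, &b, sizeof(b));
          assert(count_pages(map, sizeof(map), desc_size) == 24);
          return 0;
      }
      ```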
      Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7ae65fd3
    • V
      [PATCH] hpet: use read_timer_tsc only when CPU has TSC · 4116c527
      Authored by Venkatesh Pallipadi
      Use read_timer_tsc only when the CPU has a TSC.  Thanks to Andrea for
      pointing this out.  This should not be an issue on any platform, as all
      recent systems that have an HPET also have CPUs that support TSC.  The
      patch is still required for correctness.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4116c527
    • I
      [PATCH] x86: compress the stack layout of do_page_fault() · 869f96a0
      Authored by Ingo Molnar
      This patch pushes the creation of a rare signal frame (SIGBUS or SIGSEGV)
      into a separate function, saving stack space in the main do_page_fault()
      stackframe.  The effect is 132 bytes less stack used by the typical
      do_page_fault() invocation, resulting in a denser cache layout.
      
      (Another minor effect is that in case of kernel crashes that come from a
      pagefault, we add less space to the already existing frame, giving the
      crash functions a slightly higher chance to do their stuff without
      overflowing the stack.)
      
      (The changes also result in slightly cleaner code.)
      
      argument bugfix from "Guillaume C." <guichaz@gmail.com>
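      The technique can be sketched outside the kernel. Everything below
      (fake_siginfo, force_sig_info_fault, do_fault) is a hypothetical model,
      not the patch itself: moving the large local into a noinline helper
      keeps it off the hot function's stack frame on the common path.

      ```c
      #include <assert.h>
      #include <string.h>

      /* A deliberately large struct: keeping it in the hot function's frame
       * would cost stack on every call, even when no signal is sent. */
      struct fake_siginfo { char payload[128]; };

      static int signals_sent;

      /* noinline keeps the 128-byte local out of the caller's stack frame;
       * it only exists while this rare path actually runs. */
      __attribute__((noinline))
      static void force_sig_info_fault(int sig, unsigned long address)
      {
          struct fake_siginfo info;
          memset(&info, 0, sizeof(info));
          info.payload[0] = (char)sig;
          (void)address;
          signals_sent++;
      }

      /* Hot path: the frame stays small because the rare case is a call away. */
      static int do_fault(unsigned long address, int resolvable)
      {
          if (resolvable)
              return 0;                   /* common case: fault handled */
          force_sig_info_fault(11 /* SIGSEGV */, address);
          return -1;
      }

      int main(void)
      {
          assert(do_fault(0x1000, 1) == 0);
          assert(do_fault(0xdead, 0) == -1);
          assert(signals_sent == 1);
          return 0;
      }
      ```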
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      869f96a0
    • C
      [PATCH] remove hugetlb_clean_stale_pgtable() and fix huge_pte_alloc() · 0e5c9f39
      Authored by Chen, Kenneth W
      I don't think we need to call hugetlb_clean_stale_pgtable() anymore
      in 2.6.13 because of the rework of free_pgtables().  It now collects
      all the pte pages at munmap time.  It used to collect page table pages
      only when an entire pgd could be freed, leaving stale pte pages behind.
      Not anymore with 2.6.13.  This function will never be called, so we
      should turn it into a BUG_ON.
      
      I also spotted two problems here, not Adam's fault :-)
      (1) in huge_pte_alloc(), it looks like a bug to me that pud is not
          checked before calling pmd_alloc()
      (2) hugetlb_clean_stale_pgtable() also misses a call to pmd_free_tlb.
          I think a tlb flush is required to flush the mapping for the page
          table itself when we clear out the pmd pointing to a pte page.
          However, since hugetlb_clean_stale_pgtable() is never called,
          this won't trigger the bug.
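      Problem (1) can be modeled with toy allocators. The names below
      (toy_pud_alloc, toy_pmd_alloc, toy_huge_pte_alloc) are hypothetical,
      not the kernel functions; the point is that an allocator that can
      return NULL must be checked before its result feeds the next level.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <stdlib.h>

      /* Toy allocators standing in for pud_alloc()/pmd_alloc(); like the
       * real ones, they return NULL on allocation failure. */
      static int fail_pud;
      static void *toy_pud_alloc(void) { return fail_pud ? NULL : malloc(64); }
      static void *toy_pmd_alloc(void *pud) { (void)pud; return malloc(64); }

      /* Mirrors the fix: bail out if the pud level could not be allocated
       * instead of handing NULL to the pmd allocator. */
      static void *toy_huge_pte_alloc(void)
      {
          void *pud = toy_pud_alloc();
          void *pmd = NULL;

          if (pud)                     /* the missing check the patch adds */
              pmd = toy_pmd_alloc(pud);
          return pmd;
      }

      int main(void)
      {
          assert(toy_huge_pte_alloc() != NULL);

          fail_pud = 1;                /* simulate pud_alloc() failure */
          assert(toy_huge_pte_alloc() == NULL);
          return 0;
      }
      ```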
      Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0e5c9f39
    • A
      [PATCH] hugetlb: check p?d_present in huge_pte_offset() · 02b0ccef
      Authored by Adam Litke
      For demand faulting, we cannot assume that the page tables will be
      populated.  Do what the rest of the architectures do and test p?d_present()
      while walking down the page table.
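      A toy three-level walk shows the shape of the fix. The struct and
      function below are illustrative, not huge_pte_offset() itself: each
      level is tested for presence (the p?d_present() checks) before
      descending, so an unpopulated table yields NULL instead of a bad
      dereference.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Toy three-level page table; 'present' stands in for pgd_present(),
       * pud_present(), etc. */
      struct level { int present; struct level *next; };

      static struct level *walk(struct level *pgd)
      {
          if (!pgd->present)          /* test each level before descending */
              return NULL;
          if (!pgd->next->present)
              return NULL;
          return pgd->next->next;     /* the pmd acting as a huge pte */
      }

      int main(void)
      {
          struct level pmd = { 1, NULL };
          struct level pud = { 1, &pmd };
          struct level pgd = { 1, &pud };

          assert(walk(&pgd) == &pmd);

          pud.present = 0;            /* demand faulting: table not built yet */
          assert(walk(&pgd) == NULL);
          return 0;
      }
      ```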
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      02b0ccef
    • A
      [PATCH] hugetlb: move stale pte check into huge_pte_alloc() · 7bf07f3d
      Authored by Adam Litke
      Initial Post (Wed, 17 Aug 2005)
      
      This patch moves the
      	if (! pte_none(*pte))
      		hugetlb_clean_stale_pgtable(pte);
      logic into huge_pte_alloc() so all of its callers can be immune to the bug
      described by Kenneth Chen at http://lkml.org/lkml/2004/6/16/246
      
      > It turns out there is a bug in hugetlb_prefault(): with 3 level page table,
      > huge_pte_alloc() might return a pmd that points to a PTE page. It happens
      > if the virtual address for hugetlb mmap is recycled from previously used
      > normal page mmap. free_pgtables() might not scrub the pmd entry on
      > munmap and hugetlb_prefault skips on any pmd presence regardless what type
      > it is.
      
      Unless I am missing something, it seems more correct to place the check
      inside huge_pte_alloc() to prevent the same bug wherever a huge pte is
      allocated.
      It also allows checking for this condition when lazily faulting huge pages
      later in the series.
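      The move can be modeled in miniature. The names below (toy_clean_stale,
      toy_huge_pte_alloc) are hypothetical stand-ins for the kernel functions;
      the design point is that putting the !pte_none() stale check inside the
      allocator covers every caller, instead of trusting each call site to
      remember it.

      ```c
      #include <assert.h>
      #include <stddef.h>

      /* Toy model: a pmd slot may still hold a stale pointer to a normal
       * pte page left behind by a previous mapping of the same range. */
      static int cleaned;
      static void toy_clean_stale(long *pte) { *pte = 0; cleaned++; }

      /* The check lives inside the allocator, so every caller is covered. */
      static long *toy_huge_pte_alloc(long *slot)
      {
          if (*slot != 0)             /* !pte_none(*pte): stale entry found */
              toy_clean_stale(slot);
          return slot;
      }

      int main(void)
      {
          long fresh = 0, stale = 0xdead;

          assert(toy_huge_pte_alloc(&fresh) == &fresh);
          assert(cleaned == 0);       /* nothing to scrub */

          toy_huge_pte_alloc(&stale);
          assert(stale == 0 && cleaned == 1);
          return 0;
      }
      ```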
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: <linux-mm@kvack.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      7bf07f3d