1. 10 Sep, 2005 3 commits
  2. 09 Sep, 2005 1 commit
  3. 08 Sep, 2005 19 commits
  4. 05 Sep, 2005 17 commits
    • [PATCH] uml: SYSEMU: slight cleanup and speedup · 640aa46e
      Paolo 'Blaisorblade' Giarrusso committed
      As a follow-up to "UML Support - Ptrace: adds the host SYSEMU support, for
      UML and general usage" (i.e.  uml-support-* in current mm).
      
      Avoid unconditionally jumping to work_pending and duplicating code; just
      reuse the already existing resume_userspace path.
      
      One interesting note: Charles P. Wright suggested that the API could be
      improved with no downsides for UML (except that UML would have to support
      yet another host API, since dropping support for the current API is not
      reasonable from the users' point of view).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Charles P. Wright <cwright@cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] SYSEMU: fix sysaudit / singlestep interaction · ab1c23c2
      Bodo Stroesser committed
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      This is simply an adjustment for "Ptrace - i386: fix Syscall Audit interaction
      with singlestep" to work on top of the SYSEMU patches, too.  I have some
      doubts about this patch: I wonder why we need to alter ptrace_disable() that
      way.
      
      I left the patch this way because it has been extensively tested, but I don't
      understand the reason.
      
      The current PTRACE_DETACH handling simply clears child->ptrace; this is not
      enough, because entry.S looks only at the thread flags.  do_syscall_trace
      does check current->ptrace, but I don't think depending on that is good, at
      least for performance, so I think the clearing is done elsewhere.  For
      instance, it is done on PTRACE_CONT, but doing PTRACE_DETACH without
      PTRACE_CONT is possible (and happens when gdb crashes and one kills it
      manually).
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Uml support: add PTRACE_SYSEMU_SINGLESTEP option to i386 · 1b38f006
      Bodo Stroesser committed
      This patch implements the new ptrace option PTRACE_SYSEMU_SINGLESTEP, which
      can be used by UML to singlestep a process: it will receive SINGLESTEP
      interceptions for normal instructions and syscalls, but syscall execution will
      be skipped just like with PTRACE_SYSEMU.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Uml support: reorganize PTRACE_SYSEMU support · c8c86cec
      Bodo Stroesser committed
      With this patch, we change the way we handle switching from PTRACE_SYSEMU to
      PTRACE_{SINGLESTEP,SYSCALL}, to free TIF_SYSCALL_EMU from double use as a
      preparation for PTRACE_SYSEMU_SINGLESTEP extension, without changing the
      behavior of the host kernel.
      Signed-off-by: Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage · ed75e8d5
      Laurent Vivier committed
            Jeff Dike <jdike@addtoit.com>,
            Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
            Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      
      Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
      except that the kernel does not execute the requested syscall; this is useful
      to improve performance for virtual environments, like UML, which want to run
      the syscall on their own.
      
      In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
      and on exit, and each time you also have two context switches; with SYSEMU you
      avoid the 2nd stop and so save two context switches per syscall.
      
      Also, some architectures don't have support in the host for changing the
      syscall number via ptrace(), which is currently needed to skip syscall
      execution (UML turns any syscall into getpid() to avoid it being executed on
      the host).  Fixing that is hard, while SYSEMU is easier to implement.
      
      * This version of the patch includes some suggestions from Jeff Dike to avoid
        adding any instructions to the syscall fast path, plus some other small
        changes of my own to make it work even when the syscall is executed with
        SYSENTER (but I'm unsure about them). It has been widely tested for a long
        time.
      
      * Various fixes were included to handle the switches between the various
        states, i.e. when for instance a syscall entry is traced with one of
        PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
        Basically, this is done by remembering which one of them was used even after
        the call to ptrace_notify().
      
      * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
        to make do_syscall_trace() notice that the current syscall was started with
        SYSEMU on entry, so that no notification ought to be done in the exit path;
        this is a bit of a hack, so this problem is solved in another way in the
        next patches.
      
      * Also, the effects of the patch:
      "Ptrace - i386: fix Syscall Audit interaction with singlestep"
      are cancelled; they are restored back in the last patch of this series.
      
      Detailed descriptions of the patches doing this kind of processing follow (but
      I've already summed everything up).
      
      * Fix behaviour when changing interception kind #1.
      
        In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
        only after doing the debugger notification; but the debugger might have
        changed the status of this flag because it continued execution with
        PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
        status before calling ptrace_notify().
      
      * Fix behaviour when changing interception kind #2:
        avoid intercepting syscall on return when using SYSCALL again.
      
        A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
        crashes.
      
        The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
        prevents the syscall handler from being called, but does not prevent
        do_syscall_trace() from being called afterwards for syscall completion
        interception.
      
        The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
        remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
        the flag is unused in the depicted situation.
      
      * Fix behaviour when changing interception kind #3:
        avoid intercepting syscall on return when using SINGLESTEP.
      
        When testing 2.6.9 and the skas3.v6 patch together with my latest patch, I
        had problems with singlestepping on UML in SKAS with SYSEMU.  It looped
        receiving SIGTRAPs without moving forward.  EIP of the traced process was
        the same for all SIGTRAPs.
      
      What's missing is to handle switching from PTRACE_SYSEMU to
      PTRACE_SINGLESTEP in a way very similar to what is done for the change from
      PTRACE_SYSEMU to PTRACE_SYSCALL.
      
      I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
      notified and then wakes up the process; the syscall is executed (or skipped,
      when do_syscall_trace() returns 0, i.e.  when using PTRACE_SYSEMU), and
      do_syscall_trace() is called again.  Since we are on the return path of a
      SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
      we must still avoid notifying the parent of the syscall exit.  Now, this
      behaviour is extended even to resuming with PTRACE_SINGLESTEP.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
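
The "interception kind #1" fix in the commit above (sample TIF_SYSCALL_EMU before calling ptrace_notify()) can be illustrated with a small user-space model. This is only a sketch: `task_model`, the `FLAG_*` bit values and all function names are hypothetical stand-ins, not kernel code, and the tracer is assumed to always resume with PTRACE_SYSCALL.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for thread flags; the bit positions are illustrative. */
#define FLAG_SYSCALL_TRACE (1u << 0)
#define FLAG_SYSCALL_EMU   (1u << 6)

struct task_model {
    unsigned flags;
    int notifications;
};

/*
 * Models the stop seen by the tracer. The tracer is assumed to resume the
 * child with PTRACE_SYSCALL, which clears the EMU flag and sets TRACE.
 */
static void notify_and_switch_to_syscall(struct task_model *t)
{
    t->notifications++;
    t->flags &= ~FLAG_SYSCALL_EMU;
    t->flags |= FLAG_SYSCALL_TRACE;
}

/* Buggy order: the flag is tested after the tracer had a chance to change it. */
static bool skip_syscall_checked_after(struct task_model *t)
{
    notify_and_switch_to_syscall(t);
    return (t->flags & FLAG_SYSCALL_EMU) != 0;
}

/* Fixed order: the flag is sampled first, then the tracer is notified. */
static bool skip_syscall_checked_before(struct task_model *t)
{
    bool was_sysemu = (t->flags & FLAG_SYSCALL_EMU) != 0;

    notify_and_switch_to_syscall(t);
    return was_sysemu;
}
```

With a task that entered the syscall under SYSEMU, the first variant forgets that the syscall must be skipped, while the second one still reports it.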
    • [PATCH] Ptrace/i386: fix "syscall audit" interaction with singlestep · 94c80b25
      Bodo Stroesser committed
            Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      
      Avoid giving two traps for singlestep instead of one, when syscall auditing is
      enabled.
      
      In fact no singlestep trap is sent on syscall entry, only on syscall exit, as
      can be seen in entry.S:
      
      # Note that in this mask _TIF_SINGLESTEP is not tested !!! <<<<<<<<<<<<<<
              testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SECCOMP),TI_flags(%ebp)
              jnz syscall_trace_entry
      	...
      syscall_trace_entry:
      	...
      	call do_syscall_trace
      
      But auditing a SINGLESTEP'ed process causes do_syscall_trace to be called, so
      the tracer will get one more trap on the syscall entry path, which it
      shouldn't.
      Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      CC: Roland McGrath <roland@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
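
The extra trap described in the commit above can be reproduced with a toy model of the entry-path mask test. This is a sketch under stated assumptions: the `F_*` bit values and the three helper functions are hypothetical, and the "fixed" predicate only mirrors the behaviour the message asks for (no singlestep trap on syscall entry), not the actual patch.

```c
#include <stdbool.h>

/* Hypothetical flag bits; only their distinctness matters here. */
#define F_SYSCALL_TRACE (1u << 0)
#define F_SINGLESTEP    (1u << 1)
#define F_SYSCALL_AUDIT (1u << 2)
#define F_SECCOMP       (1u << 3)

/* Mirrors the testb mask quoted from entry.S: SINGLESTEP is not part of it. */
static bool enters_trace_path(unsigned flags)
{
    return (flags & (F_SYSCALL_TRACE | F_SYSCALL_AUDIT | F_SECCOMP)) != 0;
}

/* Before the fix: once in the trace path, a singlestepped task is trapped too. */
static bool entry_trap_unfixed(unsigned flags)
{
    return enters_trace_path(flags) &&
           (flags & (F_SYSCALL_TRACE | F_SINGLESTEP)) != 0;
}

/* Desired behaviour: on entry, only syscall tracing produces a trap. */
static bool entry_trap_fixed(unsigned flags)
{
    return enters_trace_path(flags) && (flags & F_SYSCALL_TRACE) != 0;
}
```

A task with only singlestep set never reaches the trace path; adding audit pulls it in, which is where the unfixed predicate reports the spurious entry trap.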
    • [PATCH] add suspend/resume for timer · c3c433e4
      Shaohua Li committed
      The timers lack .suspend/.resume methods.  Because of this, jiffies got a
      big compensation after an S3 resume, and the softlockup watchdog then
      reported an oops.  This occurred with HPET enabled, but it's also possible
      for other timers.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: fix remaining u32 vs. pm_message_t confusion · 829ca9a3
      Pavel Machek committed
      Fix remaining bits of u32 vs.  pm_message confusion.  Should not break
      anything.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] ISA DMA suspend for i386 · 795312e7
      Pierre Ossman committed
      Reset the ISA DMA controller into a known state after a suspend.  Primary
      concern was reenabling the cascading DMA channel (4).
      Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] unify x86/x86-64 semaphore code · 52fdd089
      Benjamin LaHaise committed
      This patch moves the common code in x86 and x86-64's semaphore.c into a
      single file in lib/semaphore-sleepers.c.  The arch specific asm stubs are
      left in the arch tree (in semaphore.c for i386 and in the asm for x86-64).
      There should be no changes in code/functionality with this patch.
      Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] i386 boottime for_each_cpu broken · 4ad8d383
      Zwane Mwaikambo committed
      for_each_cpu walks through all processors in cpu_possible_map, which is
      defined as cpu_callout_map on i386 and isn't initialised until all
      processors have been booted. This breaks things which do for_each_cpu
      iterations early during boot. So, define cpu_possible_map as a bitmap with
      NR_CPUS bits populated. This was triggered by a patch I'm working on which
      does alloc_percpu before bringing up secondary processors.
      
      From: Alexander Nyberg <alexn@telia.com>
      
      i386-boottime-for_each_cpu-broken.patch
      i386-boottime-for_each_cpu-broken-fix.patch
      
      The SMP version of __alloc_percpu checks the cpu_possible_map before
      allocating memory for a certain cpu.  With the above patches the BSP cpuid
      is never set in cpu_possible_map which breaks CONFIG_SMP on uniprocessor
      machines (as soon as someone tries to dereference something allocated via
      __alloc_percpu, which in fact is never allocated since the cpu is not set
      in cpu_possible_map).
      Signed-off-by: Zwane Mwaikambo <zwane@arm.linux.org.uk>
      Signed-off-by: Alexander Nyberg <alexn@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
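
Why an unpopulated cpu_possible_map breaks early for_each_cpu users can be seen in a tiny bitmap model. This is a sketch only: `NR_CPUS_MODEL`, `count_possible()` and `all_cpus_map()` are hypothetical names, and a single `unsigned long` stands in for the kernel's cpumask type.

```c
/* Hypothetical CPU limit for the model; must be smaller than the bit width
 * of unsigned long so the shift below stays defined. */
#define NR_CPUS_MODEL 8

/* Counts the CPUs a for_each_cpu-style loop would visit for a given map. */
static int count_possible(unsigned long possible_map)
{
    int cpu, visited = 0;

    for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        if (possible_map & (1UL << cpu))
            visited++;
    return visited;
}

/* A map with all NR_CPUS_MODEL bits populated, as the commit proposes. */
static unsigned long all_cpus_map(void)
{
    return (1UL << NR_CPUS_MODEL) - 1;
}
```

With an empty map (the state before secondary CPUs are booted) the loop visits nothing, so per-CPU allocations done at that point cover no CPU at all, including the boot processor in bit 0.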
    • [PATCH] i386: encapsulate copying of pgd entries · d7271b14
      Zachary Amsden committed
      Add a clone operation for pgd updates.
      
      This helps complete the encapsulation of updates to page tables (or pages
      about to become page tables) into accessor functions rather than using
      memcpy() to duplicate them.  This is both generally good for consistency
      and also necessary for running in a hypervisor which requires explicit
      updates to page table entries.
      
      The new function is:
      
      clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
      
         dst - pointer to pgd range anywhere on a pgd page
         src - pointer to pgd range anywhere on a pgd page
         count - the number of pgds to copy.
      
         dst and src can be on the same page, but the range must not overlap
         and must not cross a page boundary.
      
      Note that I omitted using this call to copy pgd entries into the
      software suspend page root, since this is not technically a live paging
      structure; rather, it is used on resume from suspend.  CC'ing Pavel in case
      he has any feedback on this.
      
      Thanks to Chris Wright for noticing that this could be more optimal in
      PAE compiles by eliminating the memset.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
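
The contract stated for clone_pgd_range() in the commit above (ranges must not overlap and must not cross a page boundary) can be sketched in user space as follows. Everything here is an assumption for illustration: `pgd_model_t`, `MODEL_PAGE_SIZE` and `clone_pgd_range_model()` are hypothetical, and the plain memcpy() stands in for whatever accessor a hypervisor port would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u            /* assumed page size for the model */

typedef struct { uint32_t pgd; } pgd_model_t;   /* stand-in for pgd_t */

/* True if the range [first, first + count) stays within one page. */
static int range_on_one_page(const pgd_model_t *first, int count)
{
    uintptr_t start = (uintptr_t)first;
    uintptr_t last = (uintptr_t)(first + count) - 1;

    return start / MODEL_PAGE_SIZE == last / MODEL_PAGE_SIZE;
}

/* Copies count entries after checking the documented preconditions. */
static void clone_pgd_range_model(pgd_model_t *dst, const pgd_model_t *src,
                                  int count)
{
    uintptr_t d = (uintptr_t)dst, s = (uintptr_t)src;
    size_t bytes = (size_t)count * sizeof(*dst);

    assert(count > 0);
    assert(range_on_one_page(dst, count) && range_on_one_page(src, count));
    assert(d + bytes <= s || s + bytes <= d);   /* ranges must not overlap */
    memcpy(dst, src, bytes);
}
```

Keeping every pgd copy behind one function like this is what lets a paravirtualized build replace the copy with explicit per-entry updates without touching the callers.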
    • [PATCH] x86 NMI: better support for debuggers · 748f2edb
      George Anzinger committed
      This patch adds a notification to die_nmi that the system is about to be
      taken down.  If the notification is handled with a NOTIFY_STOP return, the
      system is given a new lease on life.
      
      We also change the nmi watchdog to carry on if die_nmi returns.
      
      This gives debug code a chance to a) catch watchdog timeouts and b) possibly
      allow the system to continue, realizing that the timeout may be due to
      debugger activities such as single stepping, which is usually done with
      "other" cpus held.
      
      Signed-off-by: George Anzinger<george@mvista.com>
      Cc: Keith Owens <kaos@ocs.com.au>
      Signed-off-by: George Anzinger <george@mvista.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
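
The "new lease on life" behaviour described in the commit above can be pictured as a handler chain in which any handler may veto the shutdown. This is a hedged sketch: the `MODEL_NOTIFY_*` constants, `die_handler_t` and `die_nmi_model()` are hypothetical and do not reproduce the kernel's notifier API or its constant values.

```c
/* Hypothetical return codes for the model. */
#define MODEL_NOTIFY_DONE 0   /* handler did not care */
#define MODEL_NOTIFY_STOP 1   /* handler claimed the event */

typedef int (*die_handler_t)(void);

/*
 * Walks the handlers in order. Returns 1 if some handler answered
 * MODEL_NOTIFY_STOP (the system carries on), 0 if nobody did (the
 * system would be taken down).
 */
static int die_nmi_model(die_handler_t handlers[], int count)
{
    for (int i = 0; i < count; i++)
        if (handlers[i]() == MODEL_NOTIFY_STOP)
            return 1;
    return 0;
}

/* A handler that ignores the event. */
static int handler_ignores(void)
{
    return MODEL_NOTIFY_DONE;
}

/* A debugger-like handler that knows the watchdog timeout is its own doing. */
static int handler_debugger(void)
{
    return MODEL_NOTIFY_STOP;
}
```

In this model the watchdog caller simply continues when `die_nmi_model()` returns 1, which matches the message's second change (carry on if die_nmi returns).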
    • [PATCH] x86: introduce a write acessor for updating the current LDT · f2f30ebc
      Zachary Amsden committed
      Introduce a write accessor for updating the current LDT.  This is required
      for hypervisors like Xen that do not allow LDT pages to be directly
      written.
      
      Testing - here's a fun little LDT test that can be trivially modified to
      test limits as well.
      
      /*
       * Copyright (c) 2005, Zachary Amsden (zach@vmware.com)
       * This is licensed under the GPL.
       */
      
      #include <stdio.h>
      #include <asm/ldt.h>
      #include <sys/types.h>
      #include <sys/syscall.h>
      #include <unistd.h>
      #include <sys/mman.h>
      
      /* i386 only: installs a two-page LDT code segment and far-calls into it. */
      int main(void)
      {
              struct user_desc desc;
              char *code;
              unsigned long long tsc;
      
              code = (char *)mmap(0, 8192, PROT_EXEC|PROT_READ|PROT_WRITE,
                                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
              if (code == MAP_FAILED) {
                      perror("mmap");
                      return 1;
              }
              desc.entry_number = 0;
              desc.base_addr = (unsigned long)code;
              desc.limit = 1;         /* in pages: offsets 0..0x1fff are valid */
              desc.seg_32bit = 1;
              desc.contents = MODIFY_LDT_CONTENTS_CODE;
              desc.read_exec_only = 0;
              desc.limit_in_pages = 1;
              desc.seg_not_present = 0;
              desc.useable = 1;
              /* glibc provides no modify_ldt() wrapper, so use syscall(2) */
              if (syscall(SYS_modify_ldt, 1, &desc, sizeof(desc)) != 0) {
                      perror("modify_ldt");
                      return 1;
              }
              printf("code base is 0x%08lx\n", (unsigned long)code);
              code[0x0ffe] = 0x0f;  /* rdtsc, last two bytes of the first page */
              code[0x0fff] = 0x31;
              code[0x1000] = 0xcb;  /* lret, first byte of the second page */
              /* selector 7 = LDT entry 0, RPL 3 */
              __asm__ __volatile__("lcall $7,$0xffe" : "=A" (tsc));
              printf("TSC is 0x%016llx\n", tsc);
              return 0;
      }
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] x86: remove redundant TSS clearing · e9f86e35
      Zachary Amsden committed
      When reviewing GDT updates, I found the code:
      
      	set_tss_desc(cpu,t);	/* This just modifies memory; ... */
              per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
      
      This second line is unnecessary, since set_tss_desc() has already cleared
      the busy bit.
      
      Commented disassembly, line 1:
      
      c028b8bd:       8b 0c 86                mov    (%esi,%eax,4),%ecx
      c028b8c0:       01 cb                   add    %ecx,%ebx
      c028b8c2:       8d 0c 39                lea    (%ecx,%edi,1),%ecx
      
        => %ecx = per_cpu(cpu_gdt_table, cpu)
      
      c028b8c5:       8d 91 80 00 00 00       lea    0x80(%ecx),%edx
      
        => %edx = &per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS]
      
      c028b8cb:       66 c7 42 00 73 20       movw   $0x2073,0x0(%edx)
      c028b8d1:       66 89 5a 02             mov    %bx,0x2(%edx)
      c028b8d5:       c1 cb 10                ror    $0x10,%ebx
      c028b8d8:       88 5a 04                mov    %bl,0x4(%edx)
      c028b8db:       c6 42 05 89             movb   $0x89,0x5(%edx)
      
        => ((char *)%edx)[5] = 0x89
        (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] = 0x89
      
      c028b8df:       c6 42 06 00             movb   $0x0,0x6(%edx)
      c028b8e3:       88 7a 07                mov    %bh,0x7(%edx)
      c028b8e6:       c1 cb 10                ror    $0x10,%ebx
      
        => other bits
      
      Commented disassembly, line 2:
      
      c028b8e9:       8b 14 86                mov    (%esi,%eax,4),%edx
      c028b8ec:       8d 04 3a                lea    (%edx,%edi,1),%eax
      
        => %eax = per_cpu(cpu_gdt_table, cpu)
      
      c028b8ef:       81 a0 84 00 00 00 ff    andl   $0xfffffdff,0x84(%eax)
      
        => per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS].b &= 0xfffffdff;
        (equivalent) ((char *)per_cpu(cpu_gdt_table, cpu)[GDT_ENTRY_TSS])[5] &= 0xfd
      
      Note that (0x89 & ~0xfd) == 0; i.e., set_tss_desc(cpu,t) has already stored
      the type field in the GDT with the busy bit clear.
      
      Eliminating redundant and obscure code is always a good thing; in fact, I
      pointed out this same optimization many moons ago in arch/i386/setup.c,
      back when it used to be called that.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
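
The bit arithmetic behind the redundancy argument in the commit above can be checked in a few lines. The sketch below uses only the constants quoted in the message (0x89, 0xfffffdff); the macro and function names are hypothetical.

```c
#include <stdint.h>

#define TSS_ACCESS_STORED 0x89u   /* byte 5 written by the disassembled movb */
#define TSS_BUSY_BIT      0x02u   /* the bit that ~0xfd selects in byte 5 */

/*
 * The andl on the descriptor's high dword (.b): 0xfffffdff clears bit 9,
 * which is bit 1 of descriptor byte 5.
 */
static uint32_t clear_busy_in_high_dword(uint32_t b)
{
    return b & 0xfffffdffu;
}

/* Extracts descriptor byte 5 (bits 8..15 of the high dword). */
static uint8_t access_byte(uint32_t b)
{
    return (uint8_t)(b >> 8);
}
```

Since 0x89 has the busy bit clear already, applying the mask to a freshly written descriptor leaves it unchanged, which is exactly why the second source line could be dropped.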
    • [PATCH] x86: make IOPL explicit · a5201129
      Zachary Amsden committed
      The pushf/popf in switch_to are ONLY used to switch IOPL.  Making this
      explicit in C code is clearer.  This pushf/popf pair was added as a
      bugfix for leaking IOPL to unprivileged processes when using
      sysenter/sysexit based system calls (sysexit does not restore flags).
      
      When requesting an IOPL change in sys_iopl(), it is just as easy to change
      the current flags and the flags in the stack image (in case an IRET is
      required), but there is no reason to force an IRET if we came in from the
      SYSENTER path.
      
      This change is the minimal solution for supporting a paravirtualized Linux
      kernel that allows user processes to run with I/O privilege.  Other
      solutions require radical rewrites of part of the low level fault / system
      call handling code, or do not fully support sysenter based system calls.
      
      Unfortunately, this added one field to the thread_struct.  But as a bonus,
      on P4, the fastest time measured for switch_to() went from 312 to 260
      cycles, a win of about 17% in the fast case through this performance
      critical path.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
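
Handling IOPL explicitly, as the commit above describes, amounts to editing a two-bit field of EFLAGS (bits 12 and 13) rather than saving and restoring the whole register. The helpers below are a minimal sketch with hypothetical names; they operate on a plain integer image of the flags and do not touch the real register.

```c
#define MODEL_IOPL_SHIFT 12
#define MODEL_IOPL_MASK  (3ul << MODEL_IOPL_SHIFT)   /* EFLAGS bits 12-13 */

/* Returns the flags image with its IOPL field replaced by level (0..3). */
static unsigned long set_iopl_in_flags(unsigned long eflags, unsigned level)
{
    return (eflags & ~MODEL_IOPL_MASK) |
           ((unsigned long)(level & 3u) << MODEL_IOPL_SHIFT);
}

/* Reads the IOPL field back out of a flags image. */
static unsigned get_iopl(unsigned long eflags)
{
    return (unsigned)((eflags & MODEL_IOPL_MASK) >> MODEL_IOPL_SHIFT);
}
```

Because only this field differs between tasks, the same update can be applied both to the current flags and to the saved stack image, which is the point the message makes about sys_iopl().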
    • [PATCH] x86: privilege cleanup · 0998e422
      Zachary Amsden committed
      Privilege checking cleanup.  Originally, these diffs were much greater, but
      recent cleanups in Linux have already done much of the cleanup.  I added
      some explanatory comments in places where the reasoning behind certain
      tests is rather subtle.
      
      Also, in traps.c, we can skip the user_mode check in handle_BUG().  The
      reason is, there are only two call chains - one via die_if_kernel() and one
      via do_page_fault(), both entering from die().  Both of these paths already
      ensure that a kernel mode failure has happened.  Also, the original check
      here, if (user_mode(regs)) was insufficient anyways, since it would not
      rule out BUG faults from V8086 mode execution.
      
      Saving the %ss segment in show_regs() rather than assuming a fixed value
      also gives better information about the current kernel state in the
      register dump.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
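
The remark above that a plain user_mode(regs) check does not rule out V8086 execution can be illustrated with two predicates over a saved register image. This is a sketch, not the kernel's definitions: `regs_model` and both function names are hypothetical; the only hardware facts assumed are that the low two bits of CS hold the privilege level in protected mode and that EFLAGS bit 17 is the VM flag.

```c
#define MODEL_VM_MASK 0x00020000ul   /* EFLAGS.VM, bit 17 */

struct regs_model {
    unsigned long xcs;      /* saved code segment value */
    unsigned long eflags;   /* saved flags */
};

/* Privilege check based on CS alone. */
static int user_mode_model(const struct regs_model *r)
{
    return (r->xcs & 3) != 0;
}

/* Privilege check that also treats V8086 mode as non-kernel. */
static int user_mode_vm_model(const struct regs_model *r)
{
    return ((r->xcs & 3) | (r->eflags & MODEL_VM_MASK)) != 0;
}
```

In V8086 mode CS holds a real-mode style segment value, so its low bits may well be zero; only the second predicate classifies such a frame correctly.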