1. 12 Mar, 2008 3 commits
    • T
      x86: remove quicklists · 985a34bd
      Thomas Gleixner committed
      quicklists cause a serious memory leak on 32-bit x86,
      as documented at:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=9991
      
      the reason is that the quicklist pool is a special-purpose
      cache that grows out of proportion. It is not accounted for
      anywhere and users have no way to even realize that it's
      the quicklists that are causing RAM usage spikes. It was
      supposed to be a relatively small pool, but as demonstrated
      by KOSAKI Motohiro, they can grow as large as:
      
        Quicklists:    1194304 kB
      
      given how much trouble this code has caused historically,
      and given that Andrew objected to its introduction on x86
      (years ago), the best option at this point is to remove them.
      
      [ any performance benefits of caching constructed pgds should
        be implemented in a more generic way (possibly within the page
        allocator), while still allowing constructed pages to be
        allocated by other workloads. ]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • R
      x86: ia32 syscall restart fix · 40f0933d
      Roland McGrath committed
      The code to restart syscalls after signals depends on checking for a
      negative orig_ax, and for particular negative -ERESTART* values in ax.
      These fields are 64 bits and for a 32-bit task they get zero-extended.
      The syscall restart behavior is lost, a regression from a native 32-bit
      kernel and from 64-bit tasks' behavior.
      
      This patch fixes the problem by doing sign-extension where it matters.
      
      For orig_ax, the only time the value should be -1 but winds up as
      0xffffffff is via a 32-bit ptrace call. So the patch changes ptrace to
      sign-extend the 32-bit orig_eax value when it's stored; it doesn't
      change the checks on orig_ax, though it uses the new current_syscall()
      inline to better document the subtle importance of the use of
      signedness there.
      
      The ax value is stored a lot of ways and it seems hard to get them all
      sign-extended at their origins. So for that, we use the
      current_syscall_ret() to sign-extend it only for 32-bit tasks at the
      time of the -ERESTART* comparisons.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
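The zero-extension problem described in the commit above can be illustrated in plain C. This is only a sketch under stated assumptions: the helper names and the `SKETCH_ERESTARTSYS` macro are hypothetical, and the code is not the kernel's current_syscall_ret() implementation. It shows why a 64-bit comparison misses a negative value stored by a 32-bit task, and how sign-extending the low 32 bits at comparison time restores the match.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constant (assumption for this sketch); the kernel uses 512
 * for its internal ERESTARTSYS value. */
#define SKETCH_ERESTARTSYS 512

/* What a 64-bit register slot holds after a 32-bit task stores `ret`:
 * the upper 32 bits are zero (zero-extension). */
static uint64_t store_from_32bit_task(int32_t ret)
{
    return (uint64_t)(uint32_t)ret;
}

/* Naive check: treats the whole 64-bit slot as a signed value. */
static bool is_restart_naive(uint64_t ax)
{
    return (int64_t)ax == -SKETCH_ERESTARTSYS;
}

/* Sign-extending check: reinterpret the low 32 bits as signed first.
 * The unsigned-to-signed conversion is implementation-defined in older
 * C standards, but wraps as expected on mainstream compilers. */
static bool is_restart_sign_extended(uint64_t ax)
{
    return (int64_t)(int32_t)(uint32_t)ax == -SKETCH_ERESTARTSYS;
}
```

With these helpers, a value of -512 stored by a 32-bit task fails the naive check and passes the sign-extending one.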
    • I
      x86: ioremap, remove WARN_ON() · 9a46d7e5
      Ingo Molnar committed
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 11 Mar, 2008 3 commits
    • I
      fix BIOS PCI config cycle buglet causing ACPI boot regression · f5dbb55b
      Ingo Molnar committed
      I figured out another ACPI related regression today.
      
      randconfig testing triggered an early boot-time hang on a laptop of mine
      (32-bit x86, config attached) - the screen was scrolling ACPI AML
      exceptions [with no serial port and no early debugging available].
      
      v2.6.24 works fine on that laptop with the same .config, so after a few
      hours of bisection (had to restart it 3 times - other regressions
      interacted), it honed in on this commit:
      
      | 10270d48 is first bad commit
      |
      | Author: Linus Torvalds <torvalds@woody.linux-foundation.org>
      | Date:   Wed Feb 13 09:56:14 2008 -0800
      |
      |     acpi: fix acpi_os_read_pci_configuration() misuse of raw_pci_read()
      
      reverting this commit on top of -rc5 gave a correctly booting kernel.
      
      But this commit fixes a real bug so the real question is, why did it
      break the bootup?
      
      After quite some head-scratching, the following change stood out:
      
      -                               pci_id->bus = tu8;
      +                               pci_id->bus = val;
      
      pci_id->bus is defined as u16:
      
         struct acpi_pci_id {
                 u16 segment;
                 u16 bus;
         ...
      
      and 'tu8' changed from u8 to u32. So previously we'd unconditionally
      mask the return value of acpi_os_read_pci_configuration()
      (raw_pci_read()) to 8 bits, but now we just trust whatever comes back
      from the PCI access routines and only crop it to 16 bits.
      
      But if the high 8 bits of that result contain any noise then we'll
      write that into ACPI's PCI ID descriptor and confuse the heck out of the
      rest of ACPI.
      
      So let's check the PCI-BIOS code on that theory. We have this codepath
      for 8-bit accesses (arch/x86/pci/pcbios.c:pci_bios_read()):
      
              switch (len) {
              case 1:
                      __asm__("lcall *(%%esi); cld\n\t"
                              "jc 1f\n\t"
                              "xor %%ah, %%ah\n"
                              "1:"
                              : "=c" (*value),
                                "=a" (result)
                              : "1" (PCIBIOS_READ_CONFIG_BYTE),
                                "b" (bx),
                                "D" ((long)reg),
                                "S" (&pci_indirect));
      
      Aha! The "=c" output constraint puts the full 32 bits of ECX into
      *value. But if the BIOS's routines set any of the high bits to nonzero,
      we'll return a value with more set in it than intended.
      
      The other, more common PCI access methods (v1 and v2 PCI reads) clear
      out the high bits already, for example pci_conf1_read() does:
      
              switch (len) {
              case 1:
                      *value = inb(0xCFC + (reg & 3));
      
      which explicitly converts the return byte up to 32 bits and zero-extends
      it.
      
      So zero-extending the result in the PCI-BIOS read routine fixes the
      regression on my laptop. ( It might fix some other long-standing issues
      we had with PCI-BIOS during the past decade ... ) Both 8-bit and 16-bit
      accesses were buggy.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
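The truncation issue in the commit above can be reproduced with a few lines of C. This is a minimal sketch, not the actual change to pcbios.c: `mask_config_read()` and `store_bus()` are hypothetical helpers that stand in for "zero-extend the BIOS result" and "assign to the u16 bus field" respectively.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the bytes that were actually requested,
 * which is the effect of zero-extending an 8-bit or 16-bit read. */
static uint32_t mask_config_read(uint32_t raw, int len)
{
    switch (len) {
    case 1:
        return raw & 0xffu;
    case 2:
        return raw & 0xffffu;
    default:
        return raw;
    }
}

/* Stand-in for the assignment to the u16 `bus` field: the value is
 * silently truncated to 16 bits, so noise in bits 8..15 survives. */
static uint16_t store_bus(uint32_t val)
{
    return (uint16_t)val;
}
```

For a raw register value of 0x12345605 from an 8-bit read, storing it directly yields a bus number of 0x5605, while masking first yields the intended 0x05.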
    • R
      lguest: Revert 1ce70c4f, fix real problem. · 4357bd94
      Rusty Russell committed
      Ahmed managed to crash the Host in release_pgd(), which cannot be a Guest
      bug, and indeed it wasn't.
      
      The bug was that handing a 0 as the address of the toplevel page table
      being manipulated can cause the lookup code in find_pgdir() to return
      an uninitialized cache entry (we shadow up to 4 top level page tables
      for each Guest).
      
      Commit 37cc8d7f introduced this
      behaviour in the Guest, uncovering the bug.
      
      The patch which he submitted (which removed the /4 from the index
      calculation) simply ensured that these high-indexed entries hit the
      early exit path of guest_set_pmd().  But you get lots of segfaults in
      guest userspace as the PMDs aren't being updated.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
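The lookup bug described in the commit above follows a common pattern: a cache whose unused slots hold a zero key will "match" a lookup for key 0. The sketch below is hypothetical and is not the lguest source; the struct, the function names, and the -1 miss value are assumptions made for illustration, and the guarded variant is one plausible way to avoid the false match.

```c
#include <stddef.h>

#define SKETCH_NR_PGDIRS 4 /* "up to 4 top level page tables" per Guest */

struct sketch_pgdir {
    unsigned long gpgdir; /* Guest address of the top-level table; 0 if unused */
    void *pgdir;          /* Host shadow page; NULL while the slot is unused */
};

/* Buggy lookup: an address of 0 matches any never-used slot. */
static int find_slot_buggy(const struct sketch_pgdir *cache, unsigned long gpgdir)
{
    for (int i = 0; i < SKETCH_NR_PGDIRS; i++)
        if (cache[i].gpgdir == gpgdir)
            return i;
    return -1;
}

/* Guarded lookup: only slots that own a shadow page are considered. */
static int find_slot_guarded(const struct sketch_pgdir *cache, unsigned long gpgdir)
{
    for (int i = 0; i < SKETCH_NR_PGDIRS; i++)
        if (cache[i].pgdir != NULL && cache[i].gpgdir == gpgdir)
            return i;
    return -1;
}
```

With one populated slot and three empty ones, a lookup for address 0 returns an empty slot in the buggy version and a miss in the guarded version.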
    • R
      lguest: Sanitize the lguest clock. · 3fabc55f
      Rusty Russell committed
      Now the TSC code handles a zero return from calculate_cpu_khz(),
      lguest can simply pass through the value it gets from the Host: if
      non-zero, all the normal TSC code applies.
      
      Otherwise (or if the Host really doesn't support TSC), the clocksource
      code will fall back to the slower but reasonable lguest clock.
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  3. 08 Mar, 2008 1 commit
    • R
      x86_64: make ptrace always sign-extend orig_ax to 64 bits · 84c6f604
      Roland McGrath committed
      This makes 64-bit ptrace calls setting the 64-bit orig_ax field for a
      32-bit task sign-extend the low 32 bits up to 64.  This matches what a
      64-bit debugger expects when tracing a 32-bit task.
      
      This follows on my "x86_64 ia32 syscall restart fix".  This didn't
      matter until that was fixed.
      
      The debugger ignores or zeros the high half of every register slot it
      sets (including the orig_rax pseudo-register) uniformly.  It expects
      that the setting of the low 32 bits always has the same meaning as a
      32-bit debugger setting those same 32 bits with native 32-bit
      facilities.
      
      This never arose before because the syscall restart check never
      matched any -ERESTART* values due to lack of sign extension.  Before
      that fix, even 32-bit ptrace setting orig_eax to -1 failed to trigger
      the restart check anyway.  So this was never noticed as a regression
      of 64-bit debuggers vs 32-bit debuggers on the same 64-bit kernel.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      [ Changed to just do the sign-extension unconditionally on x86-64,
        since orig_ax is always just a small integer and doesn't need
        the full 64-bit range ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
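The unconditional sign-extension mentioned in the bracketed note above amounts to a single cast chain. The function below is a hypothetical illustration, not the kernel's ptrace code: it shows what happens to the value a 64-bit debugger writes when the high half of the slot is zeroed or ignored.

```c
#include <stdint.h>

/* Hypothetical store helper: keep the low 32 bits of whatever the
 * debugger wrote and sign-extend them to 64 bits. The conversion to
 * int32_t is implementation-defined in older C standards, but wraps
 * as expected on mainstream compilers. */
static int64_t store_orig_ax(uint64_t value)
{
    return (int64_t)(int32_t)(uint32_t)value;
}
```

A debugger that writes 0x00000000ffffffff therefore ends up storing -1, the same result a 32-bit debugger gets when it writes -1 through the native 32-bit interface.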
  4. 07 Mar, 2008 5 commits
    • P
      x86-boot: don't request VBE2 information · 1722770f
      Peter Korsgaard committed
      The new x86 setup code (4fd06960) broke booting on an old P3/500MHz
      with an onboard Voodoo3 of mine. After debugging it, it turned out
      to be caused by the fact that the vesa probing now asks for VBE2 data.
      
      Disassembling the video BIOS shows that it overflows the vesa_general_info
      structure when VBE2 data is requested because the source addresses for the
      information strings which get strcpy'ed to the buffer lie outside the 32K
      BIOS code (and hence contain long sequences of 0xff's).
      
      E.G.:
      
      get_vbe_controller_info:
      00002A9C  60                pushaw
      00002A9D  1E                push ds
      00002A9E  0E                push cs
      00002A9F  1F                pop ds
      00002AA0  2BC9              sub cx,cx
      00002AA2  6626813D56424532  cmp dword [es:di],0x32454256 ; "VBE2"
      00002AAA  7501              jnz .1
      00002AAC  41                inc cx
      .1:
      00002AAD  51                push cx
      00002AAE  B91400            mov cx,0x14
      00002AB1  BED47F            mov si, controller_header
      00002AB4  57                push di
      00002AB5  F3A4              rep movsb ; copy vbe1.2 header
      
      00002AB7  B9EC00            mov cx,0xec
      00002ABA  2AC0              sub al,al
      00002ABC  F3AA              rep stosb ; zero pad remainder
      
      00002ABE  5F                pop di
      00002ABF  E8EB0D            call word get_memory
      00002AC2  C1E002            shl ax,0x2
      00002AC5  26894512          mov [es:di+0x12],ax ; total memory
      00002AC9  26C745040003      mov word [es:di+0x4],0x300 ; VBE version
      00002ACF  268C4D08          mov [es:di+0x8],cs
      00002AD3  268C4D10          mov [es:di+0x10],cs
      00002AD7  59                pop cx
      00002AD8  E361              jcxz .done ; VBE2 requested?
      00002ADA  8D9D0001          lea bx,[di+0x100]
      00002ADE  53                push bx
      00002ADF  87DF              xchg bx,di ; di now points to 2nd half
      00002AE1  26C747140001      mov word [es:bx+0x14],0x100 ; sw rev
      
      00002AE7  26897F06          mov [es:bx+0x6],di ; oem string
      00002AEB  268C4708          mov [es:bx+0x8],es
      00002AEF  BE5280            mov si,0x8052 ; oem string
      00002AF2  E87A1B            call word strcpy
      
      00002AF5  26897F0E          mov [es:bx+0xe],di ; video mode list
      00002AF9  268C4710          mov [es:bx+0x10],es
      00002AFD  B91E00            mov cx,0x1e
      00002B00  BEE87F            mov si,vidmodes
      00002B03  F3A5              rep movsw
      
      00002B05  26897F16          mov [es:bx+0x16],di ; oem vendor
      00002B09  268C4718          mov [es:bx+0x18],es
      00002B0D  BE2480            mov si,0x8024 ; oem vendor
      00002B10  E85C1B            call word strcpy
      
      00002B13  26897F1A          mov [es:bx+0x1a],di ; oem product
      00002B17  268C471C          mov [es:bx+0x1c],es
      00002B1B  BE3880            mov si,0x8038 ; oem product
      00002B1E  E84E1B            call word strcpy
      
      00002B21  26897F1E          mov [es:bx+0x1e],di ; oem product rev
      00002B25  268C4720          mov [es:bx+0x20],es
      00002B29  BE4580            mov si,0x8045 ; oem product rev
      00002B2C  E8401B            call word strcpy
      
      00002B2F  58                pop ax
      00002B30  B90001            mov cx,0x100
      00002B33  2BCF              sub cx,di
      00002B35  03C8              add cx,ax
      00002B37  2AC0              sub al,al
      00002B39  F3AA              rep stosb ; zero pad
      .done:
      00002B3B  1F                pop ds
      00002B3C  61                popaw
      00002B3D  B84F00            mov ax,0x4f
      00002B40  C3                ret
      
      (The full BIOS can be found at http://peter.korsgaard.com/vgabios.bin
      if interested).
      
      The old setup code didn't ask for VBE2 info, and the new code doesn't
      actually do anything with the extra information, so the fix is to simply
      not request it. Other BIOSes might have the same problem.
      Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
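The `cmp dword [es:di],0x32454256` instruction in the listing above tests whether the caller pre-filled the buffer with the ASCII signature "VBE2". The sketch below, with hypothetical helper names, shows how those four bytes map to that constant when read as a little-endian dword, and what "not requesting VBE2" means from the caller's side.

```c
#include <stdint.h>
#include <string.h>

#define VBE2_MAGIC 0x32454256u /* constant taken from the disassembly */

/* Read four bytes as a little-endian 32-bit value, as real-mode x86 does. */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical caller-side helper: ask the BIOS for VBE2 data by
 * writing the signature at the start of the info buffer. */
static void request_vbe2(unsigned char *buf)
{
    memcpy(buf, "VBE2", 4);
}

/* Mirrors the BIOS-side test: nonzero if the caller asked for VBE2. */
static int requests_vbe2(const unsigned char *buf)
{
    return le32(buf) == VBE2_MAGIC;
}
```

A zero-filled buffer does not satisfy the check, so a BIOS like the one disassembled above would skip the string copies that overflow the structure.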
    • I
      x86: re-add reboot fixups · 7432d149
      Ingo Molnar committed
      Jan Beulich noticed that the reboot fixups went missing during
      reboot.c unification.
      
      (commit 4d022e35)
      
      Geode and a few other rare boards with special reboot quirks are
      affected.
      Reported-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • J
      x86: fix typo in step.c · d032b31a
      Jan Beulich committed
      TIF_DEBUGCTLMSR has no meaning in the actual MSR...
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • J
      x86: fix merge mistake in i387.c · 609b5297
      Jan Beulich committed
      convert_fxsr_to_user() in 2.6.24's i387_32.c did this, and
      convert_to_fxsr() also does the inverse, so I assume it's an oversight
      that it is no longer being done.
      
      [ mingo@elte.hu:
      
        we encode it this way because there's no space for the 'FPU Last
        Instruction Opcode' (->fop) field in the legacy user_i387_ia32_struct
        that PTRACE_GETFPREGS/PTRACE_SETFPREGS uses.
      
        it's probably pure legacy - i'd be surprised if any user-space relied on
        the FPU Last Opcode in any way. But indeed we used to do it previously
        so the most conservative thing is to preserve that piece of information.
      ]
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • A
      x86: clear DF before calling signal handler · e40cd10c
      Aurelien Jarno committed
      The Linux kernel currently does not clear the direction flag before
      calling a signal handler, whereas the x86/x86-64 ABI requires that.
      
      Linux had this behavior/bug forever, but this becomes a real problem
      with gcc version 4.3, which assumes that the direction flag is
      correctly cleared at the entry of a function.
      
      This patch changes the setup_frame() functions to clear the
      direction flag before entering the signal handler.
      Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
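Clearing the direction flag, as the commit above describes, is a single bit operation on the saved flags word. The sketch below is illustrative only: `SKETCH_EFLAGS_DF` and `flags_for_handler()` are hypothetical names, and the code does not reproduce the kernel's setup_frame() functions. The one hard fact it relies on is that DF is bit 10 of the x86 flags register.

```c
#include <stdint.h>

#define SKETCH_EFLAGS_DF (1u << 10) /* direction flag: bit 10 of EFLAGS */

/* Compute the flags value a signal handler should start with:
 * everything the interrupted code had, minus the direction flag. */
static uint32_t flags_for_handler(uint32_t interrupted_flags)
{
    return interrupted_flags & ~SKETCH_EFLAGS_DF;
}
```

If the interrupted code was running with DF set (for example after `std`), the handler still starts with DF clear, which is what compiler-generated string operations assume at function entry.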
  5. 06 Mar, 2008 1 commit
  6. 05 Mar, 2008 5 commits
  7. 04 Mar, 2008 8 commits
  8. 03 Mar, 2008 6 commits
  9. 01 Mar, 2008 8 commits