  1. 16 Apr, 2008 1 commit
  2. 11 Apr, 2008 2 commits
    • asmlinkage_protect replaces prevent_tail_call · 54a01510
      Committed by Roland McGrath
      The prevent_tail_call() macro works around the problem of the compiler
      clobbering argument words on the stack, which for asmlinkage functions
      is the caller's (user's) struct pt_regs.  The tail/sibling-call
      optimization is not the only way that the compiler can decide to use
      stack argument words as scratch space, which we have to prevent.
      Other optimizations can do it too.
      
      Until we have new compiler support to make "asmlinkage" binding on the
      compiler's own use of the stack argument frame, we have to work around
      all the manifestations of this issue that crop up.
      
      More cases seem to be prevented by also keeping the incoming argument
      variables live at the end of the function.  This makes their original
      stack slots attractive places to leave those variables, so the compiler
      tends not to clobber them for something else.  It's still no guarantee, but
      it handles some observed cases that prevent_tail_call() did not.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
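The idea of keeping incoming arguments live until the end of a function can be sketched in userspace C with GNU extended asm. This is only an illustration under assumptions, not the kernel's macro: `keep_live3` and `add_three` are hypothetical names, and the empty asm statement merely tells the compiler that the values are still in use at that point. As the commit message says, this is no guarantee about where the compiler keeps them.

```c
/*
 * Hypothetical helper: an empty asm statement that takes `ret` as an
 * in/out operand and the arguments as inputs. It emits no instructions,
 * but the compiler must treat the arguments as still in use here.
 * Requires GCC or Clang (GNU extended asm).
 */
#define keep_live3(ret, a, b, c) \
	__asm__ __volatile__("" : "=r"(ret) : "0"(ret), "g"(a), "g"(b), "g"(c))

long add_three(long a, long b, long c)
{
	long ret = a + b + c;

	/* The arguments stay live until the very end of the function. */
	keep_live3(ret, a, b, c);
	return ret;
}
```

The function's result is unchanged by the helper; only the compiler's view of how long the arguments are needed differs.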
    • x86: Simplify cpu_idle_wait · 783e391b
      Committed by Venki Pallipadi
      This patch also resolves hangs on boot:
      	http://lkml.org/lkml/2008/2/23/263
      	http://bugzilla.kernel.org/show_bug.cgi?id=10093
      
      The bug caused an intermittent (once in a few reboots) 10-15 second
      wait during boot on certain laptops.
      
      Earlier commit 40d6a146 added smp_call_function in cpu_idle_wait() to
      kick cpus that are in tickless idle.  Looking at the cpu_idle_wait code
      at that time, it seemed over-engineered for a case which is rarely used
      (changing the idle handler).
      
      Below is a simplified version of cpu_idle_wait, which just makes a dummy
      smp_call_function to all cpus, to make them come out of the old idle
      handler and start using the new idle handler.  It eliminates code in the
      idle loop to handle cpu_idle_wait.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 10 Apr, 2008 1 commit
    • pop previous section in alternative.c · f4be31ec
      Committed by Steven Rostedt
      gcc expects all toplevel assembly to return to the original section type.
      The code in alternative.c does not do this. This caused some strange bugs
      in sched-devel, where code would end up in the .rodata section; once
      the kernel set the NX bit on all .rodata, it would crash when
      executing this code.
      
      This patch adds a .previous marker to return the code back to the
      original section.
      
      Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a
      bug in the toplevel asm code in the kernel.  ;-)
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
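The pattern of switching sections in toplevel asm and then returning with `.previous` can be sketched as follows. This is a minimal example that assumes an ELF target and the GNU assembler syntax (for instance GCC or Clang on Linux); the section name `.demo_notes`, the symbol `demo_note`, and the accessor `get_demo_note` are hypothetical and are not taken from alternative.c.

```c
/*
 * Toplevel asm that places a string in a custom data section.
 * The trailing .previous switches back to whatever section was
 * active before, so code the compiler emits afterwards does not
 * land in .demo_notes by accident.
 */
__asm__(
	".section .demo_notes,\"a\"\n"
	"demo_note:\n"
	"	.asciz \"hello\"\n"
	".previous\n"
);

/* The symbol is defined by the asm block above. */
extern const char demo_note[];

const char *get_demo_note(void)
{
	return demo_note;
}
```

Without the `.previous` line, anything emitted after the asm block could stay in the data section, which is the kind of misplacement the commit message describes.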
  4. 08 Apr, 2008 2 commits
  5. 07 Apr, 2008 1 commit
  6. 05 Apr, 2008 8 commits
    • x86: revert assign IRQs to hpet timer · 5761d64b
      Committed by Thomas Gleixner
      The commits:
      
      commit 37a47db8
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers, fix
      
      and
      
      commit e3f37a54
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers
      
      have been identified to cause a regression on some platforms due to
      the assignment of legacy IRQs, which makes the legacy devices
      connected to those IRQs dysfunctional.
      
      Revert them.
      
      This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: tsc prevent time going backwards · 47001d60
      Committed by Thomas Gleixner
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code forever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have synchronized
      TSCs. The TSCs seem not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremely small window where this delta can be observed
      with a very large time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.
      
      CPU0 updates the clock source variables under the xtime/vsyscall lock
      and CPU1, where the TSC is slightly behind CPU0, is reading the time
      right after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I considered marking the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a really good reason.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
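The arithmetic behind the time jump, and the clamp described in the commit message, can be sketched in plain C. The function names `cycles_delta` and `clamp_tsc` are hypothetical and the code is a userspace model of the reasoning, not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Cycles elapsed since the last clocksource update, computed with the
 * plain unsigned subtraction that wrapping clock sources rely on.
 */
uint64_t cycles_delta(uint64_t now, uint64_t cycle_last)
{
	return now - cycle_last;
}

/*
 * Hypothetical clamped TSC readout: never report a value that lies
 * behind the reference value recorded at the last update.
 */
uint64_t clamp_tsc(uint64_t tsc, uint64_t cycle_last)
{
	return tsc >= cycle_last ? tsc : cycle_last;
}
```

If CPU1 reads 999 while the reference value written by CPU0 is 1000, the unclamped delta wraps around to the largest 64-bit value, whereas the clamped readout yields a delta of zero.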
    • xen: Clear PG_pinned in release_{pt,pd}() · c946c7de
      Committed by Mark McLoughlin
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: Do not pin/unpin PMD pages · a684d69d
      Committed by Mark McLoughlin
      i.e. with this simple test case:
      
          int fd = open("/dev/zero", O_RDONLY);
          munmap(mmap((void *)0x40000000, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000);
          close(fd);
      
      we currently get:
      
         kernel BUG at arch/x86/xen/enlighten.c:678!
         ...
         EIP is at xen_release_pt+0x79/0xa9
         ...
         Call Trace:
          [<c041da25>] ? __pmd_free_tlb+0x1a/0x75
          [<c047a192>] ? free_pgd_range+0x1d2/0x2b5
          [<c047a2f3>] ? free_pgtables+0x7e/0x93
          [<c047b272>] ? unmap_region+0xb9/0xf5
          [<c047c1bd>] ? do_munmap+0x193/0x1f5
          [<c047c24f>] ? sys_munmap+0x30/0x3f
          [<c0408cce>] ? syscall_call+0x7/0xb
          =======================
      
      and xen complains:
      
        (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned
      
      Further details at:
      
        https://bugzilla.redhat.com/436453
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
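For reference, the three-line test case from the commit message can be wrapped into a self-contained function with the headers it needs. This is a sketch under assumptions: the wrapper name `map_and_unmap` and the added error checks are not part of the original test case, and on an unaffected system it should simply succeed.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if the map/unmap sequence succeeds, -1 otherwise. */
int map_and_unmap(void)
{
	int fd = open("/dev/zero", O_RDONLY);
	void *p;
	int ret;

	if (fd < 0)
		return -1;

	/* 0x40000000 is only a hint here, since MAP_FIXED is not passed. */
	p = mmap((void *)0x40000000, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		return -1;
	}

	/* Unmapping is the step that tears down the page tables again. */
	ret = munmap(p, 0x1000);
	close(fd);
	return ret;
}
```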
    • xen: refactor xen_{alloc,release}_{pt,pd}() · f6433706
      Committed by Mark McLoughlin
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, agpgart: scary messages are fortunately obsolete · 8f59610d
      Committed by Pavel Machek
      Fix obsolete printks in aperture-64. We used to not handle a missing
      agpgart, but we handle it okay now.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: print message if nmi_watchdog=2 cannot be enabled · 9c9b81f7
      Committed by Ingo Molnar
      Right now, if there's no CPU support for nmi_watchdog=2, we'll just
      refuse it silently.
      
      Print a useful warning.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix nmi_watchdog=2 on Pentium-D CPUs · 4f14bdef
      Committed by Ingo Molnar
      Implement nmi_watchdog=2 on this class of CPUs:
      
        cpu family      : 15
        model           : 6
        model name      : Intel(R) Pentium(R) D CPU 3.00GHz
      
      The watchdog's ->setup() method is safe anyway, so if the CPU
      cannot support it we'll bail out safely.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 04 Apr, 2008 1 commit
    • x86 ptrace: avoid unnecessary wrmsr · 4ba51fd7
      Committed by Roland McGrath
      This avoids using wrmsr on MSR_IA32_DEBUGCTLMSR when it's not needed.
      No wrmsr ever needs to be done if no one has ever used block stepping.
      
      Without this change, using ptrace on 2.6.25 on an x86 KVM guest
      will tickle KVM's missing support for the MSR and crash the guest
      kernel.  Though host KVM is the buggy one, this makes for a regression
      in the guest behavior from 2.6.24->2.6.25 that we can easily avoid.
      
      I also corrected some bad whitespace.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 03 Apr, 2008 1 commit
    • vmcoreinfo: add the symbol "phys_base" · 629c8b4c
      Committed by Ken'ichi Ohmichi
      Fix the problem that makedumpfile sometimes fails on x86_64 machines.
      
      This patch adds the symbol "phys_base" to the vmcoreinfo data.  The
      vmcoreinfo data contains only the minimum debugging information needed
      for dump filtering.  makedumpfile (the dump filtering command) uses it
      to identify unnecessary pages and create a small dumpfile.
      
      On an x86_64 kernel compiled with CONFIG_PHYSICAL_START=0x0 and
      CONFIG_RELOCATABLE=y, makedumpfile fails as follows:
      
       # makedumpfile -d31 /proc/vmcore dumpfile
       The kernel version is not supported.
       The created dumpfile may be incomplete.
       _exclude_free_page: Can't get next online node.
      
       makedumpfile Failed.
       #
      
      The cause is the lack of the symbol "phys_base" in the vmcoreinfo data.
      If the symbol "phys_base" does not exist, makedumpfile considers an
      x86_64 kernel to be non-relocatable.  As a result, makedumpfile
      misjudges the physical address where the kernel is loaded, and it
      cannot translate a kernel virtual address to a physical address
      correctly.
      
      To fix this problem, this patch adds the symbol "phys_base" to the
      vmcoreinfo data.
      Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
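Why a dump tool needs "phys_base" can be seen from the address translation for the x86_64 kernel text mapping, sketched below. The function name `kernel_text_vaddr_to_paddr` is hypothetical and this is not makedumpfile's code; the constant is the x86_64 `__START_KERNEL_map` base, and the sample addresses in the note are made up for illustration.

```c
#include <stdint.h>

/* Base of the x86_64 kernel text mapping (__START_KERNEL_map). */
#define START_KERNEL_MAP 0xffffffff80000000ULL

/*
 * Translate a kernel text virtual address to a physical address.
 * For a relocatable kernel, phys_base is the offset at which the
 * kernel was actually loaded; assuming 0 gives a wrong result
 * whenever the kernel was relocated.
 */
uint64_t kernel_text_vaddr_to_paddr(uint64_t vaddr, uint64_t phys_base)
{
	return vaddr - START_KERNEL_MAP + phys_base;
}
```

For the virtual address 0xffffffff81000000, a phys_base of 0 gives 0x1000000, while a phys_base of 0x2000000 gives 0x3000000, so a tool that assumes a non-relocatable kernel reads the wrong memory.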
  9. 29 Mar, 2008 1 commit
  10. 28 Mar, 2008 2 commits
  11. 27 Mar, 2008 13 commits
  12. 25 Mar, 2008 6 commits
    • KVM: MMU: Fix memory leak on guest demand faults · e48bb497
      Committed by Avi Kivity
      While backporting 72dc67a6, a gfn_to_page() call was duplicated instead
      of moved (due to an unrelated patch not being present in mainline).
      This caused a page reference leak, resulting in a fairly massive memory
      leak.
      
      Fix by removing the extraneous gfn_to_page() call.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: VMX: convert init_rmode_tss() to slots_lock · 707a18a5
      Committed by Marcelo Tosatti
      init_rmode_tss was forgotten during the conversion from mmap_sem to
      slots_lock.
      
      INFO: task qemu-system-x86:3748 blocked for more than 120 seconds.
      Call Trace:
       [<ffffffff8053d100>] __down_read+0x86/0x9e
       [<ffffffff8053fb43>] do_page_fault+0x346/0x78e
       [<ffffffff8053d235>] trace_hardirqs_on_thunk+0x35/0x3a
       [<ffffffff8053dcad>] error_exit+0x0/0xa9
       [<ffffffff8035a7a7>] copy_user_generic_string+0x17/0x40
       [<ffffffff88099a8a>] :kvm:kvm_write_guest_page+0x3e/0x5f
       [<ffffffff880b661a>] :kvm_intel:init_rmode_tss+0xa7/0xf9
       [<ffffffff880b7d7e>] :kvm_intel:vmx_vcpu_reset+0x10/0x38a
       [<ffffffff8809b9a5>] :kvm:kvm_arch_vcpu_setup+0x20/0x53
       [<ffffffff8809a1e4>] :kvm:kvm_vm_ioctl+0xad/0x1cf
       [<ffffffff80249dea>] __lock_acquire+0x4f7/0xc28
       [<ffffffff8028fad9>] vfs_ioctl+0x21/0x6b
       [<ffffffff8028fd75>] do_vfs_ioctl+0x252/0x26b
       [<ffffffff8028fdca>] sys_ioctl+0x3c/0x5e
       [<ffffffff8020b01b>] system_call_after_swapgs+0x7b/0x80
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: MMU: handle page removal with shadow mapping · 15aaa819
      Committed by Marcelo Tosatti
      Do not assume that a shadow mapping will always point to the same host
      frame number.  Fixes crash with madvise(MADV_DONTNEED).
      
      [avi: move after first printk(), add another printk()]
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: MMU: Fix is_rmap_pte() with io ptes · 4b1a80fa
      Committed by Avi Kivity
      is_rmap_pte() doesn't take into account io ptes, which have the avail bit set.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
    • KVM: VMX: Restore tss even on x86_64 · 5dc83262
      Committed by Avi Kivity
      The vmx hardware state restore restores the tss selector and base address, but
      not its length.  Usually, this does not matter since most of the tss contents
      are within the default length of 0x67.  However, if a process is using ioperm()
      to grant itself I/O port permissions, an additional bitmap within the tss,
      but outside the default length, is consulted.  The effect is that the process
      will receive a SIGSEGV instead of transparently accessing the port.
      
      Fix by restoring the tss length.  Note that i386 had this working already.
      
      Closes bugzilla 10246.
      Signed-off-by: Avi Kivity <avi@qumranet.com>
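The effect of a too-short tss length on the I/O permission bitmap can be modelled with a simple bounds check. This is a simplified sketch, not the processor's exact rule: the function name `io_bitmap_byte_in_limit` is hypothetical, the bitmap base of 0x68 and the 8192-byte bitmap size used in the note are assumptions for illustration, and the real hardware check also reads the permission bit itself.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if the bitmap byte that holds the permission bit for
 * `port` lies within the tss limit. `limit` is the offset of the last
 * valid byte of the tss; `bitmap_base` is the offset of the bitmap.
 * A byte outside the limit means the access is refused.
 */
bool io_bitmap_byte_in_limit(uint32_t limit, uint32_t bitmap_base,
			     uint16_t port)
{
	uint32_t byte_offset = bitmap_base + port / 8;

	return byte_offset <= limit;
}
```

With the default limit of 0x67 and a bitmap starting at 0x68, no port's byte is ever inside the limit, which matches the SIGSEGV described above; with a limit of 0x68 + 8192 the same lookup succeeds.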
    • x86-32: Pass the full resource data to ioremap() · b9e76a00
      Committed by Linus Torvalds
      It appears that 64-bit PCI resources cannot possibly ever have worked on
      x86-32 even when the RESOURCES_64BIT config option was set, because any
      driver that tried to [pci_]ioremap() the resource would have been unable
      to do so because the high 32 bits would have been silently dropped on
      the floor by the ioremap() routines that only used "unsigned long".
      
      Change them to use "resource_size_t" instead, which properly encodes the
      whole 64-bit resource data if RESOURCES_64BIT is enabled.
      Acked-by: H. Peter Anvin <hpa@kernel.org>
      Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
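The silent loss of the high 32 bits can be reproduced with fixed-width types. In this sketch `uint32_t` stands in for "unsigned long" on x86-32 and `uint64_t` for a 64-bit `resource_size_t`; the function names and the sample address in the note are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the old interface: the address is narrowed to 32 bits. */
uint32_t narrow_phys_addr(uint64_t resource_start)
{
	return (uint32_t)resource_start;
}

/* True if narrowing to 32 bits changes the address. */
bool high_bits_lost(uint64_t resource_start)
{
	return (uint64_t)narrow_phys_addr(resource_start) != resource_start;
}
```

A resource at 0x1febf0000 would be narrowed to 0xfebf0000, so a driver would end up mapping a different physical address without any error being reported.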
  13. 23 Mar, 2008 1 commit
    • x86: revert: reserve dma32 early for gart · 9e963048
      Committed by Thomas Gleixner
      Revert
      
      commit f62f1fc9
      Author: Yinghai Lu <yhlu.kernel@gmail.com>
      Date:   Fri Mar 7 15:02:50 2008 -0800
      
          x86: reserve dma32 early for gart
      
      The patch has a dependency on bootmem modifications which are not .25
      material this late in the -rc cycle. The problem which is addressed by
      the patch is limited to machines with 256G and more memory booted with
      NUMA disabled. This is not a .25 regression and the audience which is
      affected by this problem is very limited, so it's safer to do the
      revert than pulling in intrusive bootmem changes right now.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>