1. 22 6月, 2009 5 次提交
    • T
      x86: reorganize cpa_process_alias() · 992f4c1c
      Tejun Heo 提交于
      Reorganize cpa_process_alias() so that new alias condition can be
      added easily.
      
      Jan Beulich spotted problem in the original cleanup thread which
      incorrectly assumed the two existing conditions were mutially
      exclusive.
      
      [ Impact: code reorganization ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      992f4c1c
    • T
      x86: prepare setup_pcpu_lpage() for pageattr fix · 0ff2587f
      Tejun Heo 提交于
      Make the following changes in preparation of coming pageattr updates.
      
      * Define and use array of struct pcpul_ent instead of array of
        pointers.  The only difference is ->cpu field which is set but
        unused yet.
      
      * Rename variables according to the above change.
      
      * Rename local variable vm to pcpul_vm and move it out of the
        function.
      
      [ Impact: no functional difference ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0ff2587f
    • T
      x86: rename remap percpu first chunk allocator to lpage · 97c9bf06
      Tejun Heo 提交于
      The "remap" allocator remaps large pages to build the first chunk;
      however, the name isn't very good because 4k allocator remaps too and
      the whole point of the remap allocator is using large page mapping.
      The allocator will be generalized and exported outside of x86, rename
      it to lpage before that happens.
      
      percpu_alloc kernel parameter is updated to accept both "remap" and
      "lpage" for lpage allocator.
      
      [ Impact: code cleanup, kernel parameter argument updated ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      97c9bf06
    • T
      x86: fix duplicate free in setup_pcpu_remap() failure path · c5806df9
      Tejun Heo 提交于
      In the failure path, setup_pcpu_remap() tries to free the area which
      has already been freed to make holes in the large page.  Fix it.
      
      [ Impact: fix duplicate free in failure path ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      c5806df9
    • L
      Move FAULT_FLAG_xyz into handle_mm_fault() callers · d06063cc
      Linus Torvalds 提交于
      This allows the callers to now pass down the full set of FAULT_FLAG_xyz
      flags to handle_mm_fault().  All callers have been (mechanically)
      converted to the new calling convention, there's almost certainly room
      for architectures to clean up their code and then add FAULT_FLAG_RETRY
      when that support is added.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d06063cc
  2. 21 6月, 2009 2 次提交
    • L
      x86, 64-bit: Clean up user address masking · 9063c61f
      Linus Torvalds 提交于
      The discussion about using "access_ok()" in get_user_pages_fast() (see
      commit 7f818906: "x86: don't use
      'access_ok()' as a range check in get_user_pages_fast()" for details and
      end result), made us notice that x86-64 was really being very sloppy
      about virtual address checking.
      
      So be way more careful and straightforward about masking x86-64 virtual
      addresses:
      
       - All the VIRTUAL_MASK* variants now cover half of the address
         space, it's not like we can use the full mask on a signed
         integer, and the larger mask just invites mistakes when
         applying it to either half of the 48-bit address space.
      
       - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
         obvious when it transforms a file offset into a
         (kernel-half) virtual address.
      
       - Unify/simplify the 32-bit and 64-bit USER_DS definition to
         be based on TASK_SIZE_MAX.
      
      This cleanup and more careful/obvious user virtual address checking also
      uncovered a buglet in the x86-64 implementation of strnlen_user(): it
      would do an "access_ok()" check on the whole potential area, even if the
      string itself was much shorter, and thus return an error even for valid
      strings. Our sloppy checking had hidden this.
      
      So this fixes 'strnlen_user()' to do this properly, the same way we
      already handled user strings in 'strncpy_from_user()'.  Namely by just
      checking the first byte, and then relying on fault handling for the
      rest.  That always works, since we impose a guard page that cannot be
      mapped at the end of the user space address space (and even if we
      didn't, we'd have the address space hole).
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9063c61f
    • L
      x86: don't use 'access_ok()' as a range check in get_user_pages_fast() · 7f818906
      Linus Torvalds 提交于
      It's really not right to use 'access_ok()', since that is meant for the
      normal "get_user()" and "copy_from/to_user()" accesses, which are done
      through the TLB, rather than through the page tables.
      
      Why? access_ok() does both too few, and too many checks.  Too many,
      because it is meant for regular kernel accesses that will not honor the
      'user' bit in the page tables, and because it honors the USER_DS vs
      KERNEL_DS distinction that we shouldn't care about in GUP.  And too few,
      because it doesn't do the 'canonical' check on the address on x86-64,
      since the TLB will do that for us.
      
      So instead of using a function that isn't meant for this, and does
      something else and much more complicated, just do the real rules: we
      don't want the range to overflow, and on x86-64, we want it to be a
      canonical low address (on 32-bit, all addresses are canonical).
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f818906
  3. 19 6月, 2009 5 次提交
    • I
      perf_counter, x86: Improve interactions with fast-gup · 0c871971
      Ingo Molnar 提交于
      Improve a few details in perfcounter call-chain recording that
      makes use of fast-GUP:
      
      - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
        racy and can be changed on another CPU, so we have to be careful
        about how we access them. The PAE branch is already careful with
        read-barriers - but the non-PAE and 64-bit side needs an
        ACCESS_ONCE() to make sure the pte value is observed only once.
      
      - make the checks a bit stricter so that we can feed it any kind of
        cra^H^H^H user-space input ;-)
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0c871971
    • P
      perf_counter: Make callchain samples extensible · f9188e02
      Peter Zijlstra 提交于
      Before exposing upstream tools to a callchain-samples ABI, tidy it
      up to make it more extensible in the future:
      
      Use markers in the IP chain to denote context, use (u64)-1..-4095 range
      for these context markers because we use them for ERR_PTR(), so these
      addresses are unlikely to be mapped.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f9188e02
    • S
      function-graph: add stack frame test · 71e308a2
      Steven Rostedt 提交于
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86. Where it passes in the stack
      frame of the parent function, and will test that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the funtion, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was suppose to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parsic arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      71e308a2
    • F
      dma-mapping: x86: use asm-generic/dma-mapping-common.h · 7c095e46
      FUJITA Tomonori 提交于
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: NJoerg Roedel <joerg.roedel@amd.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c095e46
    • P
      gcov: enable GCOV_PROFILE_ALL for x86_64 · 7bf99fb6
      Peter Oberparleiter 提交于
      Enable gcov profiling of the entire kernel on x86_64. Required changes
      include disabling profiling for:
      
      * arch/kernel/acpi/realmode and arch/kernel/boot/compressed:
        not linked to main kernel
      * arch/vdso, arch/kernel/vsyscall_64 and arch/kernel/hpet:
        profiling causes segfaults during boot (incompatible context)
      Signed-off-by: NPeter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Li Wei <W.Li@Sun.COM>
      Cc: Michael Ellerman <michaele@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Heiko Carstens <heicars2@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <mschwid2@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bf99fb6
  4. 18 6月, 2009 15 次提交
    • H
      x86, mce: fix error path in mce_create_device() · b1f49f95
      Hidetoshi Seto 提交于
      Don't skip removing mce_attrs in route from error2.
      Signed-off-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Huang Ying <ying.huang@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      b1f49f95
    • H
      crypto: aes-ni - Remove CRYPTO_TFM_REQ_MAY_SLEEP from fpu template · b6f34d44
      Huang Ying 提交于
      kernel_fpu_begin/end used preempt_disable/enable, so sleep should be
      prevented between kernel_fpu_begin/end.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      b6f34d44
    • H
      crypto: aes-ni - Do not sleep when using the FPU · 9251b64f
      Huang Ying 提交于
      Because AES-NI instructions will touch XMM state, corresponding code
      must be enclosed within kernel_fpu_begin/end, which used
      preempt_disable/enable. So sleep should be prevented between
      kernel_fpu_begin/end.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      9251b64f
    • H
      crypto: aes-ni - Fix cbc mode IV saving · e6efaa02
      Huang Ying 提交于
      Original implementation of aesni_cbc_dec do not save IV if input
      length % 4 == 0. This will make decryption of next block failed.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      e6efaa02
    • Y
      x86: use zalloc_cpumask_var for mce_dev_initialized · e92fae06
      Yinghai Lu 提交于
      We need a cleared cpu_mask to record if mce is initialized, especially
      when MAXSMP is used.
      
      used zalloc_... instead
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Reviewed-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: stable@kernel.org
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      e92fae06
    • Y
      x86: fix duplicated sysfs attribute · 74b602c7
      Yinghai Lu 提交于
      The sysfs attribute cmci_disabled was accidentall turned into a
      duplicate of ignore_ce, breaking all other attributes.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NHidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      74b602c7
    • A
      x86: de-assembler-ize asm/desc.h · bc3f5d3d
      Alexander van Heukelum 提交于
      asm/desc.h is included in three assembly files, but the only macro
      it defines, GET_DESC_BASE, is never used. This patch removes the
      includes, removes the macro GET_DESC_BASE and the ASSEMBLY guard
      from asm/desc.h.
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      bc3f5d3d
    • A
      i386: fix/simplify espfix stack switching, move it into assembly · dc4c2a0a
      Alexander van Heukelum 提交于
      The espfix code triggers if we have a protected mode userspace
      application with a 16-bit stack. On returning to userspace, with iret,
      the CPU doesn't restore the high word of the stack pointer. This is an
      "official" bug, and the work-around used in the kernel is to temporarily
      switch to a 32-bit stack segment/pointer pair where the high word of the
      pointer is equal to the high word of the userspace stackpointer.
      
      The current implementation uses THREAD_SIZE to determine the cut-off,
      but there is no good reason not to use the more natural 64kb... However,
      implementing this by simply substituting THREAD_SIZE with 65536 in
      patch_espfix_desc crashed the test application. patch_espfix_desc tries
      to do what is described above, but gets it subtly wrong if the userspace
      stack pointer is just below a multiple of THREAD_SIZE: an overflow
      occurs to bit 13... With a bit of luck, when the kernelspace
      stackpointer is just below a 64kb-boundary, the overflow then ripples
      trough to bit 16 and userspace will see its stack pointer changed by
      65536.
      
      This patch moves all espfix code into entry_32.S. Selecting a 16-bit
      cut-off simplifies the code. The game with changing the limit dynamically
      is removed too. It complicates matters and I see no value in it. Changing
      only the top 16-bit word of ESP is one instruction and it also implies
      that only two bytes of the ESPFIX GDT entry need to be changed and this
      can be implemented in just a handful simple to understand instructions.
      As a side effect, the operation to compute the original ESP from the
      ESPFIX ESP and the GDT entry simplifies a bit too, and the remaining
      three instructions have been expanded inline in entry_32.S.
      
      impact: can now reliably run userspace with ESP=xxxxfffc on 16-bit
      stack segment
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: NStas Sergeev <stsp@aknet.ru>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      dc4c2a0a
    • A
      i386: fix return to 16-bit stack from NMI handler · 2e04bc76
      Alexander van Heukelum 提交于
      Returning to a task with a 16-bit stack requires special care: the iret
      instruction does not restore the high word of esp in that case. The
      espfix code fixes this, but currently is not invoked on NMIs. This means
      that a running task gets the upper word of esp clobbered due intervening
      NMIs. To reproduce, compile and run the following program with the nmi
      watchdog enabled (nmi_watchdog=2 on the command line). Using gdb you can
      see that the high bits of esp contain garbage, while the low bits are
      still correct.
      
      This patch puts the espfix code back into the NMI code path.
      
      The patch is slightly complicated due to the irqtrace infrastructure not
      being NMI-safe. The NMI return path cannot call TRACE_IRQS_IRET.
      Otherwise, the tail of the normal iret-code is correct for the nmi code
      path too. To be able to share this code-path, the TRACE_IRQS_IRET was
      move up a bit. The espfix code exists after the TRACE_IRQS_IRET, but
      this code explicitly disables interrupts. This short interrupts-off
      section is now not traced anymore. The return-to-kernel path now always
      includes the preliminary test to decide if the espfix code should be
      called. This is never the case, but doing it this way keeps the patch as
      simple as possible and the few extra instructions should not affect
      timing in any significant way.
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <sys/types.h>
       #include <sys/mman.h>
       #include <unistd.h>
       #include <sys/syscall.h>
       #include <asm/ldt.h>
      
      int modify_ldt(int func, void *ptr, unsigned long bytecount)
      {
              return syscall(SYS_modify_ldt, func, ptr, bytecount);
      }
      
      /* this is assumed to be usable */
       #define SEGBASEADDR 0x10000
       #define SEGLIMIT 0x20000
      
      /* 16-bit segment */
      struct user_desc desc = {
              .entry_number = 0,
              .base_addr = SEGBASEADDR,
              .limit = SEGLIMIT,
              .seg_32bit = 0,
              .contents = 0, /* ??? */
              .read_exec_only = 0,
              .limit_in_pages = 0,
              .seg_not_present = 0,
              .useable = 1
      };
      
      int main(void)
      {
              setvbuf(stdout, NULL, _IONBF, 0);
      
              /* map a 64 kb segment */
              char *pointer = mmap((void *)SEGBASEADDR, SEGLIMIT+1,
                              PROT_EXEC|PROT_READ|PROT_WRITE,
                              MAP_SHARED|MAP_ANONYMOUS, -1, 0);
              if (pointer == NULL) {
                      printf("could not map space\n");
                      return 0;
              }
      
              /* write ldt, new mode */
              int err = modify_ldt(0x11, &desc, sizeof(desc));
              if (err) {
                      printf("error modifying ldt: %i\n", err);
                      return 0;
              }
      
              for (int i=0; i<1000; i++) {
              asm volatile (
                      "pusha\n\t"
                      "mov %ss, %eax\n\t" /* preserve ss:esp */
                      "mov %esp, %ebp\n\t"
                      "push $7\n\t" /* index 0, ldt, user mode */
                      "push $65536-4096\n\t" /* esp */
                      "lss (%esp), %esp\n\t" /* switch to new stack */
                      "push %eax\n\t" /* save old ss:esp on new stack */
                      "push %ebp\n\t"
                      "add $17*65536, %esp\n\t" /* set high bits */
                      "mov %esp, %edx\n\t"
      
                      "mov $10000000, %ecx\n\t" /* wait... */
                      "1: loop 1b\n\t" /* ... a bit */
      
                      "cmp %esp, %edx\n\t"
                      "je 1f\n\t"
                      "ud2\n\t" /* esp changed inexplicably! */
                      "1:\n\t"
                      "sub $17*65536, %esp\n\t" /* restore high bits */
                      "lss (%esp), %esp\n\t" /* restore old ss:esp */
                      "popa\n\t");
      
                      printf("\rx%ix", i);
              }
      
              return 0;
      }
      Signed-off-by: NAlexander van Heukelum <heukelum@fastmail.fm>
      Acked-by: NStas Sergeev <stsp@aknet.ru>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      2e04bc76
    • M
      x86: Use pci_claim_resource · a76117df
      Matthew Wilcox 提交于
      Instead of open-coding pci_find_parent_resource and request_resource,
      just call pci_claim_resource.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a76117df
    • C
      x86, ioapic: Don't call disconnect_bsp_APIC if no APIC present · 3f4c3955
      Cyrill Gorcunov 提交于
      Vegard Nossum reported:
      
      [  503.576724] ACPI: Preparing to enter system sleep state S5
      [  503.710857] Disabling non-boot CPUs ...
      [  503.716853] Power down.
      [  503.717770] ------------[ cut here ]------------
      [  503.717770] WARNING: at arch/x86/kernel/apic/apic.c:249 native_apic_write_du)
      [  503.717770] Hardware name: OptiPlex GX100
      [  503.717770] Modules linked in:
      [  503.717770] Pid: 2136, comm: halt Not tainted 2.6.30 #443
      [  503.717770] Call Trace:
      [  503.717770]  [<c154d327>] ? printk+0x18/0x1a
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c10360fc>] warn_slowpath_common+0x6c/0xc0
      [  503.717770]  [<c1017358>] ? native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1036165>] warn_slowpath_null+0x15/0x20
      [  503.717770]  [<c1017358>] native_apic_write_dummy+0x38/0x50
      [  503.717770]  [<c1017173>] disconnect_bsp_APIC+0x63/0x100
      [  503.717770]  [<c1019e48>] disable_IO_APIC+0xb8/0xc0
      [  503.717770]  [<c1214231>] ? acpi_power_off+0x0/0x29
      [  503.717770]  [<c1015e55>] native_machine_shutdown+0x65/0x80
      [  503.717770]  [<c1015c36>] native_machine_power_off+0x26/0x30
      [  503.717770]  [<c1015c49>] machine_power_off+0x9/0x10
      [  503.717770]  [<c1046596>] kernel_power_off+0x36/0x40
      [  503.717770]  [<c104680d>] sys_reboot+0xfd/0x1f0
      [  503.717770]  [<c109daa0>] ? perf_swcounter_event+0xb0/0x130
      [  503.717770]  [<c109db7d>] ? perf_counter_task_sched_out+0x5d/0x120
      [  503.717770]  [<c102dfc6>] ? finish_task_switch+0x56/0xd0
      [  503.717770]  [<c154da1e>] ? schedule+0x49e/0xb40
      [  503.717770]  [<c10444b0>] ? sys_kill+0x70/0x160
      [  503.717770]  [<c119d9db>] ? selinux_file_ioctl+0x3b/0x50
      [  503.717770]  [<c10dd443>] ? sys_ioctl+0x63/0x70
      [  503.717770]  [<c1003024>] sysenter_do_call+0x12/0x22
      [  503.717770] ---[ end trace 8157b5d0ed378f15 ]---
      
      |
      | That's including this commit:
      |
      | commit 103428e5
      |Author: Cyrill Gorcunov <gorcunov@openvz.org>
      |Date:   Sun Jun 7 16:48:40 2009 +0400
      |
      |    x86, apic: Fix dummy apic read operation together with broken MP handling
      |
      
      If we have apic disabled we don't even switch to APIC mode and do not
      calling for connect_bsp_APIC. Though on SMP compiled kernel the
      native_machine_shutdown does try to write the apic register anyway.
      
      Fix it with explicit check if we really should touch apic registers.
      Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <20090617181322.GG10822@lenovo>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f4c3955
    • P
      perf_counter: x86: Set the period in the intel overflow handler · 60f916de
      Peter Zijlstra 提交于
      Commit 9e350de3 ("perf_counter: Accurate period data")
      missed a spot, which caused all Intel-PMU samples to have a
      period of 0.
      
      This broke auto-freq sampling.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60f916de
    • H
      x86: Remove duplicated #include's · 8653f88f
      Huang Weiyi 提交于
      Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com>
      LKML-Reference: <1244895686-2348-1-git-send-email-weiyi.huang@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8653f88f
    • J
      x86: msr.h linux/types.h is only required for __KERNEL__ · 8fa62ad9
      Jaswinder Singh Rajput 提交于
      <linux/types.h> is only required for __KERNEL__ as whole file is covered with it
      
      Also fixed some spacing issues for usr/include/asm-x86/msr.h
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: "H. Peter Anvin" <hpa@kernel.org>
      LKML-Reference: <1245228070.2662.1.camel@ht.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8fa62ad9
    • P
      x86: nmi: Add Intel processor 0x6f4 to NMI perfctr1 workaround · fe955e5c
      Prarit Bhargava 提交于
      Expand Intel NMI perfctr1 workaround to include a Core2 processor stepping
      (cpuid family-6, model-f, stepping-4).  Resolves a situation where the NMI
      would not enable on these processors.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: prarit@redhat.com
      Cc: suresh.b.siddha@intel.com
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fe955e5c
  5. 17 6月, 2009 13 次提交
    • H
      x86, mce: mce_intel.c needs <asm/apic.h> · 1bf7b31e
      H. Peter Anvin 提交于
      mce_intel.c uses apic_write() and lapic_get_maxlvt(), and so it needs
      <asm/apic.h>.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      1bf7b31e
    • J
    • F
      x86, io_apic.c: Work around compiler warning · 50a8d4d2
      Figo.zhang 提交于
      This compiler warning:
      
        arch/x86/kernel/apic/io_apic.c: In function ‘ioapic_write_entry’:
        arch/x86/kernel/apic/io_apic.c:466: warning: ‘eu’ is used uninitialized in this function
        arch/x86/kernel/apic/io_apic.c:465: note: ‘eu’ was declared here
      
      Is bogus as 'eu' is always initialized. But annotate it away by
      initializing the variable, to make it easier for people to notice
      real warnings. A compiler that sees through this logic will
      optimize away the initialization.
      Signed-off-by: NFigo.zhang <figo1802@gmail.com>
      LKML-Reference: <1245248720.3312.27.camel@myhost>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      50a8d4d2
    • C
      x86: mce: Don't touch THERMAL_APIC_VECTOR if no active APIC present · 5ce4243d
      Cyrill Gorcunov 提交于
      If APIC was disabled (for some reason) and as result
      it's not even mapped we should not try to enable thermal
      interrupts at all.
      Reported-by: NSimon Holm Thøgersen <odie@cs.aau.dk>
      Tested-by: NSimon Holm Thøgersen <odie@cs.aau.dk>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <20090615182633.GA7606@lenovo>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5ce4243d
    • P
      sched, x86: Fix cpufreq + sched_clock() TSC scaling · 84599f8a
      Peter Zijlstra 提交于
      For freqency dependent TSCs we only scale the cycles, we do not account
      for the discrepancy in absolute value.
      
      Our current formula is: time = cycles * mult
      
      (where mult is a function of the cpu-speed on variable tsc machines)
      
      Suppose our current cycle count is 10, and we have a multiplier of 5,
      then our time value would end up being 50.
      
      Now cpufreq comes along and changes the multiplier to say 3 or 7,
      which would result in our time being resp. 30 or 70.
      
      That means that we can observe random jumps in the time value due to
      frequency changes in both fwd and bwd direction.
      
      So what this patch does is change the formula to:
      
        time = cycles * frequency + offset
      
      And we calculate offset so that time_before == time_after, thereby
      ridding us of these jumps in time.
      
      [ Impact: fix/reduce sched_clock() jumps across frequency changing events ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Chucked-on-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      84599f8a
    • A
      x86: mce: Handle banks == 0 case in K7 quirk · 203abd67
      Andi Kleen 提交于
      Vegard Nossum reported:
      
      > I get an MCE-related crash like this in latest linus tree:
      >
      > [    0.115341] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
      > [    0.116396] CPU: L2 Cache: 512K (64 bytes/line)
      > [    0.120570] mce: CPU supports 0 MCE banks
      > [    0.124870] BUG: unable to handle kernel NULL pointer dereference at 00000000 00000010
      > [    0.128001] IP: [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] PGD 0
      > [    0.128001] Thread overran stack, or stack corrupted
      > [    0.128001] Oops: 0002 [#1] PREEMPT SMP
      > [    0.128001] last sysfs file:
      > [    0.128001] CPU 0
      > [    0.128001] Modules linked in:
      > [    0.128001] Pid: 0, comm: swapper Not tainted 2.6.30 #426
      > [    0.128001] RIP: 0010:[<ffffffff813b98ad>]  [<ffffffff813b98ad>] mcheck_init+0x278/0x320
      > [    0.128001] RSP: 0018:ffffffff81595e38  EFLAGS: 00000246
      > [    0.128001] RAX: 0000000000000010 RBX: ffffffff8158f900 RCX: 0000000000000000
      > [    0.128001] RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000000000010
      > [    0.128001] RBP: ffffffff81595e68 R08: 0000000000000001 R09: 0000000000000000
      > [    0.128001] R10: 0000000000000010 R11: 0000000000000000 R12: 0000000000000000
      > [    0.128001] R13: 00000000ffffffff R14: 0000000000000000 R15: 0000000000000000
      > [    0.128001] FS:  0000000000000000(0000) GS:ffff880002288000(0000) knlGS:00000
      > 00000000000
      > [    0.128001] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [    0.128001] CR2: 0000000000000010 CR3: 0000000001001000 CR4: 00000000000006b0
      > [    0.128001] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [    0.128001] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
      > [    0.128001] Process swapper (pid: 0, threadinfo ffffffff81594000, task ffffff
      > ff8152a4a0)
      > [    0.128001] Stack:
      > [    0.128001]  0000000081595e68 5aa50ed3b4ddbe6e ffffffff8158f900 ffffffff8158f
      > 914
      > [    0.128001]  ffffffff8158f948 0000000000000000 ffffffff81595eb8 ffffffff813b8
      > 69c
      > [    0.128001]  5aa50ed3b4ddbe6e 00000001078bfbfd 0000062300000800 5aa50ed3b4ddb
      > e6e
      > [    0.128001] Call Trace:
      > [    0.128001]  [<ffffffff813b869c>] identify_cpu+0x331/0x392
      > [    0.128001]  [<ffffffff815a1445>] identify_boot_cpu+0x23/0x6e
      > [    0.128001]  [<ffffffff815a14ac>] check_bugs+0x1c/0x60
      > [    0.128001]  [<ffffffff8159c075>] start_kernel+0x403/0x46e
      > [    0.128001]  [<ffffffff8159b2ac>] x86_64_start_reservations+0xac/0xd5
      > [    0.128001]  [<ffffffff8159b3ea>] x86_64_start_kernel+0x115/0x14b
      > [    0.128001]  [<ffffffff8159b140>] ? early_idt_handler+0x0/0x71
      
      This happens on QEMU which reports MCA capability, but no banks.
      Without this patch there is a buffer overrun and boot ops because
      the code would try to initialize the 0 element of a zero length
      kmalloc() buffer.
      Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      LKML-Reference: <20090615125200.GD31969@one.firstfloor.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      203abd67
    • R
      kmap_types: make most arches use generic header file · e4c9dd0f
      Randy Dunlap 提交于
      Convert most arches to use asm-generic/kmap_types.h.
      
      Move the KM_FENCE_ macro additions into asm-generic/kmap_types.h,
      controlled by __WITH_KM_FENCE from each arch's kmap_types.h file.
      
      Would be nice to be able to add custom KM_types per arch, but I don't yet
      see a nice, clean way to do that.
      
      Built on x86_64, i386, mips, sparc, alpha(tonyb), powerpc(tonyb), and
      68k(tonyb).
      
      Note: avr32 should be able to remove KM_PTE2 (since it's not used) and
      then just use the generic kmap_types.h file.  Get avr32 maintainer
      approval.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: <linux-arch@vger.kernel.org>
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: "Luck Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e4c9dd0f
    • M
      use printk_once() in several places · a9c56953
      Minchan Kim 提交于
      There are some places to be able to use printk_once instead of hard coding.
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Dominik Brodowski <linux@dominikbrodowski.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9c56953
    • M
      page allocator: do not check NUMA node ID when the caller knows the node is valid · 6484eb3e
      Mel Gorman 提交于
      Callers of alloc_pages_node() can optionally specify -1 as a node to mean
      "allocate from the current node".  However, a number of the callers in
      fast paths know for a fact their node is valid.  To avoid a comparison and
      branch, this patch adds alloc_pages_exact_node() that only checks the nid
      with VM_BUG_ON().  Callers that know their node is valid are then
      converted.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Reviewed-by: NChristoph Lameter <cl@linux-foundation.org>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Paul Mundt <lethal@linux-sh.org>	[for the SLOB NUMA bits]
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6484eb3e
    • A
      mm: consolidate init_mm definition · bb1f17b0
      Alexey Dobriyan 提交于
      * create mm/init-mm.c, move init_mm there
      * remove INIT_MM, initialize init_mm with C99 initializer
      * unexport init_mm on all arches:
      
        init_mm is already unexported on x86.
      
        One strange place is some OMAP driver (drivers/video/omap/) which
        won't build modular, but it's already wants get_vm_area() export.
        Somebody should look there.
      
      [akpm@linux-foundation.org: add missing #includes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Cc: Mike Frysinger <vapier.adi@gmail.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb1f17b0
    • A
      time: move PIT_TICK_RATE to linux/timex.h · 08604bd9
      Arnd Bergmann 提交于
      PIT_TICK_RATE is currently defined in four architectures, but in three
      different places.  While linux/timex.h is not the perfect place for it, it
      is still a reasonable replacement for those drivers that traditionally use
      asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.
      
      Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
       This is unlikely to make a difference, and probably can only improve
      accuracy.  There was a discussion on the correct value of CLOCK_TICK_RATE
      a few years ago, after which every existing instance was getting changed
      to 1193182.  According to the specification, it should be
      1193181.818181...
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08604bd9
    • H
      x86, boot: use .code16gcc instead of .code16 · 28b48688
      H. Peter Anvin 提交于
      Use .code16gcc to compile arch/x86/boot/bioscall.S rather than
      .code16, since some older versions of binutils can't generate 32-bit
      addressing expressions (67 prefixes) in .code16 mode, only in
      .code16gcc mode.
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      28b48688
    • C
      x86: correct the conversion of EFI memory types · e2a71476
      Cliff Wickman 提交于
      This patch causes all the EFI_RESERVED_TYPE memory reservations to be recorded
      in the e820 table as type E820_RESERVED.
      
      (This patch replaces one called 'x86: vendor reserved memory type'.
       This version has been discussed a bit with Peter and Yinghai but not given
       a final opinion.)
      
      Without this patch EFI_RESERVED_TYPE memory reservations may be
      marked usable in the e820 table. There may be a collision between
      kernel use and some reserver's use of this memory.
      
      (An example use of this functionality is the UV system, which
       will access extremely large areas of memory with a memory engine
       that allows a user to address beyond the processor's range.  Such
       areas are reserved in the EFI table by the BIOS.
       Some loaders have a restricted number of entries possible in the e820 table,
       hence the need to record the reservations in the unrestricted EFI table.)
      
      The call to do_add_efi_memmap() is only made if "add_efi_memmap" is specified
      on the kernel command line.
      Signed-off-by: NCliff Wickman <cpw@sgi.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      e2a71476