1. 24 9月, 2009 1 次提交
    • R
      x86: Reduce verbosity of "PAT enabled" kernel message · e23a8b6a
      Roland Dreier 提交于
      On modern systems, the kernel prints the message
      
          x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
      
      once for every CPU.
      
      This gets kind of ridiculous on huge systems; for example, on a
      64-thread system I was lucky enough to get:
      
          dmesg| grep 'PAT enabled' | wc
               64     704    5174
      
      There is already a BUG() if non-boot CPUs have PAT capabilities
      that don't match the boot CPU, so just print the message on the
      boot CPU. (I kept the print after the wrmsrl() that enables PAT,
      so that the log output continues to mean that the system survived
      enabling PAT on the boot CPU)
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <adavdj92sso.fsf@cisco.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e23a8b6a
  2. 22 9月, 2009 2 次提交
  3. 20 9月, 2009 1 次提交
  4. 18 9月, 2009 1 次提交
    • S
      x86, pat: don't use rb-tree based lookup in reserve_memtype() · dcb73bf4
      Suresh Siddha 提交于
      Recent enhancement of rb-tree based lookup exposed a  bug with the lookup
      mechanism in the reserve_memtype() which ensures that there are no conflicting
      memtype requests for the memory range.
      
      memtype_rb_search() returns an entry which has a start address <= new start
      address. And from here we traverse the linear linked list to check if there
      any conflicts with the existing mappings. As the rbtree is based on the
      start address of the memory range, it is quite possible that we have several
      overlapped mappings whose start address is much less than new requested start
      but the end is >= new requested end. This results in conflicting memtype
      mappings.
      
      Same bug exists with the old code which uses cached_entry from where
      we traverse the linear linked list. But the new rb-tree code exposes this
      bug fairly easily.
      
      For now, don't use the memtype_rb_search() and always start the search from
      the head of linear linked list in reserve_memtype(). Linear linked list
      for most of the systems grow's to few 10's of entries(as we track memory type
      of RAM pages using struct page). So we should be ok for now.
      
      We still retain the rbtree and use it to speed up free_memtype() which
      doesn't have the same bug(as we know what exactly we are searching for
      in free_memtype).
      
      Also use list_for_each_entry_from() in free_memtype() so that we start
      the search from rb-tree lookup result.
      Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      LKML-Reference: <1253136483.4119.12.camel@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      dcb73bf4
  5. 11 9月, 2009 2 次提交
  6. 10 9月, 2009 3 次提交
    • A
      x86: Export kmap_atomic_to_page() · 256cd2ef
      Avi Kivity 提交于
      Needed by KVM.
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      256cd2ef
    • J
      xen: make -fstack-protector work under Xen · 577eebea
      Jeremy Fitzhardinge 提交于
      -fstack-protector uses a special per-cpu "stack canary" value.
      gcc generates special code in each function to test the canary to make
      sure that the function's stack hasn't been overrun.
      
      On x86-64, this is simply an offset of %gs, which is the usual per-cpu
      base segment register, so setting it up simply requires loading %gs's
      base as normal.
      
      On i386, the stack protector segment is %gs (rather than the usual kernel
      percpu %fs segment register).  This requires setting up the full kernel
      GDT and then loading %gs accordingly.  We also need to make sure %gs is
      initialized when bringing up secondary cpus too.
      
      To keep things consistent, we do the full GDT/segment register setup on
      both architectures.
      
      Because we need to avoid -fstack-protected code before setting up the GDT
      and because there's no way to disable it on a per-function basis, several
      files need to have stack-protector inhibited.
      
      [ Impact: allow Xen booting with stack-protector enabled ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      577eebea
    • J
      x86, pat: Fix cacheflush address in change_page_attr_set_clr() · fa526d0d
      Jack Steiner 提交于
      Fix address passed to cpa_flush_range() when changing page
      attributes from WB to UC. The address (*addr) is
      modified by __change_page_attr_set_clr(). The result is that
      the pages being flushed start at the _end_ of the changed range
      instead of the beginning.
      
      This should be considered for 2.6.30-stable and 2.6.31-stable.
      Signed-off-by: NJack Steiner <steiner@sgi.com>
      Acked-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Stable team <stable@kernel.org>
      fa526d0d
  7. 06 9月, 2009 2 次提交
  8. 04 9月, 2009 1 次提交
    • P
      kmemleak: Don't scan uninitialized memory when kmemcheck is enabled · 8e019366
      Pekka Enberg 提交于
      Ingo Molnar reported the following kmemcheck warning when running both
      kmemleak and kmemcheck enabled:
      
        PM: Adding info for No Bus:vcsa7
        WARNING: kmemcheck: Caught 32-bit read from uninitialized memory
        (f6f6e1a4)
        d873f9f600000000c42ae4c1005c87f70000000070665f666978656400000000
         i i i i u u u u i i i i i i i i i i i i i i i i i i i i i u u u
                 ^
      
        Pid: 3091, comm: kmemleak Not tainted (2.6.31-rc7-tip #1303) P4DC6
        EIP: 0060:[<c110301f>] EFLAGS: 00010006 CPU: 0
        EIP is at scan_block+0x3f/0xe0
        EAX: f40bd700 EBX: f40bd780 ECX: f16b46c0 EDX: 00000001
        ESI: f6f6e1a4 EDI: 00000000 EBP: f10f3f4c ESP: c2605fcc
         DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
        CR0: 8005003b CR2: e89a4844 CR3: 30ff1000 CR4: 000006f0
        DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
        DR6: ffff4ff0 DR7: 00000400
         [<c110313c>] scan_object+0x7c/0xf0
         [<c1103389>] kmemleak_scan+0x1d9/0x400
         [<c1103a3c>] kmemleak_scan_thread+0x4c/0xb0
         [<c10819d4>] kthread+0x74/0x80
         [<c10257db>] kernel_thread_helper+0x7/0x3c
         [<ffffffff>] 0xffffffff
        kmemleak: 515 new suspected memory leaks (see
        /sys/kernel/debug/kmemleak)
        kmemleak: 42 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
      
      The problem here is that kmemleak will scan partially initialized
      objects that makes kmemcheck complain. Fix that up by skipping
      uninitialized memory regions when kmemcheck is enabled.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      8e019366
  9. 27 8月, 2009 9 次提交
  10. 25 8月, 2009 1 次提交
  11. 22 8月, 2009 1 次提交
    • L
      x86: don't call '->send_IPI_mask()' with an empty mask · b04e6373
      Linus Torvalds 提交于
      As noted in 83d349f3 ("x86: don't send
      an IPI to the empty set of CPU's"), some APIC's will be very unhappy
      with an empty destination mask.  That commit added a WARN_ON() for that
      case, and avoided the resulting problem, but didn't fix the underlying
      reason for why those empty mask cases happened.
      
      This fixes that, by checking the result of 'cpumask_andnot()' of the
      current CPU actually has any other CPU's left in the set of CPU's to be
      sent a TLB flush, and not calling down to the IPI code if the mask is
      empty.
      
      The reason this started happening at all is that we started passing just
      the CPU mask pointers around in commit 4595f962 ("x86: change
      flush_tlb_others to take a const struct cpumask"), and when we did that,
      the cpumask was no longer thread-local.
      
      Before that commit, flush_tlb_mm() used to create it's own copy of
      'mm->cpu_vm_mask' and pass that copy down to the low-level flush
      routines after having tested that it was not empty.  But after changing
      it to just pass down the CPU mask pointer, the lower level TLB flush
      routines would now get a pointer to that 'mm->cpu_vm_mask', and that
      could still change - and become empty - after the test due to other
      CPU's having flushed their own TLB's.
      
      See
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13933
      
      for details.
      Tested-by: NThomas Björnell <thomas.bjornell@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b04e6373
  12. 21 8月, 2009 1 次提交
  13. 18 8月, 2009 1 次提交
    • S
      x86, pat: Allow ISA memory range uncacheable mapping requests · 1adcaafe
      Suresh Siddha 提交于
      Max Vozeler reported:
      >  Bug 13877 -  bogl-term broken with CONFIG_X86_PAT=y, works with =n
      >
      >  strace of bogl-term:
      >  814   mmap2(NULL, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 4, 0)
      >				 = -1 EAGAIN (Resource temporarily unavailable)
      >  814   write(2, "bogl: mmaping /dev/fb0: Resource temporarily unavailable\n",
      >	       57) = 57
      
      PAT code maps the ISA memory range as WB in the PAT attribute, so that
      fixed range MTRR registers define the actual memory type (UC/WC/WT etc).
      
      But the upper level is_new_memtype_allowed() API checks are failing,
      as the request here is for UC and the return tracked type is WB (Tracked type is
      WB as MTRR type for this legacy range potentially will be different for each
      4k page).
      
      Fix is_new_memtype_allowed() by always succeeding the ISA address range
      checks, as the null PAT (WB) and def MTRR fixed range register settings
      satisfy the memory type needs of the applications that map the ISA address
      range.
      Reported-and-Tested-by: NMax Vozeler <xam@debian.org>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      1adcaafe
  14. 14 8月, 2009 1 次提交
  15. 04 8月, 2009 2 次提交
  16. 31 7月, 2009 1 次提交
    • P
      x86, pat: Fix set_memory_wc related corruption · bdc6340f
      Pallipadi, Venkatesh 提交于
      Changeset 3869c4aa
      that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc()
      to make it complaint with SDM requirements. But, introduced a nasty bug, which
      can result in crash and/or strange corruptions when set_memory_wc is used.
      One such crash reported here
      http://lkml.org/lkml/2009/7/30/94
      
      Actually, that changeset introduced two bugs.
      * change_page_attr_set() takes &addr as first argument and can the addr value
        might have changed on return, even for single page change_page_attr_set()
        call. That will make the second change_page_attr_set() in this routine
        operate on unrelated addr, that can eventually cause strange corruptions
        and bad page state crash.
      * The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should
        clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise cache attribute will not
        be WC (will be UC instead).
      
      The patch below fixes both these problems. Sending a single patch to fix both
      the problems, as the change is to the same line of code. The change to have a
      addr_copy is not very clean. But, it is simpler than making more changes
      through various routines in pageattr.c.
      
      A huge thanks to Jerome for reporting this problem and providing a simple test
      case that helped us root cause the problem.
      Reported-by: NJerome Glisse <glisse@freedesktop.org>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com>
      Acked-by: NDave Airlie <airlied@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      bdc6340f
  17. 29 7月, 2009 1 次提交
  18. 28 7月, 2009 1 次提交
    • B
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() · 9e1b32ca
      Benjamin Herrenschmidt 提交于
      mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()
      
      Upcoming paches to support the new 64-bit "BookE" powerpc architecture
      will need to have the virtual address corresponding to PTE page when
      freeing it, due to the way the HW table walker works.
      
      Basically, the TLB can be loaded with "large" pages that cover the whole
      virtual space (well, sort-of, half of it actually) represented by a PTE
      page, and which contain an "indirect" bit indicating that this TLB entry
      RPN points to an array of PTEs from which the TLB can then create direct
      entries. Thus, in order to invalidate those when PTE pages are deleted,
      we need the virtual address to pass to tlbilx or tlbivax instructions.
      
      The old trick of sticking it somewhere in the PTE page struct page sucks
      too much, the address is almost readily available in all call sites and
      almost everybody implemets these as macros, so we may as well add the
      argument everywhere. I added it to the pmd and pud variants for consistency.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV]
      Acked-by: NNick Piggin <npiggin@suse.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9e1b32ca
  19. 22 7月, 2009 1 次提交
  20. 11 7月, 2009 1 次提交
    • R
      x86: Remove spurious printk level from segfault message · a1a08d1c
      Roland Dreier 提交于
      Since commit 5fd29d6c ("printk: clean up handling of log-levels
      and newlines"), the kernel logs segfaults like:
      
          <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000]
      
      with the extra "<6>" being KERN_INFO.  This happens because the
      printk in show_signal_msg() started with KERN_CONT and then
      used "%s" to pass in the real level; and KERN_CONT is no longer
      an empty string, and printk only pays attention to the level at
      the very beginning of the format string.
      
      Therefore, remove the KERN_CONT from this printk, since it is
      now actively causing problems (and never really made any
      sense).
      Signed-off-by: NRoland Dreier <roland@digitalvampire.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <874otjitkj.fsf@shaolin.home.digitalvampire.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a1a08d1c
  21. 09 7月, 2009 2 次提交
  22. 04 7月, 2009 1 次提交
    • T
      x86,percpu: generalize lpage first chunk allocator · 8c4bfc6e
      Tejun Heo 提交于
      Generalize and move x86 setup_pcpu_lpage() into
      pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
      around the generalized version.  Other than taking size parameters and
      using arch supplied callbacks to allocate/free/map memory,
      pcpu_lpage_first_chunk() is identical to the original implementation.
      
      This simplifies arch code and will help converting more archs to
      dynamic percpu allocator.
      
      While at it, factor out pcpu_calc_fc_sizes() which is common to
      pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
      
      [ Impact: code reorganization and generalization ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      8c4bfc6e
  23. 01 7月, 2009 2 次提交
    • J
      x86: Declare check_efer() before it gets used · 76c06927
      Jaswinder Singh Rajput 提交于
      This sparse warning:
      
        arch/x86/mm/init.c:83:16: warning: symbol 'check_efer' was not declared. Should it be static?
      
      triggers because check_efer() is not decalared before using it.
      asm/proto.h includes the declaration of check_efer(), so
      including asm/proto.h to fix that - this also addresses the
      sparse warning.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1246458263.6940.22.camel@hpdv5.satnam>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      76c06927
    • Y
      x86: only clear node_states for 64bit · 66918dcd
      Yinghai Lu 提交于
      Nathan reported that
      
      | commit 73d60b7f
      | Author: Yinghai Lu <yinghai@kernel.org>
      | Date:   Tue Jun 16 15:33:00 2009 -0700
      |
      |    page-allocator: clear N_HIGH_MEMORY map before we set it again
      |
      |    SRAT tables may contains nodes of very small size.  The arch code may
      |    decide to not activate such a node.  However, currently the early boot
      |    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
      |    active although these nodes have no present pages.
      |
      |    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
      
      unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
      an i386 kvm guest, meaning that cpuset.mems can not be used.
      
      Fix this by only clearing node_states[N_NORMAL_MEMORY] for 64bit only.
      and need to do save/restore for that in find_zone_movable_pfn
      Reported-by: NNathan Lynch <ntl@pobox.com>
      Tested-by: NNathan Lynch <ntl@pobox.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>,
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66918dcd
  24. 29 6月, 2009 1 次提交