1. 13 2月, 2009 1 次提交
  2. 12 2月, 2009 6 次提交
  3. 09 2月, 2009 1 次提交
  4. 06 2月, 2009 1 次提交
  5. 05 2月, 2009 1 次提交
  6. 31 1月, 2009 1 次提交
  7. 29 1月, 2009 4 次提交
  8. 27 1月, 2009 1 次提交
  9. 26 1月, 2009 1 次提交
  10. 24 1月, 2009 2 次提交
    • H
      x86: handle PAT more like other CPU features · 75a04811
      H. Peter Anvin 提交于
      Impact: Cleanup
      
      When PAT was originally introduced, it was handled specially for a few
      reasons:
      
      - PAT bugs are hard to track down, so we wanted to maintain a
        whitelist of CPUs.
      - The i386 and x86-64 CPUID code was not yet unified.
      
      Both of these are now obsolete, so handle PAT like any other features,
      including ordinary feature blacklisting due to known bugs.
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      75a04811
    • H
      x86: uaccess: introduce try and catch framework · fe40c0af
      Hiroshi Shimamoto 提交于
      Impact: introduce new uaccess exception handling framework
      
      Introduce {get|put}_user_try and {get|put}_user_catch as new uaccess exception
      handling framework.
      {get|put}_user_try begins exception block and {get|put}_user_catch(err) ends
      the block and gets err if an exception occured in {get|put}_user_ex() in the
      block. The exception is stored thread_info->uaccess_err.
      
      The example usage of this framework is below;
      int func()
      {
      	int err = 0;
      
      	get_user_try {
      		get_user_ex(...);
      		get_user_ex(...);
      		:
      	} get_user_catch(err);
      
      	return err;
      }
      
      Note: get_user_ex() is not clear the value when an exception occurs, it's
      different from the behavior of __get_user(), but I think it doesn't matter.
      Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      fe40c0af
  11. 22 1月, 2009 3 次提交
  12. 21 1月, 2009 3 次提交
    • S
      x86: fix page attribute corruption with cpa() · a1e46212
      Suresh Siddha 提交于
      Impact: fix sporadic slowdowns and warning messages
      
      This patch fixes a performance issue reported by Linus on his
      Nehalem system. While Linus reverted the PAT patch (commit
      58dab916) which exposed the issue,
      existing cpa() code can potentially still cause wrong(page attribute
      corruption) behavior.
      
      This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
      various people reported.
      
      In 64bit kernel, kernel identity mapping might have holes depending
      on the available memory and how e820 reports the address range
      covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
      in the address range that is not listed by e820 entries, kernel identity
      mapping will have a corresponding hole in its 1-1 identity mapping.
      
      If cpa() happens on the kernel identity mapping which falls into these holes,
      existing code fails like this:
      
      	__change_page_attr_set_clr()
      		__change_page_attr()
      			returns 0 because of if (!kpte). But doesn't
      			set cpa->numpages and cpa->pfn.
      		cpa_process_alias()
      			uses uninitialized cpa->pfn (random value)
      			which can potentially lead to changing the page
      			attribute of kernel text/data, kernel identity
      			mapping of RAM pages etc. oops!
      
      This bug was easily exposed by another PAT patch which was doing
      cpa() more often on kernel identity mapping holes (physical range between
      max_low_pfn_mapped and 4GB), where in here it was setting the
      cache disable attribute(PCD) for kernel identity mappings aswell.
      
      Fix cpa() to handle the kernel identity mapping holes. Retain
      the WARN() for cpa() calls to other not present address ranges
      (kernel-text/data, ioremap() addresses)
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a1e46212
    • I
      x86: uv cleanup, build fix · 4ec71fa2
      Ingo Molnar 提交于
      Fix:
      
       arch/x86/mm/srat_64.c: In function ‘acpi_numa_processor_affinity_init’:
       arch/x86/mm/srat_64.c:141: error: implicit declaration of function ‘get_uv_system_type’
       arch/x86/mm/srat_64.c:141: error: ‘UV_X2APIC’ undeclared (first use in this function)
       arch/x86/mm/srat_64.c:141: error: (Each undeclared identifier is reported only once
       arch/x86/mm/srat_64.c:141: error: for each function it appears in.)
      
      A couple of UV definitions were moved to asm/uv/uv.h, but srat_64.c did
      not include that header. Add it.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4ec71fa2
    • I
      x86, mm: move tlb.c to arch/x86/mm/ · 55f4949f
      Ingo Molnar 提交于
      Impact: cleanup
      
      Now that it's unified, move the (SMP) TLB flushing code from arch/x86/kernel/
      to arch/x86/mm/, where it belongs logically.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      55f4949f
  13. 20 1月, 2009 2 次提交
    • N
      x86: optimise x86's do_page_fault (C entry point for the page fault path) · 92181f19
      Nick Piggin 提交于
      Impact: cleanup, restructure code to improve assembly
      
      gcc isn't _all_ that smart about spilling registers to stack or reusing
      stack slots, even with branch annotations. do_page_fault contained a lot
      of functionality, so split unlikely paths into their own functions, and
      mark them as noinline just to be sure. I consider this actually to be
      somewhat of a cleanup too: the main function now contains about half
      the number of lines so the normal path is easier to read, while the error
      cases are also nicely split away.
      
      Also, ensure the order of arguments to functions is always the same: regs,
      addr, error_code. This can reduce code size a tiny bit, and just looks neater
      too.
      
      And add a couple of branch annotations.
      
      Before:
        do_page_fault:
                subq    $360, %rsp      #,
      
      After:
        do_page_fault:
                subq    $56, %rsp       #,
      
      bloat-o-meter:
        add/remove: 8/0 grow/shrink: 0/1 up/down: 2222/-1680 (542)
        function                                     old     new   delta
        __bad_area_nosemaphore                         -     506    +506
        no_context                                     -     474    +474
        vmalloc_fault                                  -     424    +424
        spurious_fault                                 -     358    +358
        mm_fault_error                                 -     272    +272
        bad_area_access_error                          -      89     +89
        bad_area                                       -      89     +89
        bad_area_nosemaphore                           -      10     +10
        do_page_fault                               2464     784   -1680
      
      Yes, the total size increases by 542 bytes, due to the extra function calls.
      But these will very rarely be called (except for vmalloc_fault) in a normal
      workload. Importantly, do_page_fault is less than 1/3rd it's original size,
      and touches far less stack.
      
      Existing gotos and branch hints did move a lot of the infrequently used text
      out of the fastpath, but that's even further improved after this patch.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      92181f19
    • G
      x86: remove kernel_physical_mapping_init() from init section · f5495506
      Gary Hade 提交于
      Impact: fix crash with memory hotplug enabled
      
      kernel_physical_mapping_init() is called during memory hotplug
      so it does not belong in the init section.
      
      If the kernel is built with CONFIG_DEBUG_SECTION_MISMATCH=y on
      the make command line, arch/x86/mm/init_64.c is compiled with
      the -fno-inline-functions-called-once gcc option defeating
      inlining of kernel_physical_mapping_init() within init_memory_mapping().
      
      When kernel_physical_mapping_init() is not inlined it is placed
      in the .init.text section according to the __init in it's current
      declaration.  A later call to kernel_physical_mapping_init() during
      a memory hotplug operation encounters an int3 trap because the
      .init.text section memory has been freed.
      
      This patch eliminates the crash caused by the int3 trap by moving the
      non-inlined kernel_physical_mapping_init() from .init.text to .meminit.text.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f5495506
  14. 16 1月, 2009 2 次提交
    • J
      x86: fix assumed to be contiguous leaf page tables for kmap_atomic region (take 2) · a3c6018e
      Jan Beulich 提交于
      Debugging and original patch from Nick Piggin <npiggin@suse.de>
      
      The early fixmap pmd entry inserted at the very top of the KVA is causing the
      subsequent fixmap mapping code to not provide physically linear pte pages over
      the kmap atomic portion of the fixmap (which relies on said property to
      calculate pte addresses).
      
      This has caused weird boot failures in kmap_atomic much later in the boot
      process (initial userspace faults) on a 32-bit PAE system with a larger number
      of CPUs (smaller CPU counts tend not to run over into the next page so don't
      show up the problem).
      
      Solve this by attempting to clear out the page table, and copy any of its
      entries to the new one. Also, add a bug if a nonlinear condition is encountered
      and can't be resolved, which might save some hours of debugging if this fragile
      scheme ever breaks again...
      
      Once we have such logic, we can also use it to eliminate the early ioremap
      trickery around the page table setup for the fixmap area. This also fixes
      potential issues with FIX_* entries sharing the leaf page table with the early
      ioremap ones getting discarded by early_ioremap_clear() and not restored by
      early_ioremap_reset(). It at once eliminates the temporary (and configuration,
      namely NR_CPUS, dependent) unavailability of early fixed mappings during the
      time the fixmap area page tables get constructed.
      
      Finally, also replace the hard coded calculation of the initial table space
      needed for the fixmap area with a proper one, allowing kernels configured for
      large CPU counts to actually boot.
      
      Based-on: Nick Piggin <npiggin@suse.de>
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a3c6018e
    • L
      Revert "x86 PAT: remove CPA WARN_ON for zero pte" · b5db0e38
      Linus Torvalds 提交于
      This reverts commit 58dab916, which
      makes my Nehalem come to a nasty crawling almost-halt.  It looks like it
      turns off caching of regular kernel RAM, with the understandable
      slowdown of a few orders of magnitude as a result.
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5db0e38
  15. 15 1月, 2009 1 次提交
  16. 14 1月, 2009 3 次提交
    • V
      x86 PAT: remove CPA WARN_ON for zero pte · 58dab916
      venkatesh.pallipadi@intel.com 提交于
      Impact: reduce scope of debug check - avoid warnings
      
      The logic to find whether identity map exists or not using
      high_memory or max_low_pfn_mapped/max_pfn_mapped are not complete
      as the memory withing the range may not be mapped if there is a
      unusable hole in e820.
      
      Specifically, on my test system I started seeing these warnings with
      tools like hwinfo, acpidump trying to map ACPI region.
      
      [   27.400018] ------------[ cut here ]------------
      [   27.400344] WARNING: at /home/venkip/src/linus/linux-2.6/arch/x86/mm/pageattr.c:560 __change_page_attr_set_clr+0xf3/0x8b8()
      [   27.400821] Hardware name: X7DB8
      [   27.401070] CPA: called for zero pte. vaddr = ffff8800cff6a000 cpa->vaddr = ffff8800cff6a000
      [   27.401569] Modules linked in:
      [   27.401882] Pid: 4913, comm: dmidecode Not tainted 2.6.28-05716-gfe0bdec6 #586
      [   27.402141] Call Trace:
      [   27.402488]  [<ffffffff80237c21>] warn_slowpath+0xd3/0x10f
      [   27.402749]  [<ffffffff80274ade>] ? find_get_page+0xb3/0xc9
      [   27.403028]  [<ffffffff80274a2b>] ? find_get_page+0x0/0xc9
      [   27.403333]  [<ffffffff80226425>] __change_page_attr_set_clr+0xf3/0x8b8
      [   27.403628]  [<ffffffff8028ec99>] ? __purge_vmap_area_lazy+0x192/0x1a1
      [   27.403883]  [<ffffffff8028eb52>] ? __purge_vmap_area_lazy+0x4b/0x1a1
      [   27.404172]  [<ffffffff80290268>] ? vm_unmap_aliases+0x1ab/0x1bb
      [   27.404512]  [<ffffffff80290105>] ? vm_unmap_aliases+0x48/0x1bb
      [   27.404766]  [<ffffffff80226d28>] change_page_attr_set_clr+0x13e/0x2e6
      [   27.405026]  [<ffffffff80698fa7>] ? _spin_unlock+0x26/0x2a
      [   27.405292]  [<ffffffff80227e6a>] ? reserve_memtype+0x19b/0x4e3
      [   27.405590]  [<ffffffff80226ffd>] _set_memory_wb+0x22/0x24
      [   27.405844]  [<ffffffff80225d28>] ioremap_change_attr+0x26/0x28
      [   27.406097]  [<ffffffff80228355>] reserve_pfn_range+0x1a3/0x235
      [   27.406427]  [<ffffffff80228430>] track_pfn_vma_new+0x49/0xb3
      [   27.406686]  [<ffffffff80286c46>] remap_pfn_range+0x94/0x32c
      [   27.406940]  [<ffffffff8022878d>] ? phys_mem_access_prot_allowed+0xb5/0x1a8
      [   27.407209]  [<ffffffff803e9bf4>] mmap_mem+0x75/0x9d
      [   27.407523]  [<ffffffff8028b3b4>] mmap_region+0x2cf/0x53e
      [   27.407776]  [<ffffffff8028b8cc>] do_mmap_pgoff+0x2a9/0x30d
      [   27.408034]  [<ffffffff8020f4a4>] sys_mmap+0x92/0xce
      [   27.408339]  [<ffffffff8020b65b>] system_call_fastpath+0x16/0x1b
      [   27.408614] ---[ end trace 4b16ad70c09a602d ]---
      [   27.408871] dmidecode:4913 reserve_pfn_range ioremap_change_attr failed write-back for cff6a000-cff6b000
      
      This is wih track_pfn_vma_new trying to keep identity map in sync.
      The address cff6a000 is the ACPI region according to e820.
      
      [    0.000000] BIOS-provided physical RAM map:
      [    0.000000]  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
      [    0.000000]  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000cc000 - 00000000000d0000 (reserved)
      [    0.000000]  BIOS-e820: 00000000000e4000 - 0000000000100000 (reserved)
      [    0.000000]  BIOS-e820: 0000000000100000 - 00000000cff60000 (usable)
      [    0.000000]  BIOS-e820: 00000000cff60000 - 00000000cff69000 (ACPI data)
      [    0.000000]  BIOS-e820: 00000000cff69000 - 00000000cff80000 (ACPI NVS)
      [    0.000000]  BIOS-e820: 00000000cff80000 - 00000000d0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
      [    0.000000]  BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
      [    0.000000]  BIOS-e820: 00000000ff000000 - 0000000100000000 (reserved)
      [    0.000000]  BIOS-e820: 0000000100000000 - 0000000230000000 (usable)
      
      And is not mapped as per init_memory_mapping.
      
      [    0.000000] init_memory_mapping: 0000000000000000-00000000cff60000
      [    0.000000] init_memory_mapping: 0000000100000000-0000000230000000
      
      We can add logic to check for this. But, there can also be other holes in
      identity map when we have 1GB of aligned reserved space in e820.
      
      This patch handles it by removing the WARN_ON and returning a specific
      error value (EFAULT) to indicate that the address does not have any
      identity mapping.
      
      The code that tries to keep identity map in sync can ignore
      this error, with other callers of cpa still getting error here.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      58dab916
    • V
      x86 PAT: return compatible mapping to remap_pfn_range callers · cdecff68
      venkatesh.pallipadi@intel.com 提交于
      Impact: avoid warning message, potentially solve 3D performance regression
      
      Change x86 PAT code to return compatible memtype if the exact memtype that
      was requested in remap_pfn_rage and friends is not available due to some
      conflict.
      
      This is done by returning the compatible type in pgprot parameter of
      track_pfn_vma_new(), and the caller uses that memtype for page table.
      
      Note that track_pfn_vma_copy() which is basically called during fork gets the
      prot from existing page table and should not have any conflict. Hence we use
      strict memtype check there and do not allow compatible memtypes.
      
      This patch fixes the bug reported here:
      
        http://marc.info/?l=linux-kernel&m=123108883716357&w=2
      
      Specifically the error message:
      
        X:5010 map pfn expected mapping type write-back for d0000000-d0101000,
        got write-combining
      
      Should go away.
      Reported-and-bisected-by: NKevin Winchester <kjwinchester@gmail.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdecff68
    • V
      x86 PAT: change track_pfn_vma_new to take pgprot_t pointer param · e4b866ed
      venkatesh.pallipadi@intel.com 提交于
      Impact: cleanup
      
      Change the protection parameter for track_pfn_vma_new() into a pgprot_t pointer.
      Subsequent patch changes the x86 PAT handling to return a compatible
      memtype in pgprot_t, if what was requested cannot be allowed due to conflicts.
      No fuctionality change in this patch.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e4b866ed
  17. 13 1月, 2009 1 次提交
    • A
      x86: avoid theoretical vmalloc fault loop · f313e123
      Andi Kleen 提交于
      Ajith Kumar noticed:
      
       I was going through the vmalloc fault handling for x86_64 and am unclear
       about the following lines in the vmalloc_fault() function.
      
       pgd = pgd_offset(current->mm ?: &init_mm, address);
       pgd_ref = pgd_offset_k(address);
      
       Here the intention is to get the pgd corresponding to the current process
       and sync it up with the pgd in init_mm(obtained from pgd_offset_k).
       However, for kernel threads current->mm is NULL and hence pgd =
       pgd_offset(init_mm, address) = pgd_ref which means the fault handler
       returns without setting the pgd entry in the MM structure in the context
       of which the kernel thread has faulted.  This could lead to never-ending
       faults and busy looping of kernel threads like pdflush.  So, shouldn't the
       pgd = pgd_offset(current->mm ?: &init_mm, address); be pgd =
       pgd_offset(current->active_mm ?: &init_mm, address);
      
      We can use active_mm unconditionally because it should be always set.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f313e123
  18. 08 1月, 2009 2 次提交
  19. 07 1月, 2009 3 次提交
    • J
      x86: smp.h move zap_low_mappings declartion to tlbflush.h · dacf7333
      Jaswinder Singh Rajput 提交于
      Impact: cleanup, moving NON-SMP stuff from smp.h
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dacf7333
    • G
      mm: show node to memory section relationship with symlinks in sysfs · c04fc586
      Gary Hade 提交于
      Show node to memory section relationship with symlinks in sysfs
      
      Add /sys/devices/system/node/nodeX/memoryY symlinks for all
      the memory sections located on nodeX.  For example:
      /sys/devices/system/node/node1/memory135 -> ../../memory/memory135
      indicates that memory section 135 resides on node1.
      
      Also revises documentation to cover this change as well as updating
      Documentation/ABI/testing/sysfs-devices-memory to include descriptions
      of memory hotremove files 'phys_device', 'phys_index', and 'state'
      that were previously not described there.
      
      In addition to it always being a good policy to provide users with
      the maximum possible amount of physical location information for
      resources that can be hot-added and/or hot-removed, the following
      are some (but likely not all) of the user benefits provided by
      this change.
      Immediate:
        - Provides information needed to determine the specific node
          on which a defective DIMM is located.  This will reduce system
          downtime when the node or defective DIMM is swapped out.
        - Prevents unintended onlining of a memory section that was
          previously offlined due to a defective DIMM.  This could happen
          during node hot-add when the user or node hot-add assist script
          onlines _all_ offlined sections due to user or script inability
          to identify the specific memory sections located on the hot-added
          node.  The consequences of reintroducing the defective memory
          could be ugly.
        - Provides information needed to vary the amount and distribution
          of memory on specific nodes for testing or debugging purposes.
      Future:
        - Will provide information needed to identify the memory
          sections that need to be offlined prior to physical removal
          of a specific node.
      
      Symlink creation during boot was tested on 2-node x86_64, 2-node
      ppc64, and 2-node ia64 systems.  Symlink creation during physical
      memory hot-add tested on a 2-node x86_64 system.
      Signed-off-by: NGary Hade <garyhade@us.ibm.com>
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c04fc586
    • N
      mm: invoke oom-killer from page fault · 1c0fe6e3
      Nick Piggin 提交于
      Rather than have the pagefault handler kill a process directly if it gets
      a VM_FAULT_OOM, have it call into the OOM killer.
      
      With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
      oom killing throttling, oom priority adjustment or selective disabling,
      panic on oom, etc), it's silly to unconditionally kill the faulting
      process at page fault time.  Create a hook for pagefault oom path to call
      into instead.
      
      Only converted x86 and uml so far.
      
      [akpm@linux-foundation.org: make __out_of_memory() static]
      [akpm@linux-foundation.org: fix comment]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0fe6e3
  20. 06 1月, 2009 1 次提交