1. 21 Oct 2010 (3 commits)
    • x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G · 66f2b061
      Authored by FUJITA Tomonori
      Set CONFIG_ARCH_DMA_ADDR_T_64BIT when we set dma_addr_t to 64 bits in
      <asm/types.h>; this allows Kconfig decisions based on this property (a
      sketch follows this entry).
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      LKML-Reference: <201010202255.o9KMtZXu009370@imap1.linux-foundation.org>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      66f2b061
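      A minimal sketch of what the new symbol keys off, assuming the usual
      typedef pattern in the generic type headers (not necessarily this
      commit's exact hunk):

          #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
          typedef u64 dma_addr_t;    /* DMA addresses need 64 bits */
          #else
          typedef u32 dma_addr_t;    /* DMA addresses fit in 32 bits */
          #endif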
    • x86: Spread tlb flush vector between nodes · 93296720
      Authored by Shaohua Li
      Currently the flush TLB vector allocation is based on the equation:
      	sender = smp_processor_id() % 8
      This isn't optimal: CPUs from different nodes can end up with the same
      vector, which causes a lot of lock contention. Instead, we can assign the
      same vectors to CPUs from the same node, while different nodes get
      different vectors (see the sketch after this entry). This has two
      advantages:
      a. If there is lock contention, it is between CPUs of one node, which
      should be much cheaper than contention between nodes.
      b. It completely avoids lock contention between nodes. This especially
      benefits kswapd, the biggest user of TLB flushes, since kswapd sets its
      affinity to a specific node.

      In my test this reduced CPU overhead by more than 20% in an extreme case.
      The test machine has 4 nodes and each node has 16 CPUs. I bind each
      node's kswapd to the first CPU of that node and run a workload with 4
      threads doing sequential mmap file reads; the files are empty sparse
      files. This workload triggers a lot of page reclaim and TLB flushing.
      Binding kswapd makes it easy to trigger the extreme TLB flush lock
      contention; otherwise kswapd keeps migrating between the CPUs of a node
      and I can't get stable results. In real workloads we won't always see
      such severe TLB flush lock contention, but it is possible.
      
      [ hpa: folded in fix from Eric Dumazet to use this_cpu_read() ]
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <1287544023.4571.8.camel@sli10-conroe.sh.intel.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      93296720
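      A hedged sketch of the per-node scheme described above; it is
      illustrative rather than the committed hunk, and the helper name and the
      constant of 8 vectors are assumptions carried over from the old "% 8"
      formula:

          #define NUM_INVALIDATE_TLB_VECTORS 8    /* assumed, matching "% 8" */

          /* Derive the flush vector from the sender's node rather than its
           * CPU ID, so CPUs that contend on the same vector lock share a node. */
          static unsigned int tlb_flush_vector(void)
          {
                  return numa_node_id() % NUM_INVALIDATE_TLB_VECTORS;
          }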
    • x86, mm: Fix incorrect data type in vmalloc_sync_all() · f01f7c56
      Authored by Borislav Petkov
      arch/x86/mm/fault.c: In function 'vmalloc_sync_all':
      arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast
      
      introduced by commit 617d34d9 (an illustration of the warning class
      follows this entry).
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20101020103642.GA3135@kryptos.osrc.amd.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      f01f7c56
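      A hedged, generic illustration of the warning class quoted above; the
      variable names are assumptions and this is not the actual fault.c hunk:

          spinlock_t *pgt_lock;                    /* correct: pointer type */
          unsigned long bad_lock;                  /* wrong: integer type */

          pgt_lock = &init_mm.page_table_lock;     /* fine */
          bad_lock = &init_mm.page_table_lock;     /* warns: assignment makes
                                                      integer from pointer
                                                      without a cast */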
  2. 20 Oct 2010 (3 commits)
  3. 19 Oct 2010 (3 commits)
  4. 16 Oct 2010 (1 commit)
  5. 15 Oct 2010 (4 commits)
    • ftrace: Rename config option HAVE_C_MCOUNT_RECORD to HAVE_C_RECORDMCOUNT · cf4db259
      Authored by Steven Rostedt
      The config option used by archs to tell the build system that the C
      version of recordmcount works for that arch is currently called
      HAVE_C_MCOUNT_RECORD, which enables BUILD_C_RECORDMCOUNT. To be
      consistent with the name of the tool itself, it has been renamed to
      HAVE_C_RECORDMCOUNT. This is less confusing, since we are building a C
      recordmcount and not a mcount_record.
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: linux-kbuild@vger.kernel.org
      Cc: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      cf4db259
    • ftrace/x86: Add support for C version of recordmcount · 72441cb1
      Authored by Steven Rostedt
      This patch adds support for the C version of recordmcount; compile times
      show a ~12% improvement.
      
      After verifying this works, other archs can add:
      
       HAVE_C_MCOUNT_RECORD
      
      in their Kconfig and the C version of recordmcount will be used
      instead of the Perl version.
      
      Cc: <linux-arch@vger.kernel.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: linux-kbuild@vger.kernel.org
      Cc: John Reiser <jreiser@bitwagon.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      72441cb1
    • x86: Barf when vmalloc and kmemcheck faults happen in NMI · ebc8827f
      Authored by Frederic Weisbecker
      On x86, faults exit by executing the iret instruction, which re-enables
      NMIs if we faulted in NMI context. So if a fault happens in an NMI,
      another NMI can nest after the fault exits.

      But we don't yet support nested NMIs because we have only one NMI stack.
      To prevent that, check that vmalloc and kmemcheck faults don't happen in
      this context (see the sketch after this entry). Most other kernel faults
      in NMIs can be spotted more easily by finding explicit
      copy_from/to_user() calls on review.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      ebc8827f
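      A hedged sketch of the kind of check described above; the exact placement
      inside the fault paths is not reproduced here:

          #include <linux/hardirq.h>

          static noinline int vmalloc_fault(unsigned long address)
          {
                  /* Loudly flag a vmalloc fault taken in NMI context: the iret
                   * on the fault exit path would re-enable nested NMIs. */
                  WARN_ON_ONCE(in_nmi());

                  /* ... normal vmalloc fault handling continues here ... */
                  return 0;
          }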
    • Don't dump task struct in a.out core-dumps · 0eead9ab
      Authored by Linus Torvalds
      akiphie points out that a.out core-dumps have that odd task struct
      dumping that was never used and was never really a good idea (it goes
      back into the mists of history, probably the original core-dumping
      code).  Just remove it.
      
      Also do the access_ok() check in dump_write() (a sketch follows this
      entry).  It probably doesn't matter (since normal filesystems all seem
      to do it anyway), but he points out that it's normally done by the VFS
      layer, so ...
      
      [ I suspect that we should possibly do "vfs_write()" instead of
        calling ->write directly.  That also does the whole fsnotify and write
        statistics thing, which may or may not be a good idea. ]
      
      And just to be anal, do this all for the x86-64 32-bit a.out emulation
      code too, even though it's not enabled (and currently won't even
      compile).
      Reported-by: akiphie <akiphie@lavabit.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0eead9ab
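      A hedged sketch of the access_ok() check described above; the helper
      shape is assumed rather than copied from the committed diff:

          static int dump_write(struct file *file, const void *addr, int nr)
          {
                  /* Check the source range up front, since a.out core dumping
                   * bypasses the usual VFS entry points that would do it. */
                  if (!access_ok(VERIFY_READ, addr, nr))
                          return 0;
                  return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
          }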
  6. 14 Oct 2010 (3 commits)
  7. 13 Oct 2010 (2 commits)
  8. 12 Oct 2010 (1 commit)
    • x86, numa: For each node, register the memory blocks actually used · 73cf624d
      Authored by Yinghai Lu
      Russ reported that SGI UV recently broke. He said:
      
      | The SRAT table shows that memory range is spread over two nodes.
      |
      | SRAT: Node 0 PXM 0 100000000-800000000
      | SRAT: Node 1 PXM 1 800000000-1000000000
      | SRAT: Node 0 PXM 0 1000000000-1080000000
      |
      |Previously, the kernel early_node_map[] would show three entries
      |with the proper node.
      |
      |[    0.000000]     0: 0x00100000 -> 0x00800000
      |[    0.000000]     1: 0x00800000 -> 0x01000000
      |[    0.000000]     0: 0x01000000 -> 0x01080000
      |
      |The problem is recent community kernel early_node_map[] shows
      |only two entries with the node 0 entry overlapping the node 1
      |entry.
      |
      |    0: 0x00100000 -> 0x01080000
      |    1: 0x00800000 -> 0x01000000
      
      After looking at the changelog, I found out that it has been broken for a
      while by the following commit:
      
      |commit 8716273c
      |Author: David Rientjes <rientjes@google.com>
      |Date:   Fri Sep 25 15:20:04 2009 -0700
      |
      |    x86: Export srat physical topology
      
      Before that commit, register_active_regions() was called for every SRAT
      memory entry right away.

      Use nodememblk_range[] instead of nodes[] in order to make sure we
      capture the actual memory blocks registered with each node (see the
      sketch after this entry).  nodes[] contains an extended range which spans
      all memory regions associated with a node, but that does not mean that
      all the memory in between is included.
      Reported-by: Russ Anderson <rja@sgi.com>
      Tested-by: Russ Anderson <rja@sgi.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB27BDF.5000800@kernel.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      73cf624d
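      A hedged sketch of the idea; the array and helper names follow the
      description above and the usual SRAT parsing code, not necessarily the
      committed hunk:

          /* Register each SRAT memory block with its own node instead of one
           * min..max span per node, so holes between blocks are not claimed. */
          for (i = 0; i < num_node_memblks; i++)
                  e820_register_active_regions(memblk_nodeid[i],
                                  node_memblk_range[i].start >> PAGE_SHIFT,
                                  node_memblk_range[i].end >> PAGE_SHIFT);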
  9. 11 Oct 2010 (3 commits)
    • KVM: x86: Move TSC reset out of vmcb_init · 47008cd8
      Authored by Zachary Amsden
      The VMCB is reset whenever we receive a startup IPI, so Linux setting the
      TSC back to zero there happens very late in the boot process and
      destabilizes the TSC. Instead, just set the TSC to zero once at VCPU
      creation time (see the sketch after this entry).

      Why the separate patch?  So git-bisect is your friend.
      Signed-off-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      47008cd8
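      A hedged sketch of the move; svm_reset_tsc() is a hypothetical stand-in
      for whatever call actually zeroes the guest TSC:

          /* Before: invoked from init_vmcb(), which re-runs on every startup
           * IPI, yanking the guest TSC back to zero very late in boot.
           * After:  invoked exactly once, from VCPU creation: */
          svm_reset_tsc(&svm->vcpu);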
    • KVM: x86: Fix SVM VMCB reset · 58877679
      Authored by Zachary Amsden
      On reset, the VMCB TSC should be set to zero.  Instead, the code was
      setting tsc_offset to zero, which passes the underlying host TSC straight
      through (illustrated after this entry).
      Signed-off-by: Zachary Amsden <zamsden@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      58877679
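      A hedged illustration of the difference; since the guest reads
      host TSC + tsc_offset, the two assignments below behave very differently
      (this is the idea, not the literal hunk):

          /* Buggy reset: offset 0 lets the guest read the raw host TSC. */
          svm->vmcb->control.tsc_offset = 0;

          /* Intended reset: cancel the current host TSC so the guest reads 0. */
          svm->vmcb->control.tsc_offset = 0 - native_read_tsc();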
    • x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order · 6dcbfe4f
      Authored by Borislav Petkov
      This fixes possible cases of not collecting valid error info in
      the MCE error thresholding groups on F10h hardware.
      
      The current code contains a subtle problem: in its first iteration it
      checks only the Valid bit of MSR0000_0413 (which is MC4_MISC0, the DRAM
      thresholding group) and breaks out if the bit is cleared.

      But (!) this MSR also contains an offset value, BlkPtr[31:24], which
      points to the remaining MSRs in this thresholding group, and those might
      contain valid information too. If we bail out after checking only the
      Valid bit in the first MSR, without also looking at the block pointer, we
      miss that other information.

      The thing is, MC4_MISC0[BlkPtr] is not predicated on MCi_STATUS[MiscV] or
      MC4_MISC0[Valid], so it should be checked prior to iterating over the
      MCi_MISCj thresholding group, irrespective of the MC4_MISC0[Valid]
      setting (see the sketch after this entry).
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6dcbfe4f
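      A hedged sketch of the ordering described above; the two helpers are
      hypothetical and the MSR number is the one quoted in the text:

          u32 low, high;

          rdmsr(0x413, low, high);                  /* MC4_MISC0, per the text */

          /* Read BlkPtr[31:24] before (and regardless of) the Valid check: it
           * points at the remaining MCi_MISCj MSRs of this thresholding group. */
          unsigned int blkptr = (low >> 24) & 0xff;
          if (blkptr)
                  walk_threshold_block(blkptr);     /* hypothetical helper */

          if (high & (1U << 31))                    /* MC4_MISC0[Valid] */
                  record_threshold_info(low, high); /* hypothetical helper */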
  10. 08 Oct 2010 (2 commits)
    • x86, mce, therm_throt.c: Fix missing curly braces in error handling logic · b62be8ea
      Authored by Jin Dongming
      When the PTS feature is not supported by the CPU, the
      package_power_limit_count sysfs file for the package should not be
      created.

      This patch fixes the missing { and } (illustrated after this entry).

      The patch is not complete, as there are other error handling problems in
      this function, but those can wait until the merge window.
      Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
      Reviewed-by: Fenghua Yu <fenghua.yu@initel.com>
      Acked-by: Jean Delvare <khali@linux-fr.org>
      Cc: Brown Len <len.brown@intel.com>
      Cc: Guenter Roeck <guenter.roeck@ericsson.com>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: lm-sensors@lm-sensors.org <lm-sensors@lm-sensors.org>
      LKML-Reference: <4C7625D1.4060201@np.css.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b62be8ea
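      A hedged, generic illustration of the missing-brace class of bug; it is
      not the therm_throt.c hunk itself, and the attribute names are
      placeholders:

          /* Buggy shape: the second call is indented as if guarded, but
           * without braces it always runs, so the package file is created
           * even when the feature is absent. */
          if (cpu_has(c, X86_FEATURE_PTS))
                  err = sysfs_add_file_to_group(kobj, &core_attr.attr, NULL);
                  err = sysfs_add_file_to_group(kobj, &pkg_attr.attr, NULL);

          /* Fixed shape: braces make both calls conditional. */
          if (cpu_has(c, X86_FEATURE_PTS)) {
                  err = sysfs_add_file_to_group(kobj, &core_attr.attr, NULL);
                  err = sysfs_add_file_to_group(kobj, &pkg_attr.attr, NULL);
          }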
    • x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation · a416e9e1
      Authored by Namhyung Kim
      On a 32-bit non-PAE system, the cast to 'phys_addr_t' truncates the value
      before the subtraction. Subtracting before the cast produces the same
      result but removes the following warnings from sparse (illustrated after
      this entry):
      
       arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0)
       arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0)
       arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0)
       arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0)
       arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0)
      
      64-bit or PAE machines will not be affected by this change.
      Signed-off-by: Namhyung Kim <namhyung@gmail.com>
      LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      a416e9e1
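      A hedged illustration of the cast ordering; the macro shapes mirror the
      description rather than the exact pgtable_types.h lines, with phys_addr_t
      shown at its 32-bit non-PAE width:

          typedef unsigned long phys_addr_t;     /* 32 bits on 32-bit non-PAE */
          #define __PHYSICAL_MASK_SHIFT 32

          /* Cast first, subtract second: the cast truncates 1ULL << 32 to 0,
           * which is exactly what sparse warns about. */
          #define BAD_MASK  (((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT)) - 1)

          /* Subtract first, cast second: the subtraction happens in 64 bits,
           * the result (0xffffffff) fits, and the warning disappears. */
          #define GOOD_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))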
  11. 06 Oct 2010 (2 commits)
    • x86, mm: Add RESERVE_BRK_ARRAY() helper · 161b0275
      Authored by Jeremy Fitzhardinge
      This is useful when converting static arrays into boot-time brk-allocated
      objects (a usage sketch follows this entry).
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      LKML-Reference: <4C805EEA.1080205@goop.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      161b0275
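      A hedged sketch of how such a helper is typically shaped and used on top
      of the existing RESERVE_BRK() machinery; the macro body and the usage
      example (early_ptes) are assumptions, not necessarily the committed
      definition:

          #define RESERVE_BRK_ARRAY(type, name, entries)  \
                  type *name;                             \
                  RESERVE_BRK(name, sizeof(type) * (entries))

          /* Usage: replace a static array with a boot-time brk allocation. */
          RESERVE_BRK_ARRAY(pte_t, early_ptes, PTRS_PER_PTE);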
    • modules: Fix module_bug_list list corruption race · 5336377d
      Authored by Linus Torvalds
      With all the recent module loading cleanups, we've minimized the code
      that sits under module_mutex, fixing various deadlocks and making it
      possible to do most of the module loading in parallel.
      
      However, that whole conversion totally missed the rather obscure code
      that adds a new module to the list used for BUG() handling.  That code
      was doubly obscure because (a) the code itself lives in lib/bug.c (for
      dubious reasons) and (b) it gets called from the architecture-specific
      "module_finalize()" rather than from generic code.

      Calling it from arch-specific code makes no sense whatsoever to begin
      with, and is now actively wrong since that code isn't protected by the
      module loading lock any more.

      So this commit moves the "module_bug_{finalize,cleanup}()" calls away
      from the arch-specific code and into the generic code, and in the process
      protects them with the module_mutex so that the list operations are now
      safe (sketched after this entry).
      
      Future fixups:
       - move the module list handling code into kernel/module.c where it
         belongs.
       - get rid of 'module_bug_list' and just use the regular list of modules
         (called 'modules' - imagine that) that we already create and maintain
         for other reasons.
      Reported-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5336377d
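      A hedged sketch of the shape of the fix; the surrounding function and the
      exact signatures are assumptions, the point being only that the BUG-table
      registration now runs in generic code under module_mutex:

          mutex_lock(&module_mutex);
          list_add_rcu(&mod->list, &modules);            /* existing list work */
          err = module_bug_finalize(hdr, sechdrs, mod);  /* moved under the lock */
          mutex_unlock(&module_mutex);
          if (err)
                  goto cleanup;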
  12. 05 Oct 2010 (3 commits)
  13. 02 Oct 2010 (6 commits)
  14. 01 Oct 2010 (4 commits)