1. 03 July 2009, 2 commits
    • x86: atomic64: Move the 32-bit atomic64_t implementation to a .c file · b7882b7c
      Ingo Molnar authored
      Linus noted that the atomic64_t primitives are all inlines
      currently which is crazy because these functions have a large
      register footprint anyway.
      
      Move them to a separate file: arch/x86/lib/atomic64_32.c
      
      Also, while at it, rename all uses of 'unsigned long long' to
      the much shorter u64.
      
      This makes the appearance of the prototypes a lot nicer - and
      it also uncovered a few bugs where (yet unused) API variants
      had 'long' as their return type instead of u64.
      
      [ More intrusive changes are not yet done in this patch. ]
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b7882b7c
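      [ A minimal sketch of the resulting split, using atomic64_add() as the
        example: the prototype stays in the header with the shorter u64 type,
        the body moves out of line. The loop shown is illustrative; the real
        implementation is cmpxchg8b-based. ]

          /* arch/x86/include/asm/atomic_32.h: declaration only, using u64 */
          void atomic64_add(u64 delta, atomic64_t *ptr);

          /* arch/x86/lib/atomic64_32.c: out-of-line definition */
          void atomic64_add(u64 delta, atomic64_t *ptr)
          {
                  u64 old_val, new_val;

                  do {
                          old_val = atomic64_read(ptr);
                          new_val = old_val + delta;
                  } while (atomic64_cmpxchg(ptr, old_val, new_val) != old_val);
          }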
    • x86: atomic64: The atomic64_t data type should be 8 bytes aligned on 32-bit too · bbf2a330
      Eric Dumazet authored
      Locked instructions on two cache lines at once are painful. If
      atomic64_t uses two cache lines, my test program is 10x slower.
      
      The chance for that is significant: 4/32 or 12.5%.
      
      Make sure an atomic64_t is 8 bytes aligned.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      LKML-Reference: <alpine.LFD.2.01.0907021653030.3210@localhost.localdomain>
      [ changed it to __aligned(8) as per Andrew's suggestion ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bbf2a330
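      [ A small user-space illustration of the constraint, using GCC's aligned
        attribute (the kernel spells this __aligned(8)); a compile-and-run
        sketch, not kernel code. ]

          #include <stdio.h>

          typedef struct {
                  unsigned long long counter;
          } __attribute__((aligned(8))) atomic64_t;

          int main(void)
          {
                  atomic64_t v = { 0 };

                  /* Without the attribute, a 32-bit ABI only guarantees 4-byte
                   * alignment for a 64-bit field, so an atomic64_t could straddle
                   * two cache lines; with it, the object is always 8-byte aligned. */
                  _Static_assert(_Alignof(atomic64_t) == 8, "atomic64_t must be 8-byte aligned");
                  printf("alignof = %zu, addr %% 8 = %lu\n",
                         _Alignof(atomic64_t), ((unsigned long)&v) % 8);
                  return 0;
          }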
  2. 02 July 2009, 1 commit
    • perf_counter: Ignore the nmi call frames in the x86-64 backtraces · 0406ca6d
      Frederic Weisbecker authored
      Almost every callchain recorded with perf record includes the
      internal perfcounter nmi frames:
      
       perf_callchain
       perf_counter_overflow
       intel_pmu_handle_irq
       perf_counter_nmi_handler
       notifier_call_chain
       atomic_notifier_call_chain
       notify_die
       do_nmi
       nmi
      
      We want to ignore these frames as they are not interesting for
      instrumentation. To solve this, we simply ignore every frame
      from nmi context.
      
      New example of "perf report -s sym -c" after this patch:
      
      9.59%  [k] search_by_key
                   4.88%
                      search_by_key
                      reiserfs_read_locked_inode
                      reiserfs_iget
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      
                   3.19%
                      search_by_key
                      search_by_entry_key
                      reiserfs_find_entry
                      reiserfs_lookup
                      do_lookup
                      __link_path_walk
                      path_walk
                      do_path_lookup
                      user_path_at
                      vfs_fstatat
                      vfs_lstat
                      sys_newlstat
                      system_call_fastpath
                      __lxstat
                      0x406fb1
      [...]
      
      For now this patch only solves the problem in x86-64.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1246474930-6088-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0406ca6d
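      [ Illustrative user-space sketch of the filtering idea, not the kernel
        patch itself: drop everything up to and including the "nmi" entry
        frame and keep the rest of the chain. Symbol names are taken from the
        example above. ]

          #include <stdio.h>
          #include <string.h>

          /* Innermost frame first, as a callchain is recorded. */
          static const char *chain[] = {
                  "perf_callchain", "perf_counter_overflow", "intel_pmu_handle_irq",
                  "perf_counter_nmi_handler", "notifier_call_chain",
                  "atomic_notifier_call_chain", "notify_die", "do_nmi", "nmi",
                  "search_by_key", "reiserfs_read_locked_inode", "reiserfs_iget",
          };

          int main(void)
          {
                  size_t i, start = 0;

                  /* Skip every frame that belongs to the NMI handling context. */
                  for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
                          if (strcmp(chain[i], "nmi") == 0) {
                                  start = i + 1;
                                  break;
                          }
                  }
                  for (i = start; i < sizeof(chain) / sizeof(chain[0]); i++)
                          printf("%s\n", chain[i]);
                  return 0;
          }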
  3. 01 July 2009, 1 commit
    • x86: only clear node_states for 64bit · 66918dcd
      Yinghai Lu authored
      Nathan reported that
      
      | commit 73d60b7f
      | Author: Yinghai Lu <yinghai@kernel.org>
      | Date:   Tue Jun 16 15:33:00 2009 -0700
      |
      |    page-allocator: clear N_HIGH_MEMORY map before we set it again
      |
      |    SRAT tables may contains nodes of very small size.  The arch code may
      |    decide to not activate such a node.  However, currently the early boot
      |    code sets N_HIGH_MEMORY for such nodes.  These nodes therefore seem to be
      |    active although these nodes have no present pages.
      |
      |    For 64bit N_HIGH_MEMORY == N_NORMAL_MEMORY, so that works for 64 bit too
      
      unintentionally and incorrectly clears the cpuset.mems cgroup attribute on
      an i386 kvm guest, meaning that cpuset.mems can not be used.
      
      Fix this by clearing node_states[N_NORMAL_MEMORY] for 64bit only,
      and do the needed save/restore for that in find_zone_movable_pfn.
      Reported-by: Nathan Lynch <ntl@pobox.com>
      Tested-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      66918dcd
  4. 29 June 2009, 1 commit
  5. 28 June 2009, 6 commits
  6. 26 June 2009, 6 commits
    • x86, delay: tsc based udelay should have rdtsc_barrier · e888d7fa
      Pallipadi, Venkatesh authored
      delay_tsc needs rdtsc_barrier to provide proper delay.
      
      Output from a test driver using hpet to cross check delay
      provided by udelay().
      
      Before:
      [   86.794363] Expected delay 5us actual 4679ns
      [   87.154362] Expected delay 5us actual 698ns
      [   87.514162] Expected delay 5us actual 4539ns
      [   88.653716] Expected delay 5us actual 4539ns
      [   94.664106] Expected delay 10us actual 9638ns
      [   95.049351] Expected delay 10us actual 10126ns
      [   95.416110] Expected delay 10us actual 9568ns
      [   95.799216] Expected delay 10us actual 9638ns
      [  103.624104] Expected delay 10us actual 9707ns
      [  104.020619] Expected delay 10us actual 768ns
      [  104.419951] Expected delay 10us actual 9707ns
      
      After:
      [   50.983320] Expected delay 5us actual 5587ns
      [   51.261807] Expected delay 5us actual 5587ns
      [   51.565715] Expected delay 5us actual 5657ns
      [   51.861171] Expected delay 5us actual 5587ns
      [   52.164704] Expected delay 5us actual 5726ns
      [   52.487457] Expected delay 5us actual 5657ns
      [   52.789338] Expected delay 5us actual 5726ns
      [   57.119680] Expected delay 10us actual 10755ns
      [   57.893997] Expected delay 10us actual 10615ns
      [   58.261287] Expected delay 10us actual 10755ns
      [   58.620505] Expected delay 10us actual 10825ns
      [   58.941035] Expected delay 10us actual 10755ns
      [   59.320903] Expected delay 10us actual 10615ns
      [   61.306311] Expected delay 10us actual 10755ns
      [   61.520542] Expected delay 10us actual 10615ns
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      e888d7fa
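      [ A user-space sketch of the pattern, with _mm_lfence() standing in for
        the kernel's rdtsc_barrier(): the TSC reads bracketing the wait must
        not be reordered around the barrier, otherwise delays come out short,
        as in the "Before" log above. ]

          #include <stdint.h>
          #include <x86intrin.h>

          /* Busy-wait for roughly 'cycles' TSC ticks. */
          static void delay_tsc(uint64_t cycles)
          {
                  uint64_t start, now;

                  _mm_lfence();            /* keep rdtsc from executing early */
                  start = __rdtsc();
                  do {
                          _mm_lfence();    /* order each re-read against the loop */
                          now = __rdtsc();
                  } while (now - start < cycles);
          }

          int main(void)
          {
                  delay_tsc(1000000);      /* ~1M cycles, just to exercise the loop */
                  return 0;
          }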
    • x86, setup: correct include file in <asm/boot.h> · 658dbfeb
      H. Peter Anvin authored
      <asm/boot.h> needs <asm/pgtable_types.h>, not <asm/page_types.h> in
      order to resolve PMD_SHIFT.  Also, correct a +1 which really should be
      + THREAD_ORDER.
      
      This is a build error which was masked by a typoed #ifdef.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      658dbfeb
    • x86, setup: Fix typo "CONFIG_x86_64" in <asm/boot.h> · 22f4319d
      Robert P. J. Day authored
      CONFIG_X86_64 was misspelled (wrong case), which caused the x86-64
      kernel to advertise itself as more relocatable than it really is.
      This could in theory cause boot failures once bootloaders start
      supporting the new relocation fields.
      Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      22f4319d
    • x86, mce: percpu mcheck_timer should be pinned · 5be6066a
      Hidetoshi Seto authored
      With CONFIG_NO_HZ + CONFIG_SMP, a timer added via add_timer() might
      be migrated to another cpu.  Use add_timer_on() instead.
      
      Avoids the following failure:
      
      Maciej Rutecki wrote:
      > > After normal boot I try:
      > >
      > > echo 1 > /sys/devices/system/machinecheck/machinecheck0/check_interval
      > >
      > > I found this in dmesg:
      > >
      > > [  141.704025] ------------[ cut here ]------------
      > > [  141.704039] WARNING: at arch/x86/kernel/cpu/mcheck/mce.c:1102
      > > mcheck_timer+0xf5/0x100()
      Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Tested-by: Maciej Rutecki <maciej.rutecki@gmail.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      5be6066a
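      [ Sketch of the pattern, not the exact mce.c hunk: arm the per-cpu
        timer with add_timer_on() so NO_HZ cannot migrate it; the function
        and variable names here are assumptions. ]

          #include <linux/timer.h>
          #include <linux/smp.h>

          /* Re-arm the per-cpu mce timer without letting NO_HZ migrate it. */
          static void mce_rearm_timer(struct timer_list *t, unsigned long interval)
          {
                  t->expires = round_jiffies(jiffies + interval);
                  add_timer_on(t, smp_processor_id());   /* instead of add_timer(t) */
          }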
    • x86: Add sysctl to allow panic on IOCK NMI error · 5211a242
      Kurt Garloff authored
      This patch introduces a new sysctl:
      
          /proc/sys/kernel/panic_on_io_nmi
      
      which defaults to 0 (off).
      
      When enabled, the kernel panics when the kernel receives an NMI
      caused by an IO error.
      
      An NMI triggered by an IO error indicates a serious system
      condition, which could result in IO data corruption. Rather
      than continuing, panicking and dumping might be a better choice,
      so one can figure out what's causing the IO error.
      
      This could be especially important to companies running IO
      intensive applications where corruption must be avoided, e.g. a
      bank's databases.
      
      [ SuSE has been shipping it for a while, it was done at the
        request of a large database vendor, for their users. ]
      Signed-off-by: Kurt Garloff <garloff@suse.de>
      Signed-off-by: Roberto Angelino <robertangelino@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      LKML-Reference: <20090624213211.GA11291@kroah.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      5211a242
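      [ A sketch of how such a knob is typically wired up; the table entry
        layout and handler hook below are assumptions based on the
        description, not the exact patch. ]

          #include <linux/sysctl.h>
          #include <linux/kernel.h>

          int panic_on_io_nmi;                    /* defaults to 0 (off) */

          static struct ctl_table kern_table_entry[] = {
                  {
                          .procname       = "panic_on_io_nmi",
                          .data           = &panic_on_io_nmi,
                          .maxlen         = sizeof(int),
                          .mode           = 0644,
                          .proc_handler   = proc_dointvec,
                  },
                  { }
          };

          /* In the IOCHK NMI path: */
          static void io_check_error_sketch(unsigned char reason)
          {
                  if (panic_on_io_nmi)
                          panic("NMI IOCK error: not continuing");
          }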
    • perf_counter, x86: Add mmap counter read support · 194002b2
      Peter Zijlstra authored
      Update the mmap control page with the needed information to
      use the userspace RDPMC instruction for self monitoring.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      194002b2
  7. 25 June 2009, 1 commit
  8. 24 June 2009, 4 commits
    • x86: Fix uv bau sending buffer initialization · 9c26f52b
      Cliff Wickman authored
      The initialization of the UV Broadcast Assist Unit's sending
      buffers was making an invalid assumption about the
      initialization of an MMR that defines its address.
      
      The BIOS will not be providing that MMR.  So
      uv_activation_descriptor_init() should unconditionally set it.
      
      Tested on UV simulator.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org> # for v2.6.30.x
      LKML-Reference: <E1MJTfj-0005i1-W8@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9c26f52b
    • perf_counter, x86: Set global control MSR correctly · c14dab5c
      Yong Wang authored
      Previous code made an assumption that the power on value of global
      control MSR has enabled all fixed and general purpose counters properly.
      
      However, this is not the case for certain Intel processors, such as
      Atom - and it might also be firmware dependent.
      
      Each enable bit in IA32_PERF_GLOBAL_CTRL is AND'ed with the
      enable bits for all privilege levels in the respective IA32_PERFEVTSELx
      or IA32_PERF_FIXED_CTR_CTRL MSRs to start/stop the counting of
      respective counters. Counting is enabled if the AND'ed result is true;
      counting is disabled when the result is false.
      
      The end result is that all fixed counters are always disabled on Atom
      processors because the assumption is just invalid.
      
      Fix this by not initializing the ctrl-mask out of the global MSR,
      but setting it to perf_counter_mask.
      Reported-by: Stephane Eranian <eranian@googlemail.com>
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090624021324.GA2788@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c14dab5c
    • Intel-IOMMU, intr-remap: source-id checking · f007e99c
      Weidong Han authored
      To support domain-isolation usages, the platform hardware must be
      capable of uniquely identifying the requestor (source-id) for each
      interrupt message. Without source-id checking for interrupt remapping,
      a rogue guest/VM with assigned devices can launch interrupt attacks
      to bring down another guest/VM or the VMM itself.
      
      This patch adds source-id checking for interrupt remapping, and then
      really isolates interrupts for guests/VMs with assigned devices.
      
      Because the PCI subsystem is not yet initialized when IOAPIC entries
      are set up, use read_pci_config_byte to access PCI config space directly.
      Signed-off-by: Weidong Han <weidong.han@intel.com>
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      f007e99c
    • x86, mce: Fix mce resume on 32bit · 7262b6e4
      Hidetoshi Seto authored
      Calling mcheck_init() on resume is required only with
      CONFIG_X86_OLD_MCE=y.
      Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Acked-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      7262b6e4
  9. 23 June 2009, 1 commit
    • x86: Move init_gbpages() to setup_arch() · 854c879f
      Pekka J Enberg authored
      The init_gbpages() function is conditionally called from
      init_memory_mapping() function. There are two call-sites where
      this 'after_bootmem' condition can be true: setup_arch() and
      mem_init() via pci_iommu_alloc().
      
      Therefore, it's safe to move the call to init_gbpages() to
      setup_arch() as it's always called before mem_init().
      
      This removes an after_bootmem use - paving the way to remove
      all uses of that state variable.
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <Pine.LNX.4.64.0906221731210.19474@melkki.cs.Helsinki.FI>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      854c879f
  10. 22 June 2009, 8 commits
    • x86: ensure percpu lpage doesn't consume too much vmalloc space · 0017c869
      Tejun Heo authored
      On extreme configuration (e.g. 32bit 32-way NUMA machine), lpage
      percpu first chunk allocator can consume too much of vmalloc space.
      Make it fall back to 4k allocator if the consumption goes over 20%.
      
      [ Impact: add sanity check for lpage percpu first chunk allocator ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0017c869
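      [ Illustrative shape of the sanity check; the identifiers are assumed,
        not taken from the exact patch. ]

          size_t total = pcpul_size * num_possible_cpus();

          /* Refuse the lpage first chunk if it would eat more than 20%
           * of the vmalloc area; the caller then falls back to the 4k
           * allocator. */
          if (total > (VMALLOC_END - VMALLOC_START) / 5) {
                  pr_warning("PERCPU: too large chunk size for lpage, "
                             "falling back to 4k\n");
                  return -EINVAL;
          }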
    • x86: implement percpu_alloc kernel parameter · fa8a7094
      Tejun Heo authored
      According to Andi, it isn't clear whether lpage allocator is worth the
      trouble as there are many processors where PMD TLB is far scarcer than
      PTE TLB.  The advantage or disadvantage probably depends on the actual
      size of percpu area and specific processor.  As performance
      degradation due to TLB pressure tends to be highly workload specific
      and subtle, it is difficult to decide which way to go without more
      data.
      
      This patch implements percpu_alloc kernel parameter to allow selecting
      which first chunk allocator to use to ease debugging and testing.
      
      While at it, make sure all the failure paths report why something
      failed, to help determine why a certain allocator isn't working.  Also,
      kill the "Great future plan" comment which had already been realized
      quite some time ago.
      
      [ Impact: allow explicit percpu first chunk allocator selection ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      fa8a7094
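      [ Sketch of how such a boot parameter is typically parsed with
        early_param(); the enum, variable names and accepted strings below
        are assumptions based on the description. ]

          #include <linux/init.h>
          #include <linux/string.h>
          #include <linux/kernel.h>

          enum pcpu_fc { PCPU_FC_AUTO, PCPU_FC_4K, PCPU_FC_LPAGE };
          static enum pcpu_fc pcpu_chosen_fc __initdata = PCPU_FC_AUTO;

          static int __init percpu_alloc_setup(char *str)
          {
                  if (!strcmp(str, "4k"))
                          pcpu_chosen_fc = PCPU_FC_4K;
                  else if (!strcmp(str, "lpage"))
                          pcpu_chosen_fc = PCPU_FC_LPAGE;
                  else
                          pr_warning("PERCPU: unknown allocator %s specified\n", str);
                  return 0;
          }
          early_param("percpu_alloc", percpu_alloc_setup);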
    • x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Tejun Heo authored
      lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages is changed, this makes the two aliases point to overlapping
      regions with different attributes, which isn't allowed and is known to
      cause subtle data corruption in certain cases.
      
      This can be handled in a similar manner to the x86_64 highmap alias:
      pageattr code should detect if the target pages have a PMD alias, then
      split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      e59a1bb2
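      [ User-space demo of the lookup described above: the PMD entries are
        kept sorted by base address and a binary search maps an address to
        its alias, or reports that it is not aliased. Addresses and field
        names are made up for the demo; this is not the kernel routine. ]

          #include <stdio.h>
          #include <stddef.h>

          #define PMD_SIZE (2UL << 20)   /* 2MB large page, illustrative */

          struct pcpul_ent {
                  unsigned long base;    /* start of the PMD page (linear address) */
                  unsigned long remap;   /* where it was remapped for the percpu area */
          };

          /* Sorted by .base in ascending order, as the commit describes. */
          static const struct pcpul_ent map[] = {
                  { 0x10000000UL, 0xf0000000UL },
                  { 0x20000000UL, 0xf0200000UL },
                  { 0x38000000UL, 0xf0400000UL },
          };

          /* Return the aliased (remapped) address for addr, or 0 if not aliased. */
          static unsigned long lpage_remapped(unsigned long addr)
          {
                  size_t lo = 0, hi = sizeof(map) / sizeof(map[0]);

                  while (lo < hi) {
                          size_t mid = lo + (hi - lo) / 2;

                          if (addr < map[mid].base)
                                  hi = mid;
                          else if (addr >= map[mid].base + PMD_SIZE)
                                  lo = mid + 1;
                          else
                                  return map[mid].remap + (addr - map[mid].base);
                  }
                  return 0;
          }

          int main(void)
          {
                  printf("%lx\n", lpage_remapped(0x20001234UL));  /* inside entry 1 */
                  printf("%lx\n", lpage_remapped(0x30000000UL));  /* not aliased -> 0 */
                  return 0;
          }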
    • x86: reorganize cpa_process_alias() · 992f4c1c
      Tejun Heo authored
      Reorganize cpa_process_alias() so that new alias condition can be
      added easily.
      
      Jan Beulich spotted a problem in the original cleanup thread, which
      incorrectly assumed the two existing conditions were mutually
      exclusive.
      
      [ Impact: code reorganization ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      992f4c1c
    • x86: prepare setup_pcpu_lpage() for pageattr fix · 0ff2587f
      Tejun Heo authored
      Make the following changes in preparation of coming pageattr updates.
      
      * Define and use array of struct pcpul_ent instead of array of
        pointers.  The only difference is ->cpu field which is set but
        unused yet.
      
      * Rename variables according to the above change.
      
      * Rename local variable vm to pcpul_vm and move it out of the
        function.
      
      [ Impact: no functional difference ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      0ff2587f
    • x86: rename remap percpu first chunk allocator to lpage · 97c9bf06
      Tejun Heo authored
      The "remap" allocator remaps large pages to build the first chunk;
      however, the name isn't very good because 4k allocator remaps too and
      the whole point of the remap allocator is using large page mapping.
      The allocator will be generalized and exported outside of x86, rename
      it to lpage before that happens.
      
      percpu_alloc kernel parameter is updated to accept both "remap" and
      "lpage" for lpage allocator.
      
      [ Impact: code cleanup, kernel parameter argument updated ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      97c9bf06
    • x86: fix duplicate free in setup_pcpu_remap() failure path · c5806df9
      Tejun Heo authored
      In the failure path, setup_pcpu_remap() tries to free the area which
      has already been freed to make holes in the large page.  Fix it.
      
      [ Impact: fix duplicate free in failure path ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      c5806df9
    • Move FAULT_FLAG_xyz into handle_mm_fault() callers · d06063cc
      Linus Torvalds authored
      This allows the callers to now pass down the full set of FAULT_FLAG_xyz
      flags to handle_mm_fault().  All callers have been (mechanically)
      converted to the new calling convention, there's almost certainly room
      for architectures to clean up their code and then add FAULT_FLAG_RETRY
      when that support is added.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d06063cc
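      [ The mechanical conversion in a typical arch fault handler looks like
        this (sketch): ]

          /* Before: the write/read distinction was a boolean argument. */
          fault = handle_mm_fault(mm, vma, address, write);

          /* After: callers pass a flags word instead. */
          fault = handle_mm_fault(mm, vma, address,
                                  write ? FAULT_FLAG_WRITE : 0);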
  11. 21 June 2009, 4 commits
    • perf_counter, x86: Fix L1-data-Cache-Store-Referencees for AMD · d9f2a5ec
      Jaswinder Singh Rajput authored
      Fix AMD's Data Cache Refills from System event.
      
      After this patch :
      
       ./tools/perf/perf stat -e l1d -e l1d-misses -e l1d-write -e l1d-prefetch -e l1d-prefetch-miss -e l1i -e l1i-misses -e l1i-prefetch -e l2 -e l2-misses -e l2-write -e dtlb -e dtlb-misses -e itlb -e itlb-misses -e bpu -e bpu-misses ls /dev/ > /dev/null
      
       Performance counter stats for 'ls /dev/':
      
              2499484  L1-data-Cache-Load-Referencees             (scaled from 3.97%)
                70347  L1-data-Cache-Load-Misses                  (scaled from 7.30%)
                 9360  L1-data-Cache-Store-Referencees            (scaled from 8.64%)
                32804  L1-data-Cache-Prefetch-Referencees         (scaled from 17.72%)
                 7693  L1-data-Cache-Prefetch-Misses              (scaled from 22.97%)
              2180945  L1-instruction-Cache-Load-Referencees      (scaled from 28.48%)
                14518  L1-instruction-Cache-Load-Misses           (scaled from 35.00%)
                 2405  L1-instruction-Cache-Prefetch-Referencees  (scaled from 34.89%)
                71387  L2-Cache-Load-Referencees                  (scaled from 34.94%)
                18732  L2-Cache-Load-Misses                       (scaled from 34.92%)
                79918  L2-Cache-Store-Referencees                 (scaled from 36.02%)
              1295294  Data-TLB-Cache-Load-Referencees            (scaled from 35.99%)
                30896  Data-TLB-Cache-Load-Misses                 (scaled from 33.36%)
              1222030  Instruction-TLB-Cache-Load-Referencees     (scaled from 29.46%)
                  357  Instruction-TLB-Cache-Load-Misses          (scaled from 20.46%)
               530888  Branch-Cache-Load-Referencees              (scaled from 11.48%)
                 8638  Branch-Cache-Load-Misses                   (scaled from 5.09%)
      
          0.011295149  seconds time elapsed.
      
      Earlier it always showed a value of 0.
      Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
      LKML-Reference: <1245484165.3102.6.camel@localhost.localdomain>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d9f2a5ec
    • x86: Set cpu_llc_id on AMD CPUs · 99bd0c0f
      Andreas Herrmann authored
      This counts when building sched domains in case NUMA information
      is not available.
      
      ( See cpu_coregroup_mask() which uses llc_shared_map which in turn is
        created based on cpu_llc_id. )
      
      Currently Linux builds domains as follows:
      (example from a dual socket quad-core system)
      
       CPU0 attaching sched-domain:
        domain 0: span 0-7 level CPU
         groups: 0 1 2 3 4 5 6 7
      
        ...
      
       CPU7 attaching sched-domain:
        domain 0: span 0-7 level CPU
         groups: 7 0 1 2 3 4 5 6
      
      This has been broken for multi-core AMD CPU systems ever since.
      This patch fixes that, and now we get a proper:
      
       CPU0 attaching sched-domain:
        domain 0: span 0-3 level MC
         groups: 0 1 2 3
         domain 1: span 0-7 level CPU
          groups: 0-3 4-7
      
        ...
      
       CPU7 attaching sched-domain:
        domain 0: span 4-7 level MC
         groups: 7 4 5 6
         domain 1: span 0-7 level CPU
          groups: 4-7 0-3
      
      This allows scheduler to assign tasks to cores on different sockets
      (i.e. that don't share last level cache) for performance reasons.
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20090619085909.GJ5218@alberich.amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      99bd0c0f
    • x86, 64-bit: Clean up user address masking · 9063c61f
      Linus Torvalds authored
      The discussion about using "access_ok()" in get_user_pages_fast() (see
      commit 7f818906: "x86: don't use
      'access_ok()' as a range check in get_user_pages_fast()" for details and
      end result), made us notice that x86-64 was really being very sloppy
      about virtual address checking.
      
      So be way more careful and straightforward about masking x86-64 virtual
      addresses:
      
       - All the VIRTUAL_MASK* variants now cover half of the address
         space, it's not like we can use the full mask on a signed
         integer, and the larger mask just invites mistakes when
         applying it to either half of the 48-bit address space.
      
       - /proc/kcore's kc_offset_to_vaddr() becomes a lot more
         obvious when it transforms a file offset into a
         (kernel-half) virtual address.
      
       - Unify/simplify the 32-bit and 64-bit USER_DS definition to
         be based on TASK_SIZE_MAX.
      
      This cleanup and more careful/obvious user virtual address checking also
      uncovered a buglet in the x86-64 implementation of strnlen_user(): it
      would do an "access_ok()" check on the whole potential area, even if the
      string itself was much shorter, and thus return an error even for valid
      strings. Our sloppy checking had hidden this.
      
      So this fixes 'strnlen_user()' to do this properly, the same way we
      already handled user strings in 'strncpy_from_user()'.  Namely by just
      checking the first byte, and then relying on fault handling for the
      rest.  That always works, since we impose a guard page that cannot be
      mapped at the end of the user space address space (and even if we
      didn't, we'd have the address space hole).
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9063c61f
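      [ Sketch of the strnlen_user() part of the change (shape only; the
        helper name is an assumption): ]

          /* Before: check the whole worst-case range up front -- this fails
           * even for short, valid strings near the end of the user address
           * space. */
          if (!access_ok(VERIFY_READ, s, n))
                  return 0;

          /* After: explicitly check only the first byte; walking the rest
           * relies on the #PF fixup to stop at an unmapped page (the guard
           * page guarantees termination). */
          if (!access_ok(VERIFY_READ, s, 1))
                  return 0;
          return __strnlen_user(s, n);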
    • x86: don't use 'access_ok()' as a range check in get_user_pages_fast() · 7f818906
      Linus Torvalds authored
      It's really not right to use 'access_ok()', since that is meant for the
      normal "get_user()" and "copy_from/to_user()" accesses, which are done
      through the TLB, rather than through the page tables.
      
      Why? access_ok() does both too few, and too many checks.  Too many,
      because it is meant for regular kernel accesses that will not honor the
      'user' bit in the page tables, and because it honors the USER_DS vs
      KERNEL_DS distinction that we shouldn't care about in GUP.  And too few,
      because it doesn't do the 'canonical' check on the address on x86-64,
      since the TLB will do that for us.
      
      So instead of using a function that isn't meant for this, and does
      something else and much more complicated, just do the real rules: we
      don't want the range to overflow, and on x86-64, we want it to be a
      canonical low address (on 32-bit, all addresses are canonical).
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7f818906
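      [ Sketch of the replacement check in the fast-GUP path; the 64-bit
        canonical test is the interesting part, 32-bit only needs the
        overflow test. ]

          unsigned long start = addr & PAGE_MASK;
          unsigned long end = start + (nr_pages << PAGE_SHIFT);

          /* Reject wrap-around and, on x86-64, anything outside the
           * canonical low half of the address space -- no access_ok(). */
          if (end < start)
                  goto slow;
          #ifdef CONFIG_X86_64
          if (end >> __VIRTUAL_MASK_SHIFT)
                  goto slow;
          #endif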
  12. 20 June 2009, 1 commit
  13. 19 June 2009, 4 commits
    • perf_counter, x86: Improve interactions with fast-gup · 0c871971
      Ingo Molnar authored
      Improve a few details in perfcounter call-chain recording that
      makes use of fast-GUP:
      
      - Use ACCESS_ONCE() to observe the pte value. ptes are fundamentally
        racy and can be changed on another CPU, so we have to be careful
        about how we access them. The PAE branch is already careful with
        read-barriers - but the non-PAE and 64-bit side needs an
        ACCESS_ONCE() to make sure the pte value is observed only once.
      
      - make the checks a bit stricter so that we can feed it any kind of
        cra^H^H^H user-space input ;-)
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0c871971
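      [ The ACCESS_ONCE() part in a nutshell (sketch of the pattern, not the
        exact hunk): ]

          pte_t pte;

          /* Racy: the compiler may re-read *ptep, and another CPU can
           * change the pte between those reads. */
          pte = *ptep;

          /* Fixed: force a single volatile load of the pte value. */
          pte = ACCESS_ONCE(*ptep);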
    • perf_counter: Make callchain samples extensible · f9188e02
      Peter Zijlstra authored
      Before exposing upstream tools to a callchain-samples ABI, tidy it
      up to make it more extensible in the future:
      
      Use markers in the IP chain to denote context, use (u64)-1..-4095 range
      for these context markers because we use them for ERR_PTR(), so these
      addresses are unlikely to be mapped.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f9188e02
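      [ Sketch of what such markers look like; the exact constants are
        illustrative, the point is that they live in the (u64)-1..-4095 range
        and tag the frames that follow them: ]

          enum perf_callchain_context {
                  PERF_CONTEXT_HV         = (__u64)-32,
                  PERF_CONTEXT_KERNEL     = (__u64)-128,
                  PERF_CONTEXT_USER       = (__u64)-512,
                  PERF_CONTEXT_MAX        = (__u64)-4095,
          };

          /* A recorded chain then reads e.g.:
           *   PERF_CONTEXT_KERNEL, k_ip1, k_ip2, PERF_CONTEXT_USER, u_ip1, ...
           * so a consumer can attribute every ip to the right context. */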
    • function-graph: add stack frame test · 71e308a2
      Steven Rostedt authored
      In case gcc does something funny with the stack frames, or the return
      from function code, we would like to detect that.
      
      An arch may implement passing of a variable that is unique to the
      function and can be saved on entering a function and can be tested
      when exiting the function. Usually the frame pointer can be used for
      this purpose.
      
      This patch also implements this for x86. Where it passes in the stack
      frame of the parent function, and will test that frame on exit.
      
      There was a case in x86_32 with optimize for size (-Os) where, for a
      few functions, gcc would align the stack frame and place a copy of the
      return address into it. The function graph tracer modified the copy and
      not the actual return address. On return from the function, it did not go
      to the tracer hook, but returned to the parent. This broke the function
      graph tracer, because the return of the parent (where gcc did not do
      this funky manipulation) returned to the location that the child function
      was supposed to. This caused strange kernel crashes.
      
      This test detected the problem and pointed out where the issue was.
      
      This modifies the parameters of one of the functions that the arch
      specific code calls, so it includes changes to arch code to accommodate
      the new prototype.
      
      Note, I notice that the parisc arch implements its own push_return_trace.
      This is now a generic function and the ftrace_push_return_trace should be
      used instead. This patch does not touch that code.
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      71e308a2
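      [ Sketch of the test (field and variable names are assumptions): the
        frame pointer remembered at function entry is compared on return; a
        mismatch means gcc rearranged the frame, and the tracer stops itself
        instead of jumping through a stale copy of the return address. ]

          /* entry path (ftrace_push_return_trace): remember the frame */
          current->ret_stack[index].fp = frame_pointer;

          /* exit path (ftrace_return_to_handler): verify it */
          #ifdef CONFIG_HAVE_FUNCTION_GRAPH_FP_TEST
          if (unlikely(current->ret_stack[index].fp != frame_pointer)) {
                  ftrace_graph_stop();
                  WARN(1, "Bad frame pointer: expected %lx, received %lx\n",
                       current->ret_stack[index].fp, frame_pointer);
          }
          #endif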
    • dma-mapping: x86: use asm-generic/dma-mapping-common.h · 7c095e46
      FUJITA Tomonori authored
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: Joerg Roedel <joerg.roedel@amd.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7c095e46