1. 07 12月, 2011 2 次提交
  2. 06 12月, 2011 3 次提交
    • P
      perf, x86: Prefer fixed-purpose counters when scheduling · 4defea85
      Peter Zijlstra 提交于
      This avoids a scheduling failure for cases like:
      
        cycles, cycles, instructions, instructions (on Core2)
      
      Which would end up being programmed like:
      
        PMC0, PMC1, FP-instructions, fail
      
      Because all events will have the same weight.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      4defea85
    • R
      perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Robert Richter 提交于
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler does not find then an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it needs to know which subsequent events will be
      scheduled. It may fail to schedule the events then.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter counstraints.  If we fail to schedule the events
      we rollback to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a new introduced
      overlap flag. We set the overlap flag for such constraints to give the
      scheduler a hint which events to select for counter rescheduling. The
      EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken as the rescheduling algorithm is O(n!) which will
      increase scheduling cycles for an over-commited system dramatically.
      The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter
      masks must be kept at a minimum. Thus, the current stack is limited to
      2 states to limit the number of loops the algorithm takes in the worst
      case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      bc1738f6
    • R
      perf, x86: Implement event scheduler helper functions · 1e2ad28f
      Robert Richter 提交于
      This patch introduces x86 perf scheduler code helper functions. We
      need this to later add more complex functionality to support
      overlapping counter constraints (next patch).
      
      The algorithm is modified so that the range of weight values is now
      generated from the constraints. There shouldn't be other functional
      changes.
      
      With the helper functions the scheduler is controlled. There are
      functions to initialize, traverse the event list, find unused counters
      etc. The scheduler keeps its own state.
      
      V3:
      * Added macro for_each_set_bit_cont().
      * Changed functions interfaces of perf_sched_find_counter() and
        perf_sched_next_event() to use bool as return value.
      * Added some comments to make code better understandable.
      
      V4:
      * Fix broken event assignment if weight of the first event is not
        wmin (perf_sched_init()).
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      1e2ad28f
  3. 05 12月, 2011 8 次提交
  4. 14 11月, 2011 3 次提交
  5. 12 11月, 2011 3 次提交
  6. 10 11月, 2011 1 次提交
  7. 07 11月, 2011 1 次提交
  8. 04 11月, 2011 3 次提交
    • R
      oprofile, x86: Reimplement nmi timer mode using perf event · dcfce4a0
      Robert Richter 提交于
      The legacy x86 nmi watchdog code was removed with the implementation
      of the perf based nmi watchdog. This broke Oprofile's nmi timer
      mode. To run nmi timer mode we relied on a continuous ticking nmi
      source which the nmi watchdog provided. The nmi tick was no longer
      available and current watchdog can not be used anymore since it runs
      with very long periods in the range of seconds. This patch
      reimplements the nmi timer mode using a perf counter nmi source.
      
      V2:
      * removing pr_info()
      * fix undefined reference to `__udivdi3' for 32 bit build
      * fix section mismatch of .cpuinit.data:nmi_timer_cpu_nb
      * removed nmi timer setup in arch/x86
      * implemented function stubs for op_nmi_init/exit()
      * made code more readable in oprofile_init()
      
      V3:
      * fix architectural initialization in oprofile_init()
      * fix CONFIG_OPROFILE_NMI_TIMER dependencies
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      dcfce4a0
    • R
      oprofile, x86: Add kernel parameter oprofile.cpu_type=timer · 159a80b2
      Robert Richter 提交于
      We need this to better test x86 NMI timer mode. Otherwise it is very
      hard to setup systems with NMI timer enabled, but we have this e.g. in
      virtual machine environments.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      159a80b2
    • R
      oprofile, x86: Fix crash when unloading module (nmi timer mode) · 97f7f818
      Robert Richter 提交于
      If oprofile uses the nmi timer interrupt there is a crash while
      unloading the module. The bug can be triggered with oprofile build as
      module and kernel parameter nolapic set. This patch fixes this.
      
      oprofile: using NMI timer interrupt.
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      IP: [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
      PGD 42dbca067 PUD 41da6a067 PMD 0
      Oops: 0002 [#1] PREEMPT SMP
      CPU 5
      Modules linked in: oprofile(-) [last unloaded: oprofile]
      
      Pid: 2518, comm: modprobe Not tainted 3.1.0-rc7-00019-gb2fb49d #19 Advanced Micro Device Anaheim/Anaheim
      RIP: 0010:[<ffffffff8123c226>]  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
      RSP: 0018:ffff88041ef71e98  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: ffffffffa0017100 RCX: dead000000200200
      RDX: 0000000000000000 RSI: dead000000100100 RDI: ffffffff8178c620
      RBP: ffff88041ef71ea8 R08: 0000000000000001 R09: 0000000000000082
      R10: 0000000000000000 R11: ffff88041ef71de8 R12: 0000000000000080
      R13: fffffffffffffff5 R14: 0000000000000001 R15: 0000000000610210
      FS:  00007fc902f20700(0000) GS:ffff88042fd40000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000008 CR3: 000000041cdb6000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process modprobe (pid: 2518, threadinfo ffff88041ef70000, task ffff88041d348040)
      Stack:
       ffff88041ef71eb8 ffffffffa0017790 ffff88041ef71eb8 ffffffffa0013532
       ffff88041ef71ec8 ffffffffa00132d6 ffff88041ef71ed8 ffffffffa00159b2
       ffff88041ef71f78 ffffffff81073115 656c69666f72706f 0000000000610200
      Call Trace:
       [<ffffffffa0013532>] op_nmi_exit+0x15/0x17 [oprofile]
       [<ffffffffa00132d6>] oprofile_arch_exit+0xe/0x10 [oprofile]
       [<ffffffffa00159b2>] oprofile_exit+0x1e/0x20 [oprofile]
       [<ffffffff81073115>] sys_delete_module+0x1c3/0x22f
       [<ffffffff811bf09e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8148070b>] system_call_fastpath+0x16/0x1b
      Code: 20 c6 78 81 e8 c5 cc 23 00 48 8b 13 48 8b 43 08 48 be 00 01 10 00 00 00 ad de 48 b9 00 02 20 00 00 00 ad de 48 c7 c7 20 c6 78 81
       89 42 08 48 89 10 48 89 33 48 89 4b 08 e8 a6 c0 23 00 5a 5b
      RIP  [<ffffffff8123c226>] unregister_syscore_ops+0x41/0x58
       RSP <ffff88041ef71e98>
      CR2: 0000000000000008
      ---[ end trace 43a541a52956b7b0 ]---
      
      CC: stable@kernel.org # 2.6.37+
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      97f7f818
  9. 03 11月, 2011 2 次提交
    • A
      thp: share get_huge_page_tail() · b35a35b5
      Andrea Arcangeli 提交于
      This avoids duplicating the function in every arch gup_fast.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b35a35b5
    • A
      mm: thp: tail page refcounting fix · 70b50f94
      Andrea Arcangeli 提交于
      Michel while working on the working set estimation code, noticed that
      calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
      wasn't safe, if the pfn ended up being a tail page of a transparent
      hugepage under splitting by __split_huge_page_refcount().
      
      He then found the problem could also theoretically materialize with
      page_cache_get_speculative() during the speculative radix tree lookups
      that uses get_page_unless_zero() in SMP if the radix tree page is freed
      and reallocated and get_user_pages is called on it before
      page_cache_get_speculative has a chance to call get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change I also noticed get_page is
      called by direct-io.c on pages returned by get_user_pages.  That wasn't
      entirely safe because the two atomic_inc in get_page weren't atomic.  As
      opposed to other get_user_page users like secondary-MMU page fault to
      establish the shadow pagetables would never call any superflous get_page
      after get_user_page returns.  It's safer to make get_page universally safe
      for tail pages and to use get_page_foll() within follow_page (inside
      get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
      pages without taking any locks because it is run within PT lock protected
      critical sections (PT lock for pte and page_table_lock for
      pmd_trans_huge).
      
      The standard get_page() as invoked by direct-io instead will now take
      the compound_lock but still only for tail pages.  The direct-io paths
      are usually I/O bound and the compound_lock is per THP so very
      finegrined, so there's no risk of scalability issues with it.  A simple
      direct-io benchmarks with all lockdep prove locking and spinlock
      debugging infrastructure enabled shows identical performance and no
      overhead.  So it's worth it.  Ideally direct-io should stop calling
      get_page() on pages returned by get_user_pages().  The spinlock in
      get_page() is already optimized away for no-THP builds but doing
      get_page() on tail pages returned by GUP is generally a rare operation
      and usually only run in I/O paths.
      
      This new refcounting on page_tail->_mapcount in addition to avoiding new
      RCU critical sections will also allow the working set estimation code to
      work without any further complexity associated to the tail page
      refcounting with THP.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70b50f94
  10. 02 11月, 2011 14 次提交