1. 21 9月, 2009 2 次提交
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
    • P
      perf_counter, powerpc, sparc: Fix compilation after perf_counter_overflow() change · cd74c86b
      Paul Mackerras 提交于
      Commit 5622f295 ("x86, perf_counter, bts: Optimize BTS overflow
      handling") removed the regs field from struct perf_sample_data and
      added a regs parameter to perf_counter_overflow().  This breaks the
      build on powerpc (and Sparc) as reported by Sachin Sant:
      
        arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
        arch/powerpc/kernel/perf_counter.c:1165: error: unknown field 'regs' specified in initializer
      
      This adjusts arch/powerpc/kernel/perf_counter.c to correspond with the
      new struct perf_sample_data and perf_counter_overflow().
      
      [ v2: also fix Sparc, Markus Metzger <markus.t.metzger@intel.com> ]
      Reported-by: NSachin Sant <sachinp@in.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Markus Metzger <markus.t.metzger@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: benh@kernel.crashing.org
      Cc: linuxppc-dev@ozlabs.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19127.8400.376239.586120@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cd74c86b
  2. 19 9月, 2009 1 次提交
  3. 16 9月, 2009 3 次提交
    • P
      sched: Disable wakeup balancing · 182a85f8
      Peter Zijlstra 提交于
      Sysbench thinks SD_BALANCE_WAKE is too agressive and kbuild doesn't
      really mind too much, SD_BALANCE_NEWIDLE picks up most of the
      slack.
      
      On a dual socket, quad core, dual thread nehalem system:
      
      sysbench (--num_threads=16):
      
       SD_BALANCE_WAKE-: 13982 tx/s
       SD_BALANCE_WAKE+: 15688 tx/s
      
      kbuild (-j16):
      
       SD_BALANCE_WAKE-: 47.648295846  seconds time elapsed   ( +-   0.312% )
       SD_BALANCE_WAKE+: 47.608607360  seconds time elapsed   ( +-   0.026% )
      
      (same within noise)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      182a85f8
    • D
      sparc: Update defconfigs. · 0a375d75
      David S. Miller 提交于
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a375d75
    • D
      sparc: Kill PROM console driver. · 09d3f3f0
      David S. Miller 提交于
      Many years ago when this driver was written, it had a use, but these
      days it's nothing but trouble and distributions should not enable it
      in any situation.
      
      Pretty much every console device a sparc machine could see has a
      bonafide real driver, making the PROM console hack unnecessary.
      
      If any new device shows up, we should write a driver instead of
      depending upon this crutch to save us.  We've been able to take care
      of this even when no chip documentation exists (sunxvr500, sunxvr2500)
      so there are no excuses.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09d3f3f0
  4. 15 9月, 2009 3 次提交
    • P
      sched: Reduce forkexec_idx · b8a543ea
      Peter Zijlstra 提交于
      If we're looking to place a new task, we might as well find the
      idlest position _now_, not 1 tick ago.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b8a543ea
    • P
      sched: Tweak wake_idx · 78e7ed53
      Peter Zijlstra 提交于
      When merging select_task_rq_fair() and sched_balance_self() we lost
      the use of wake_idx, restore that and set them to 0 to make wake
      balancing more aggressive.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      78e7ed53
    • P
      sched: Merge select_task_rq_fair() and sched_balance_self() · c88d5910
      Peter Zijlstra 提交于
      The problem with wake_idle() is that is doesn't respect things like
      cpu_power, which means it doesn't deal well with SMT nor the recent
      RT interaction.
      
      To cure this, it needs to do what sched_balance_self() does, which
      leads to the possibility of merging select_task_rq_fair() and
      sched_balance_self().
      
      Modify sched_balance_self() to:
      
        - update_shares() when walking up the domain tree,
          (it only called it for the top domain, but it should
           have done this anyway), which allows us to remove
          this ugly bit from try_to_wake_up().
      
        - do wake_affine() on the smallest domain that contains
          both this (the waking) and the prev (the wakee) cpu for
          WAKE invocations.
      
      Then use the top-down balance steps it had to replace wake_idle().
      
      This leads to the dissapearance of SD_WAKE_BALANCE and
      SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced with SD_BALANCE_WAKE.
      
      SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.
      
      Touch all topology bits to replace the old with new SD flags --
      platforms might need re-tuning, enabling SD_BALANCE_WAKE
      conditionally on a NUMA distance seems like a good additional
      feature, magny-core and small nehalem systems would want this
      enabled, systems with slow interconnects would not.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c88d5910
  5. 10 9月, 2009 9 次提交
  6. 09 9月, 2009 3 次提交
  7. 04 9月, 2009 2 次提交
  8. 03 9月, 2009 1 次提交
    • D
      sparc64: Kill spurious NMI watchdog triggers by increasing limit to 30 seconds. · e6617c6e
      David S. Miller 提交于
      This is a compromise and a temporary workaround for bootup NMI
      watchdog triggers some people see with qla2xxx devices present.
      
      This happens when, for example:
      
      CPU 0 is in the driver init and looping submitting mailbox commands to
      load the firmware, then waiting for completion.
      
      CPU 1 is receiving the device interrupts.  CPU 1 is where the NMI
      watchdog triggers.
      
      CPU 0 is submitting mailbox commands fast enough that by the time CPU
      1 returns from the device interrupt handler, a new one is pending.
      This sequence runs for more than 5 seconds.
      
      The problematic case is CPU 1's timer interrupt running when the
      barrage of device interrupts begin.  Then we have:
      
      	timer interrupt
      	return for softirq checking
      	pending, thus enable interrupts
      
      		 qla2xxx interrupt
      		 return
      		 qla2xxx interrupt
      		 return
      		 ... 5+ seconds pass
      		 final qla2xxx interrupt for fw load
      		 return
      
      	run timer softirq
      	return
      
      At some point in the multi-second qla2xxx interrupt storm we trigger
      the NMI watchdog on CPU 1 from the NMI interrupt handler.
      
      The timer softirq, once we get back to running it, is smart enough to
      run the timer work enough times to make up for the missed timer
      interrupts.
      
      However, the NMI watchdogs (both x86 and sparc) use the timer
      interrupt count to notice the cpu is wedged.  But in the above
      scenerio we'll receive only one such timer interrupt even if we last
      all the way back to running the timer softirq.
      
      The default watchdog trigger point is only 5 seconds, which is pretty
      low (the softwatchdog triggers at 60 seconds).  So increase it to 30
      seconds for now.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6617c6e
  9. 02 9月, 2009 2 次提交
    • D
      KEYS: Add a keyctl to install a process's session keyring on its parent [try #6] · ee18d64c
      David Howells 提交于
      Add a keyctl to install a process's session keyring onto its parent.  This
      replaces the parent's session keyring.  Because the COW credential code does
      not permit one process to change another process's credentials directly, the
      change is deferred until userspace next starts executing again.  Normally this
      will be after a wait*() syscall.
      
      To support this, three new security hooks have been provided:
      cred_alloc_blank() to allocate unset security creds, cred_transfer() to fill in
      the blank security creds and key_session_to_parent() - which asks the LSM if
      the process may replace its parent's session keyring.
      
      The replacement may only happen if the process has the same ownership details
      as its parent, and the process has LINK permission on the session keyring, and
      the session keyring is owned by the process, and the LSM permits it.
      
      Note that this requires alteration to each architecture's notify_resume path.
      This has been done for all arches barring blackfin, m68k* and xtensa, all of
      which need assembly alteration to support TIF_NOTIFY_RESUME.  This allows the
      replacement to be performed at the point the parent process resumes userspace
      execution.
      
      This allows the userspace AFS pioctl emulation to fully emulate newpag() and
      the VIOCSETTOK and VIOCSETTOK2 pioctls, all of which require the ability to
      alter the parent process's PAG membership.  However, since kAFS doesn't use
      PAGs per se, but rather dumps the keys into the session keyring, the session
      keyring of the parent must be replaced if, for example, VIOCSETTOK is passed
      the newpag flag.
      
      This can be tested with the following program:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <keyutils.h>
      
      	#define KEYCTL_SESSION_TO_PARENT	18
      
      	#define OSERROR(X, S) do { if ((long)(X) == -1) { perror(S); exit(1); } } while(0)
      
      	int main(int argc, char **argv)
      	{
      		key_serial_t keyring, key;
      		long ret;
      
      		keyring = keyctl_join_session_keyring(argv[1]);
      		OSERROR(keyring, "keyctl_join_session_keyring");
      
      		key = add_key("user", "a", "b", 1, keyring);
      		OSERROR(key, "add_key");
      
      		ret = keyctl(KEYCTL_SESSION_TO_PARENT);
      		OSERROR(ret, "KEYCTL_SESSION_TO_PARENT");
      
      		return 0;
      	}
      
      Compiled and linked with -lkeyutils, you should see something like:
      
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	355907932 --alswrv   4043    -1   \_ keyring: _uid.4043
      	[dhowells@andromeda ~]$ /tmp/newpag
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: _ses
      	1055658746 --alswrv   4043  4043   \_ user: a
      	[dhowells@andromeda ~]$ /tmp/newpag hello
      	[dhowells@andromeda ~]$ keyctl show
      	Session Keyring
      	       -3 --alswrv   4043  4043  keyring: hello
      	340417692 --alswrv   4043  4043   \_ user: a
      
      Where the test program creates a new session keyring, sticks a user key named
      'a' into it and then installs it on its parent.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      ee18d64c
    • A
  10. 01 9月, 2009 1 次提交
    • H
      locking, sparc: Rename __spin_try_lock() and friends · 9f34ceb6
      Heiko Carstens 提交于
      Needed to avoid namespace conflicts when the common code
      function bodies of _spin_try_lock() etc. are moved to a header
      file where the function name would be __spin_try_lock().
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Horst Hartmann <horsth@linux.vnet.ibm.com>
      Cc: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <20090831124416.306495811@de.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f34ceb6
  11. 26 8月, 2009 1 次提交
    • D
      sparc64: Validate linear D-TLB misses. · d8ed1d43
      David S. Miller 提交于
      When page alloc debugging is not enabled, we essentially accept any
      virtual address for linear kernel TLB misses.  But with kgdb, kernel
      address probing, and other facilities we can try to access arbitrary
      crap.
      
      So, make sure the address we miss on will translate to physical memory
      that actually exists.
      
      In order to make this work we have to embed the valid address bitmap
      into the kernel image.  And in order to make that less expensive we
      make an adjustment, in that the max physical memory address is
      decreased to "1 << 41", even on the chips that support a 42-bit
      physical address space.  We can do this because bit 41 indicates
      "I/O space" and thus covers non-memory ranges.
      
      The result of this is that:
      
      1) kpte_linear_bitmap shrinks from 2K to 1K in size
      
      2) we need 64K more for the valid address bitmap
      
      We can't let the valid address bitmap be dynamically allocated
      once we start using it to validate TLB misses, otherwise we have
      crazy issues to deal with wrt. recursive TLB misses and such.
      
      If we're in a TLB miss it could be the deepest trap level that's legal
      inside of the cpu.  So if we TLB miss referencing the bitmap, the cpu
      will be out of trap levels and enter RED state.
      
      To guard against out-of-range accesses to the bitmap, we have to check
      to make sure no bits in the physical address above bit 40 are set.  We
      could export and use last_valid_pfn for this check, but that's just an
      unnecessary extra memory reference.
      
      On the plus side of all this, since we load all of these translations
      into the special 4MB mapping TSB, and we check the TSB first for TLB
      misses, there should be absolutely no real cost for these new checks
      in the TLB miss path.
      
      Reported-by: heyongli@gmail.com
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8ed1d43
  12. 19 8月, 2009 4 次提交
  13. 18 8月, 2009 5 次提交
  14. 17 8月, 2009 3 次提交