1. 07 12月, 2011 2 次提交
  2. 06 12月, 2011 3 次提交
    • P
      perf, x86: Prefer fixed-purpose counters when scheduling · 4defea85
      Peter Zijlstra 提交于
      This avoids a scheduling failure for cases like:
      
        cycles, cycles, instructions, instructions (on Core2)
      
      Which would end up being programmed like:
      
        PMC0, PMC1, FP-instructions, fail
      
      Because all events will have the same weight.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      4defea85
    • R
      perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Robert Richter 提交于
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler does not find then an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it needs to know which subsequent events will be
      scheduled. It may fail to schedule the events then.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter counstraints.  If we fail to schedule the events
      we rollback to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a new introduced
      overlap flag. We set the overlap flag for such constraints to give the
      scheduler a hint which events to select for counter rescheduling. The
      EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken as the rescheduling algorithm is O(n!) which will
      increase scheduling cycles for an over-commited system dramatically.
      The number of such EVENT_CONSTRAINT_OVERLAP() macros and its counter
      masks must be kept at a minimum. Thus, the current stack is limited to
      2 states to limit the number of loops the algorithm takes in the worst
      case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      bc1738f6
    • R
      perf, x86: Implement event scheduler helper functions · 1e2ad28f
      Robert Richter 提交于
      This patch introduces x86 perf scheduler code helper functions. We
      need this to later add more complex functionality to support
      overlapping counter constraints (next patch).
      
      The algorithm is modified so that the range of weight values is now
      generated from the constraints. There shouldn't be other functional
      changes.
      
      With the helper functions the scheduler is controlled. There are
      functions to initialize, traverse the event list, find unused counters
      etc. The scheduler keeps its own state.
      
      V3:
      * Added macro for_each_set_bit_cont().
      * Changed functions interfaces of perf_sched_find_counter() and
        perf_sched_next_event() to use bool as return value.
      * Added some comments to make code better understandable.
      
      V4:
      * Fix broken event assignment if weight of the first event is not
        wmin (perf_sched_init()).
      Signed-off-by: NRobert Richter <robert.richter@amd.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      1e2ad28f
  3. 05 12月, 2011 2 次提交
  4. 14 11月, 2011 3 次提交
  5. 01 11月, 2011 5 次提交
    • B
      i7core_edac: Drop the edac_mce facility · 4140c542
      Borislav Petkov 提交于
      Remove edac_mce pieces and use the normal MCE decoder notifier chain by
      retaining the same functionality with considerably less code.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      4140c542
    • C
      Cross Memory Attach · fcf63409
      Christopher Yeoh 提交于
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow for specifying iovecs for both src and dest, assuming
          preadv or pwritev was implemented either the area read from or
        written to would need to be contiguous.
        - Currently mem_read allows only processes who are currently
        ptrace'ing the target and are still able to ptrace the target to read
        from the target. This check could possibly be moved to the open call,
        but its not clear exactly what race this restriction is stopping
        (reason  appears to have been lost)
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
        domain socket is a bit ugly from a userspace point of view,
        especially when you may have hundreds if not (eventually) thousands
        of processes  that all need to do this with each other
        - Doesn't allow for some future use of the interface we would like to
        consider adding in the future (see below)
        - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)
      
      As mentioned previously use of vmsplice instead was considered, but has
      problems.  Since you need the reader and writer working co-operatively if
      the pipe is not drained then you block.  Which requires some wrapping to
      do non blocking on the send side or polling on the receive.  In all to all
      communication it requires ordering otherwise you can deadlock.  And in the
      example of many MPI tasks writing to one MPI task vmsplice serialises the
      copying.
      
      There are some cases of MPI collectives where even a single copy interface
      does not get us the performance gain we could.  For example in an
      MPI_Reduce rather than copy the data from the source we would like to
      instead use it directly in a mathops (say the reduce is doing a sum) as
      this would save us doing a copy.  We don't need to keep a copy of the data
      from the source.  I haven't implemented this, but I think this interface
      could in the future do all this through the use of the flags - eg could
      specify the math operation and type and the kernel rather than just
      copying the data would apply the specified operation between the source
      and destination and store it in the destination.
      
      Although we don't have a "second user" of the interface (though I've had
      some nibbles from people who may be interested in using it for intra
      process messaging which is not MPI).  This interface is something which
      hardware vendors are already doing for their custom drivers to implement
      fast local communication.  And so in addition to this being useful for
      OpenMPI it would mean the driver maintainers don't have to fix things up
      when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc, other architecture should
      mainly (I think) just need to add syscall numbers for the process_vm_readv
      and process_vm_writev. There are 32 bit compatibility versions for
      64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fcf63409
    • P
      x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88
      Paul Gortmaker 提交于
      These files were implicitly getting EXPORT_SYMBOL via device.h
      which was including module.h, but that will be fixed up shortly.
      
      By fixing these now, we can avoid seeing things like:
      
      arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
      
      [ with input from Randy Dunlap <rdunlap@xenotime.net> and also
        from Stephen Rothwell <sfr@canb.auug.org.au> ]
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      69c60c88
    • P
      x86: fix implicit include of <linux/topology.h> in vsyscall_64 · 29574022
      Paul Gortmaker 提交于
      In removing the presence of <linux/module.h> from some of the
      more common <linux/something.h> files, this implict include
      of <linux/topology.h> was uncovered.
      
        CC      arch/x86/kernel/vsyscall_64.o
        arch/x86/kernel/vsyscall_64.c: In function ‘vsyscall_set_cpu’:
        arch/x86/kernel/vsyscall_64.c:259: error: implicit declaration of function ‘cpu_to_node’
      
      Explicitly call it out so the cleanup can take place.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      29574022
    • B
      x86, MCE: Use notifier chain only for MCE decoding · f0cb5452
      Borislav Petkov 提交于
      Drop the edac_mce custom hook in favor of the generic notifier
      mechanism. Also, do not log the error to mcelog if the notified agent
      was able to decode it.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      f0cb5452
  6. 26 10月, 2011 2 次提交
  7. 25 10月, 2011 1 次提交
    • J
      x86: Fix compilation bug in kprobes' twobyte_is_boostable · 315eb8a2
      Josh Stone 提交于
      When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I
      noticed a warning about the asm operand for test_bit in kprobes'
      can_boost.  I discovered that this caused only the first long of
      twobyte_is_boostable[] to be output.
      
      Jakub filed and fixed gcc PR50571 to correct the warning and this output
      issue.  But to solve it for less current gcc, we can make kprobes'
      twobyte_is_boostable[] non-const, and it won't be optimized out.
      
      Before:
      
          CC      arch/x86/kernel/kprobes.o
        In file included from include/linux/bitops.h:22:0,
                         from include/linux/kernel.h:17,
                         from [...]/arch/x86/include/asm/percpu.h:44,
                         from [...]/arch/x86/include/asm/current.h:5,
                         from [...]/arch/x86/include/asm/processor.h:15,
                         from [...]/arch/x86/include/asm/atomic.h:6,
                         from include/linux/atomic.h:4,
                         from include/linux/mutex.h:18,
                         from include/linux/notifier.h:13,
                         from include/linux/kprobes.h:34,
                         from arch/x86/kernel/kprobes.c:43:
        [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
        [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
              without lvalue in asm operand 1 is deprecated [enabled by default]
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                                554: R_386_32	.rodata.cst4
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
        Contents of section .rodata.cst4:
         0000 4c030000                             L...
      
      Only a single long of twobyte_is_boostable[] is in the object file.
      
      After, without the const on twobyte_is_boostable:
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                                554: R_386_32	.data
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
         0010 00000000 00000000 00000000 00000000  ................
         0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
         0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w
      
      Now all 32 bytes are output into .data instead.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      315eb8a2
  8. 19 10月, 2011 2 次提交
  9. 18 10月, 2011 1 次提交
    • J
      x86, perf, kprobes: Make kprobes's twobyte_is_boostable volatile · db45bd90
      Josh Stone 提交于
      When compiling an i386_defconfig kernel with
      gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand
      for test_bit in kprobes' can_boost. I discovered that this
      caused only the first long of twobyte_is_boostable[] to be
      output.
      
      Jakub filed and fixed gcc PR50571 to correct the warning and
      this output issue.  But to solve it for less current gcc, we can
      make kprobes' twobyte_is_boostable[] volatile, and it won't be
      optimized out.
      
      Before:
      
          CC      arch/x86/kernel/kprobes.o
        In file included from include/linux/bitops.h:22:0,
                         from include/linux/kernel.h:17,
                         from [...]/arch/x86/include/asm/percpu.h:44,
                         from [...]/arch/x86/include/asm/current.h:5,
                         from [...]/arch/x86/include/asm/processor.h:15,
                         from [...]/arch/x86/include/asm/atomic.h:6,
                         from include/linux/atomic.h:4,
                         from include/linux/mutex.h:18,
                         from include/linux/notifier.h:13,
                         from include/linux/kprobes.h:34,
                         from arch/x86/kernel/kprobes.c:43:
        [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
        [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default]
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                                554: R_386_32	.rodata.cst4
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
        Contents of section .rodata.cst4:
         0000 4c030000                             L...
      
      Only a single long of twobyte_is_boostable[] is in the object
      file.
      
      After, with volatile:
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                                554: R_386_32	.data
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
         0010 00000000 00000000 00000000 00000000  ................
         0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
         0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w
      
      Now all 32 bytes are output into .data instead.
      Signed-off-by: NJosh Stone <jistone@redhat.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Link: http://lkml.kernel.org/r/1318899645-4068-1-git-send-email-jistone@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      db45bd90
  10. 14 10月, 2011 2 次提交
  11. 13 10月, 2011 2 次提交
  12. 12 10月, 2011 5 次提交
  13. 11 10月, 2011 1 次提交
  14. 10 10月, 2011 9 次提交