1. 24 Sep 2011: 11 commits
    • perf top: Fix userspace sample addr map offset · af52aafa
      Committed by Arnaldo Carvalho de Melo
      The 'perf top' tool came from the kernel where we had each DSO (vmlinux,
      modules) loaded just once at a time.
      
      But userspace may have DSOs loaded at multiple addresses (shared
      libraries), requiring that we use the just-resolved map instead of the
      first one found.
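
      A minimal sketch of the idea, assuming a simplified, hypothetical
      struct map with start/end/pgoff fields (not perf's real definition):
      the same shared library mapped at two addresses gives two different
      maps, so the DSO-relative address must be computed from the map that
      actually contains the sample IP.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, simplified mapping of one DSO into a process. */
      struct map {
          uint64_t start;  /* first virtual address of the mapping */
          uint64_t end;    /* one past the last virtual address */
          uint64_t pgoff;  /* file offset the mapping starts at */
      };

      /* Return the map that contains ip, or NULL if none does. */
      static const struct map *map_find(const struct map *maps, size_t n,
                                        uint64_t ip)
      {
          for (size_t i = 0; i < n; i++)
              if (ip >= maps[i].start && ip < maps[i].end)
                  return &maps[i];
          return NULL;
      }

      /* DSO-relative address of ip, computed from the given map. */
      static uint64_t map_ip(const struct map *map, uint64_t ip)
      {
          return ip - map->start + map->pgoff;
      }
      ```

      Passing the first map found for the DSO instead of the one returned
      by map_find() yields a wrong offset whenever the library is mapped
      more than once.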
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-ag53wz0yllpgers0n2w7hchp@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Fix issue with binaries using 16-bytes buildids (v2) · be96ea8f
      Committed by Stephane Eranian
      The buildid can vary in size. According to the ld man page, a buildid
      can be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes a buildid
      size of 20 bytes (160 bits) regardless. When dealing with md5 buildids,
      it would thus read more than needed, causing mismatches and samples
      without symbols.
      
      This patch fixes this by taking into account the actual buildid size as
      encoded in the section header. The leftover bytes are also cleared.
      
      This second version fixes a minor issue with the memset() base position.
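
      A sketch of the copy step, assuming a hypothetical read_build_id()
      helper and a 20-byte destination buffer (BUILD_ID_SIZE is named here
      for illustration): only the number of bytes declared in the note
      header is copied, and the memset() starts right after them, so a
      16-byte md5 buildid is zero-padded instead of picking up stray bytes.

      ```c
      #include <stddef.h>
      #include <string.h>

      #define BUILD_ID_SIZE 20 /* largest buildid: 160-bit sha1 */

      /* Copy a buildid whose real length (descsz) comes from the ELF note
       * header. Returns the number of bytes copied, or -1 if too large. */
      static int read_build_id(unsigned char dst[BUILD_ID_SIZE],
                               const unsigned char *desc, size_t descsz)
      {
          if (descsz > BUILD_ID_SIZE)
              return -1;
          memcpy(dst, desc, descsz);
          /* Clear the leftover bytes, starting right after the copied ones. */
          memset(dst + descsz, 0, BUILD_ID_SIZE - descsz);
          return (int)descsz;
      }
      ```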
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@gmail.com>
      Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf tool: Fix endianness handling of u32 data in samples · 936be503
      Committed by David Ahern
      Currently, when PPC data files are analyzed on x86, the cpu field is
      always 0 and the tid and pid are backwards. For example, analyzing a
      PPC file on PPC, the pid/tid fields show:
      
              rsyslogd  1210/1212
      
      and analyzing the same PPC file using an x86 perf binary shows:
      
              rsyslogd  1212/1210
      
      The problem is that the swap_op method for samples is
      perf_event__all64_swap which assumes all elements in the sample_data
      struct are u64s. cpu, tid and pid are u32s and need to be handled
      individually. Given that the swap is done before the sample is parsed,
      the simplest solution is to undo the 64-bit swap of those elements when
      the sample is parsed and do the proper swap.
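
      A sketch of the undo-and-reswap step, assuming a little-endian host
      reading a big-endian file and using glibc's <byteswap.h>. The union
      mirrors the approach mentioned in the changelog below, but
      fix_u32_pair() is a hypothetical name, not perf's code.

      ```c
      #include <byteswap.h>
      #include <stdint.h>

      /* The same 8 bytes viewed as one u64 or as two u32s (e.g. pid, tid). */
      union u64_swap {
          uint64_t val64;
          uint32_t val32[2];
      };

      /* The sample was already swapped as if every field were a u64.
       * Undo that for a slot that really holds two u32s, then swap each
       * u32 on its own. */
      static void fix_u32_pair(uint64_t slot, uint32_t *first, uint32_t *second)
      {
          union u64_swap u;

          u.val64 = bswap_64(slot);        /* restore the original byte order */
          *first  = bswap_32(u.val32[0]);  /* proper 32-bit swap per field */
          *second = bswap_32(u.val32[1]);
      }
      ```

      Reading the blanket-swapped slot directly as two u32s on x86 would
      give 1212/1210 for the example above; fix_u32_pair() restores
      1210/1212.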
      
      The RAW data field is generic and perf cannot have programmatic knowledge
      of how to treat that data. Instead a warning is given to the user.
      
      Thanks to Anton Blanchard for providing a data file for a multi-CPU
      PPC system so I could verify the fix for the CPU fields.
      
      v3 -> v4:
      - fixed use of WARN_ONCE
      
      v2 -> v3:
      - used WARN_ONCE for message regarding raw data
      - removed struct wrapper around union
      - fixed whitespace issues
      
      v1 -> v2:
      - added a union for undoing the byte-swap on u64 and redoing swap on
        u32's to address compiler errors (see git commit 65014ab3)
      
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf sort: Fix symbol sort output by separating unresolved samples by type · 6bb8f311
      Committed by Anton Blanchard
      I took a profile that suggested 60% of total CPU time was in the
      hypervisor:
      
      ...
          60.20%  [H] 0x33d43c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
      
      Using perf stat to get the user/kernel/hypervisor breakdown contradicted
      this.
      
      The problem is that we merge all unresolved samples into a single
      unknown bucket. If we add a comparison by sample type to sort__sym_cmp,
      we get the real picture:
      
      ...
          57.11%  [.] 0x80fbf63c
           4.43%  [k] ._spin_lock_irqsave
           1.07%  [k] ._spin_lock
           0.65%  [H] 0x33d43c
      
      So it was almost all userspace, not hypervisor as the initial profile
      suggested.
      
      I found another issue while adding this. Symbol sorting sometimes shows
      multiple entries for the unknown bucket:
      
      ...
          16.65%  [.] 0x6cd3a8
           7.25%  [.] 0x422460
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.82%  [.] 0x532908
           2.64%  [.] 0x36b538
           0.94%  [H] 0x8000000000e132a4
           0.82%  [H] 0x800000000000e8b0
      
      This happens because we aren't consistent with our sorting. On one
      hand, we check whether both symbols match, and for two unresolved
      samples sym is NULL, so they match:
      
              if (left->ms.sym == right->ms.sym)
                      return 0;
      
      On the other hand we use sample IP for unresolved samples when
      comparing against a symbol:
      
              ip_l = left->ms.sym ? left->ms.sym->start : left->ip;
              ip_r = right->ms.sym ? right->ms.sym->start : right->ip;
      
      This means unresolved samples end up spread across the rbtree and we
      can't merge them all.
      
      If we use cmp_null, all unresolved samples end up in a single bucket
      and the output makes more sense:
      
      ...
          39.12%  [.] 0x36b538
           5.37%  [.] yylex
           4.79%  [.] malloc
           4.78%  [.] _int_malloc
           4.03%  [.] _int_free
           3.95%  [.] hash_source_code_string
           2.26%  [H] 0x800000000000e8b0
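
      A sketch of the resulting comparator, using hypothetical trimmed-down
      types (perf's real hist_entry keeps the symbol in ms.sym): two
      unresolved entries are compared by sample type only, and a
      resolved/unresolved pair is ordered with cmp_null(), so the raw IP is
      never used and all unresolved samples of one type share a bucket.

      ```c
      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical, trimmed-down types for illustration only. */
      struct symbol { uint64_t start; };
      struct hist_entry {
          const struct symbol *sym; /* NULL when the sample is unresolved */
          uint64_t ip;
          char level;               /* sample type, e.g. '.', 'k' or 'H' */
      };

      /* Order NULL before non-NULL; equal only when both are NULL. */
      static int64_t cmp_null(const void *l, const void *r)
      {
          if (!l && !r)
              return 0;
          return !l ? -1 : 1;
      }

      static int64_t sort_sym_cmp(const struct hist_entry *left,
                                  const struct hist_entry *right)
      {
          /* Both unresolved: separate only by user/kernel/hypervisor. */
          if (!left->sym && !right->sym)
              return (int64_t)right->level - (int64_t)left->level;

          /* One unresolved: never fall back to the raw sample IP. */
          if (!left->sym || !right->sym)
              return cmp_null(left->sym, right->sym);

          if (left->sym->start == right->sym->start)
              return 0;
          return left->sym->start < right->sym->start ? 1 : -1;
      }
      ```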

      Acked-by: Eric B Munson <emunson@mgebm.net>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ian Munsie <imunsie@au1.ibm.com>
      Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Synthesize anonymous mmap events · 6a0e55d8
      Committed by Anton Blanchard
      perf_event__synthesize_mmap_events does not create anonymous mmap events
      even though the kernel does. As a result an already running application
      with dynamically created code will not get profiled - all samples end up
      in the unknown bucket.
      
      This patch skips any entries with '[' in the name to avoid adding events
      for special regions (e.g. the vsyscall page). All other executable mmaps
      are assumed to be anonymous and an event is synthesized.
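
      A sketch of the filtering rule, assuming the permission and name
      columns of a /proc/<pid>/maps line have already been split out; the
      helper name and the //anon placeholder name are illustrative
      assumptions.

      ```c
      #include <stddef.h>
      #include <string.h>

      /* Pick the filename for a synthesized mmap event, or NULL to skip. */
      static const char *mmap_event_name(const char *perms, const char *name)
      {
          if (strlen(perms) < 3 || perms[2] != 'x')
              return NULL;            /* only executable mappings matter */
          if (name[0] == '/')
              return name;            /* file-backed mapping */
          if (strchr(name, '['))
              return NULL;            /* special region, e.g. [vsyscall] */
          return "//anon";            /* assumed anonymous, e.g. JIT code */
      }
      ```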

      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf record: Create events initially disabled and enable after init · 764e16a3
      Committed by David Ahern
      perf-record currently creates events enabled. When doing a system-wide
      collection (-a arg), this causes data collection for perf's own
      initialization activities, e.g., perf_event__synthesize_threads().
      
      For some events (e.g., the context switch S/W event or tracepoints like
      syscalls), perf's initialization causes a lot of events to be captured,
      frequently generating "Check IO/CPU overload!" warnings on larger
      systems (e.g., 2 socket, quad core, hyperthreading).
      
      perf's initialization phase can be skipped by creating events
      disabled and then enabling them once the initialization is done.
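
      A sketch of the pattern using the Linux perf_event ABI: set the
      disabled bit in struct perf_event_attr before calling
      perf_event_open(2) (not shown here), then enable each event with the
      PERF_EVENT_IOC_ENABLE ioctl once initialization is done. The helper
      names are hypothetical.

      ```c
      #include <linux/perf_event.h>
      #include <string.h>
      #include <sys/ioctl.h>

      /* Describe an event that starts disabled, so nothing is counted
       * while the tool is still initializing. */
      static void init_attr_disabled(struct perf_event_attr *attr,
                                     __u32 type, __u64 config)
      {
          memset(attr, 0, sizeof(*attr));
          attr->size = sizeof(*attr);
          attr->type = type;
          attr->config = config;
          attr->disabled = 1;
      }

      /* After initialization (e.g. synthesizing thread events), enable
       * every opened event. Returns 0, or -1 on the first failure. */
      static int enable_events(const int *fds, int nr)
      {
          for (int i = 0; i < nr; i++)
              if (ioctl(fds[i], PERF_EVENT_IOC_ENABLE, 0) < 0)
                  return -1;
          return 0;
      }
      ```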
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Add some heuristics for choosing the best duplicate symbol · 694bf407
      Committed by Anton Blanchard
      Try to pick the best symbol based on a few heuristics:
      
      -  Prefer a non-weak symbol over a weak one
      -  Prefer a global symbol over a non-global one
      -  Prefer a symbol with fewer underscores (idea taken from kallsyms.c)
      -  If all else fails, choose the symbol with the longest name
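
      A sketch of these rules, assuming a hypothetical minimal symbol
      struct. It counts leading underscores only, which is an assumption
      about how the underscore rule is applied rather than perf's exact
      code.

      ```c
      #include <stdbool.h>
      #include <string.h>

      /* Hypothetical, minimal view of a symbol for this sketch. */
      struct sym {
          const char *name;
          bool weak;
          bool global;
      };

      static size_t leading_underscores(const char *s)
      {
          size_t n = 0;
          while (s[n] == '_')
              n++;
          return n;
      }

      /* Return the preferred of two symbols found at the same address. */
      static const struct sym *choose_best_symbol(const struct sym *a,
                                                  const struct sym *b)
      {
          size_t ua, ub;

          if (a->weak != b->weak)
              return a->weak ? b : a;       /* prefer non-weak */
          if (a->global != b->global)
              return a->global ? a : b;     /* prefer global */

          ua = leading_underscores(a->name);
          ub = leading_underscores(b->name);
          if (ua != ub)
              return ua < ub ? a : b;       /* prefer fewer underscores */

          /* Last resort: the longest name wins. */
          return strlen(a->name) >= strlen(b->name) ? a : b;
      }
      ```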
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Preserve symbol scope when parsing /proc/kallsyms · 31877908
      Committed by Anton Blanchard
      kallsyms__parse capitalises the symbol type, so every symbol is marked
      global. Remove this and fix symbol_type__is_a to handle both local and
      global symbols.
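
      A sketch of a case-insensitive check, modeled loosely on
      symbol_type__is_a(); the enum is a stand-in for perf's map types. In
      /proc/kallsyms an upper-case type letter means global and a lower-case
      one means local, so parsing must keep the case and the check must
      accept both.

      ```c
      #include <ctype.h>
      #include <stdbool.h>

      /* Stand-in for perf's map types. */
      enum map_type { MAP__FUNCTION, MAP__VARIABLE };

      /* Accept both 'T' (global) and 't' (local) as functions, etc. */
      static bool symbol_type__is_a(char symbol_type, enum map_type map_type)
      {
          symbol_type = (char)toupper((unsigned char)symbol_type);

          switch (map_type) {
          case MAP__FUNCTION:
              return symbol_type == 'T' || symbol_type == 'W';
          case MAP__VARIABLE:
              return symbol_type == 'D';
          default:
              return false;
          }
      }

      /* Scope survives because the parser no longer upper-cases the letter. */
      static bool symbol_is_global(char symbol_type)
      {
          return isupper((unsigned char)symbol_type) != 0;
      }
      ```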
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: /proc/kallsyms does not sort module symbols · 3f5a4272
      Committed by Anton Blanchard
      kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end
      of the previous symbol to the start of the current one.
      
      Unfortunately, module symbols are not sorted, e.g.:
      
      ffffffffa0081f30 t e1000_clean_rx_irq   [e1000e]
      ffffffffa00817a0 t e1000_alloc_rx_buffers       [e1000e]
      
      Some symbols end up with a negative length and others have a length
      larger than they should. This results in confusing perf output.
      
      We already have a function to fix up the end of zero-length symbols,
      so use that instead.
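
      A sketch of the end fixup, assuming a hypothetical flat array of
      symbols with inclusive end addresses; perf keeps symbols in a sorted
      rbtree, so qsort() stands in for that ordering here. With the two
      e1000e symbols above, sorting first gives e1000_alloc_rx_buffers a
      positive length ending just before e1000_clean_rx_irq.

      ```c
      #include <stdint.h>
      #include <stdlib.h>

      /* Hypothetical symbol record: end == start means "length unknown". */
      struct ksym {
          uint64_t start;
          uint64_t end;   /* inclusive */
      };

      static int ksym_cmp(const void *a, const void *b)
      {
          const struct ksym *l = a, *r = b;

          if (l->start < r->start)
              return -1;
          return l->start > r->start;
      }

      /* Sort by address, then end each zero-length symbol just before the
       * next one starts (skipping symbols that share a start address). */
      static void ksyms_fixup_end(struct ksym *syms, size_t n)
      {
          qsort(syms, n, sizeof(*syms), ksym_cmp);
          for (size_t i = 0; i + 1 < n; i++)
              if (syms[i].end == syms[i].start &&
                  syms[i + 1].start > syms[i].start)
                  syms[i].end = syms[i + 1].start - 1;
      }
      ```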
      
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files · adb09184
      Committed by Anton Blanchard
      64-bit PowerPC debuginfo files have an empty function descriptor section.
      I hit a SEGV when perf tried to use this section for symbol resolution.
      
      To fix this we need to check that the section is valid, which we can do
      by checking for type SHT_PROGBITS.
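
      A sketch of the check using the <elf.h> section header type instead of
      libelf's GElf_Shdr; the helper name is hypothetical. An empty section
      in a separate debuginfo file is typically marked SHT_NOBITS, so it
      fails the test.

      ```c
      #include <elf.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Only trust the .opd (function descriptor) section if it actually
       * carries data in this file. */
      static bool opd_section_usable(const Elf64_Shdr *shdr)
      {
          return shdr != NULL && shdr->sh_type == SHT_PROGBITS;
      }
      ```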
      
      Cc: <stable@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Eric B Munson <emunson@mgebm.net>
      Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix regression of variable finder · f66fedcb
      Committed by Masami Hiramatsu
      Fix to call convert_variable() if the previous call does not fail.
      
      Before calling convert_variable(), the code checks that "ret" is 0.
      However, "ret" holds the return value of synthesize_perf_probe_arg(),
      which always returns a positive value on success, so perf probe never
      calls convert_variable(). This causes a SEGV when we add an event
      with arguments.
      
      Fix this by making the check ensure that "ret" is greater than 0
      (or not negative).
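
      A sketch of the bug pattern with hypothetical stand-in functions: the
      synthesize step returns a positive length on success (as snprintf()
      does), so a success test written as ret == 0 never reaches the
      conversion step.

      ```c
      #include <stdio.h>

      static int converted; /* counts conversions, for illustration */

      /* Stand-in: returns the number of characters written, or -1. */
      static int synthesize_arg(char *buf, size_t len, const char *var)
      {
          int n = snprintf(buf, len, "%s", var);
          return (n < 0 || (size_t)n >= len) ? -1 : n;
      }

      /* Stand-in for the conversion step; returns 0 on success. */
      static int convert_variable(const char *buf)
      {
          (void)buf;
          converted++;
          return 0;
      }

      static int find_variable(const char *var)
      {
          char buf[64];
          int ret = synthesize_arg(buf, sizeof(buf), var);

          /* "if (ret == 0)" would skip this on success, since success is
           * a positive length; test for a positive result instead. */
          if (ret > 0)
              ret = convert_variable(buf);
          return ret;
      }
      ```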
      
      This regression has been introduced by my previous patch, f182e3e1.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: yrl.pp-manager.tt@hitachi.com
      Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15
      Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 22 Sep 2011: 10 commits
    • Linux 3.1-rc7 · d93dc5c4
      Committed by Linus Torvalds
    • XZ: Fix incorrect XZ_BUF_ERROR · 9c1f8594
      Committed by Lasse Collin
      xz_dec_run() could incorrectly return XZ_BUF_ERROR if all of the
      following were true:
      
       - The caller knows how many bytes of output to expect and only provides
         that much output space.
      
       - When the last output bytes are decoded, the caller-provided input
         buffer ends right before the LZMA2 end of payload marker.  So LZMA2
         won't provide any more output, but it doesn't know that yet and
         thus won't return XZ_STREAM_END yet.
      
       - A BCJ filter is in use and it hasn't left any unfiltered bytes in the
         temp buffer.  This can happen with any BCJ filter, but in practice
         it's more likely with filters other than the x86 BCJ.
      
      This fixes <https://bugzilla.redhat.com/show_bug.cgi?id=735408> where
      Squashfs thinks that a valid file system is corrupt.
      
      This also fixes a similar bug in single-call mode where the uncompressed
      size of a block using BCJ + LZMA2 was 0 bytes and the caller provided no
      output space.  Many empty .xz files don't contain any blocks and thus
      don't trigger this bug.
      
      This also tweaks a closely related detail: xz_dec_bcj_run() could call
      xz_dec_lzma2_run() to decode into the temp buffer when it was known to
      be useless.  This was harmless, although it wasted a minuscule number
      of CPU cycles.

      Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
      Cc: stable <stable@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://github.com/davem330/net · e5b26a88
      Committed by Linus Torvalds
      * git://github.com/davem330/net: (27 commits)
        xfrm: Perform a replay check after return from async codepaths
        fib:fix BUG_ON in fib_nl_newrule when add new fib rule
        ixgbe: fix possible null buffer error
        tg3: fix VLAN tagging regression
        net: pxa168: Fix build errors by including interrupt.h
        netconsole: switch init_netconsole() to late_initcall
        gianfar: Fix overflow check and return value for gfar_get_cls_all()
        ppp_generic: fix multilink fragment MTU calculation (again)
        GRETH: avoid overwrite IP-stack's IP-frags checksum
        GRETH: RX/TX bytes were never increased
        ipv6: fix a possible double free
        b43: Fix beacon problem in ad-hoc mode
        Bluetooth: add support for 2011 mac mini
        Bluetooth: Add MacBookAir4,1 support
        Bluetooth: Fixed BT ST Channel reg order
        r8169: do not enable the TBI for anything but the original 8169.
        r8169: remove erroneous processing of always set bit.
        r8169: fix WOL setting for 8105 and 8111evl
        r8169: add MODULE_FIRMWARE for the firmware of 8111evl
        r8169: fix the reset setting for 8111evl
        ...
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · fed678dc
      Committed by Linus Torvalds
      * 'for-linus' of git://git.kernel.dk/linux-block:
        floppy: use del_timer_sync() in init cleanup
        blk-cgroup: be able to remove the record of unplugged device
        block: Don't check QUEUE_FLAG_SAME_COMP in __blk_complete_request
        mm: Add comment explaining task state setting in bdi_forker_thread()
        mm: Cleanup clearing of BDI_pending bit in bdi_forker_thread()
        block: simplify force plug flush code a little bit
        block: change force plug flush call order
        block: Fix queue_flag update when rq_affinity goes from 2 to 1
        block: separate priority boosting from REQ_META
        block: remove READ_META and WRITE_META
        xen-blkback: fixed indentation and comments
        xen-blkback: Don't disconnect backend until state switched to XenbusStateClosed.
    • init: carefully handle loglevel option on kernel cmdline. · 808bf29b
      Committed by Alexander Sverdlin
      When a malformed loglevel value (for example "${abc}") is passed on the
      kernel cmdline, the loglevel itself is set to 0.
      
      That then suppresses all following messages, including all the errors
      and crashes caused by other malformed cmdline options.  This can make
      debugging quite tricky.
      
      This patch leaves the previous value of loglevel if the new value is
      incorrect and reports an error code in this case.
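
      A userspace sketch of the same policy, using strtol() in place of the
      kernel's own option parsing; the initial level of 7 and the helper
      name are assumptions for illustration.

      ```c
      #include <errno.h>
      #include <limits.h>
      #include <stdlib.h>

      static int console_loglevel = 7; /* assumed current level */

      /* Update the level only if str is a complete, in-range integer;
       * otherwise keep the previous value and return -EINVAL. */
      static int parse_loglevel(const char *str)
      {
          char *end;
          long val;

          errno = 0;
          val = strtol(str, &end, 10);
          if (errno != 0 || end == str || *end != '\0' ||
              val < INT_MIN || val > INT_MAX)
              return -EINVAL;
          console_loglevel = (int)val;
          return 0;
      }
      ```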

      Signed-off-by: Alexander Sverdlin <alexander.sverdlin@sysgo.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • teach /proc/$pid/numa_maps about transparent hugepages · 32ef4384
      Committed by Dave Hansen
      This is modeled after the smaps code.
      
      It detects transparent hugepages and then does a single gather_stats()
      for the page as a whole.  This has two benefits:
       1. It is more efficient since it does many pages in a single shot.
       2. It does not have to break down the huge page.

      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • break out numa_maps gather_pte_stats() checks · 3200a8aa
      Committed by Dave Hansen
      gather_pte_stats() does a number of checks on a target page
      to see whether it should even be considered for statistics.
      This breaks that code out into a separate function so that
      we can use it in the transparent hugepage case in the next
      patch.

      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Christoph Lameter <cl@gentwo.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • make /proc/$pid/numa_maps gather_stats() take variable page size · eb4866d0
      Committed by Dave Hansen
      We need to teach the numa_maps code about transparent huge pages.  The
      first step is to teach gather_stats() that the pte it is dealing with
      might represent more than one page.
      
      Note that we will use this in a moment for transparent huge pages,
      since they use a single pmd_t which _acts_ as a "surrogate" for a bunch
      of smaller pte_t's.
      
      I'm a _bit_ unhappy that this interface counts in hugetlbfs page sizes
      for hugetlbfs pages and PAGE_SIZE for normal pages.  That means that to
      figure out how many _bytes_ "dirty=1" means, you must first know the
      hugetlbfs page size.  That's easier said than done especially if you
      don't have visibility into the mount.
      
      But, that's probably a discussion for another day especially since it
      would change behavior to fix it.  But, just in case anyone wonders why
      this patch only passes a '1' in the hugetlb case...
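
      A sketch of the counting change, assuming 4 KiB base pages and 2 MiB
      transparent huge pages (512 base pages per pmd). The counter struct is
      a hypothetical subset of what numa_maps reports, and the signature
      only resembles the kernel's gather_stats().

      ```c
      #include <stdbool.h>

      /* Hypothetical subset of numa_maps counters. */
      struct numa_stats {
          unsigned long pages;
          unsigned long dirty;
          unsigned long anon;
      };

      /* Account one entry that may cover nr_pages base pages: a normal
       * pte covers 1, a transparent huge page pmd covers many. */
      static void gather_stats(struct numa_stats *md, bool dirty, bool anon,
                               unsigned long nr_pages)
      {
          md->pages += nr_pages;
          if (dirty)
              md->dirty += nr_pages;
          if (anon)
              md->anon += nr_pages;
      }
      ```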

      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xfrm: Perform a replay check after return from async codepaths · bcf66bf5
      Committed by Steffen Klassert
      When asynchronous crypto algorithms are used, there might be many
      packets that passed the xfrm replay check, but the replay advance
      function has not been called yet for these packets. So the replay check
      function would accept a replay of all of these packets. Also, the
      system might crash if there are more packets in async processing
      than the size of the anti-replay window, because the replay advance
      function would try to update the replay window beyond its bounds.
      
      This patch adds a second replay check after resuming from the async
      processing to fix these issues.
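
      A sketch of why the second check helps, assuming a simple 32-packet
      bitmap window in the style of RFC 4303; the struct and helper names
      are hypothetical and the kernel's xfrm replay code differs in detail.
      Two copies of the same sequence number can both pass the first check
      while their crypto is in flight; re-checking on resume rejects the
      second one, and advancing only after a successful check keeps the
      bitmap update inside the window.

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      #define REPLAY_WINDOW 32 /* bits in the bitmap below */

      /* Highest sequence number seen; bit i set means seq - i was seen. */
      struct replay_state {
          uint32_t seq;
          uint32_t bitmap;
      };

      /* True if seq may be accepted: not too old and not seen before. */
      static bool replay_check(const struct replay_state *x, uint32_t seq)
      {
          uint32_t diff;

          if (seq == 0)
              return false;
          if (seq > x->seq)
              return true;                  /* ahead of the window */
          diff = x->seq - seq;
          if (diff >= REPLAY_WINDOW)
              return false;                 /* too old */
          return !(x->bitmap & (1U << diff));
      }

      /* Record seq as seen; only valid after replay_check() succeeded. */
      static void replay_advance(struct replay_state *x, uint32_t seq)
      {
          if (seq > x->seq) {
              uint32_t diff = seq - x->seq;
              x->bitmap = diff < REPLAY_WINDOW ? (x->bitmap << diff) | 1 : 1;
              x->seq = seq;
          } else {
              x->bitmap |= 1U << (x->seq - seq);
          }
      }

      /* Resume path after async crypto: check again before advancing. */
      static bool input_resume(struct replay_state *x, uint32_t seq)
      {
          if (!replay_check(x, seq))
              return false;                 /* replay caught on resume */
          replay_advance(x, seq);
          return true;
      }
      ```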

      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fib:fix BUG_ON in fib_nl_newrule when add new fib rule · 561dac2d
      Committed by Gao feng
      Adding a new fib rule can trigger a BUG_ON. The following shell
      commands reproduce it:
      
              ip rule add pref 38
              ip rule add pref 38
              ip rule add to 192.168.3.0/24 goto 38
              ip rule del pref 38
              ip rule add to 192.168.3.0/24 goto 38
              ip rule add pref 38
      
      Then the BUG_ON triggers.
      
      Remove the BUG_ON and use (ctarget == NULL) to identify whether this
      rule is unresolved.

      Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 21 Sep 2011: 19 commits