1. 18 6月, 2012 1 次提交
    • S
      perf: Use css_tryget() to avoid propping up css refcount · 9c5da09d
      Salman Qazi 提交于
      An rmdir pushes css's ref count to zero.  However, if the associated
      directory is open at the time, the dentry ref count is non-zero.  If
      the fd for this directory is then passed into perf_event_open, it
      does a css_get().  This bounces the ref count back up from zero.  This
      is a problem by itself.  But what makes it turn into a crash is the
      fact that we end up doing an extra dput, since we perform a dput
      when css_put sees the ref count go down to zero.
      
      css_tryget() does not fall into that trap. So, we use that instead.
      
      Reproduction test-case for the bug:
      
       #include <unistd.h>
       #include <sys/types.h>
       #include <sys/stat.h>
       #include <fcntl.h>
       #include <linux/unistd.h>
       #include <linux/perf_event.h>
       #include <string.h>
       #include <errno.h>
       #include <stdio.h>
      
       #define PERF_FLAG_PID_CGROUP    (1U << 2)
      
       int perf_event_open(struct perf_event_attr *hw_event_uptr,
                           pid_t pid, int cpu, int group_fd, unsigned long flags) {
               return syscall(__NR_perf_event_open,hw_event_uptr, pid, cpu,
                       group_fd, flags);
       }
      
       /*
        * Directly poke at the perf_event bug, since it's proving hard to repro
        * depending on where in the kernel tree.  what moved?
        */
       int main(int argc, char **argv)
       {
              int fd;
              struct perf_event_attr attr;
              memset(&attr, 0, sizeof(attr));
              attr.exclude_kernel = 1;
              attr.size = sizeof(attr);
              mkdir("/dev/cgroup/perf_event/blah", 0777);
              fd = open("/dev/cgroup/perf_event/blah", O_RDONLY);
              perror("open");
              rmdir("/dev/cgroup/perf_event/blah");
              sleep(2);
              perf_event_open(&attr, fd, 0, -1,  PERF_FLAG_PID_CGROUP);
              perror("perf_event_open");
              close(fd);
              return 0;
       }
      Signed-off-by: NSalman Qazi <sqazi@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20120614223108.1025.2503.stgit@dungbeetle.mtv.corp.google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9c5da09d
  2. 16 6月, 2012 1 次提交
  3. 14 6月, 2012 1 次提交
    • D
      watchdog: Quiet down the boot messages · a7027046
      Don Zickus 提交于
      A bunch of bugzillas have complained how noisy the nmi_watchdog
      is during boot-up especially with its expected failure cases
      (like virt and bios resource contention).
      
      This is my attempt to quiet them down and keep it less confusing
      for the end user.  What I did is print the message for cpu0 and
      save it for future comparisons.  If future cpus have an
      identical message as cpu0, then don't print the redundant info.
      However, if a future cpu has a different message, happily print
      that loudly.
      
      Before the change, you would see something like:
      
          ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
          CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
          Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
          ... version:                2
          ... bit width:              40
          ... generic registers:      2
          ... value mask:             000000ffffffffff
          ... max period:             000000007fffffff
          ... fixed-purpose events:   3
          ... event mask:             0000000700000003
          NMI watchdog enabled, takes one hw-pmu counter.
          Booting Node   0, Processors  #1
          NMI watchdog enabled, takes one hw-pmu counter.
           #2
          NMI watchdog enabled, takes one hw-pmu counter.
           #3 Ok.
          NMI watchdog enabled, takes one hw-pmu counter.
          Brought up 4 CPUs
          Total of 4 processors activated (22607.24 BogoMIPS).
      
      After the change, it is simplified to:
      
          ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
          CPU0: Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz stepping 0a
          Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver.
          ... version:                2
          ... bit width:              40
          ... generic registers:      2
          ... value mask:             000000ffffffffff
          ... max period:             000000007fffffff
          ... fixed-purpose events:   3
          ... event mask:             0000000700000003
          NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
          Booting Node   0, Processors  #1 #2 #3 Ok.
          Brought up 4 CPUs
      
      V2: little changes based on Joe Perches' feedback
      V3: printk cleanup based on Ingo's feedback; checkpatch fix
      V4: keep printk as one long line
      V5: Ingo fix ups
      Reported-and-tested-by: NNathan Zimmer <nzimmer@sgi.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: nzimmer@sgi.com
      Cc: joe@perches.com
      Link: http://lkml.kernel.org/r/1339594548-17227-1-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a7027046
  4. 13 6月, 2012 1 次提交
  5. 12 6月, 2012 1 次提交
    • A
      perf tools: Fix synthesizing tracepoint names from the perf.data headers · cb9dd49e
      Arnaldo Carvalho de Melo 提交于
      We need to use the per event info snapshoted at record time to
      synthesize the events name, so do it just after reading the perf.data
      headers, when we already processed the /sys events data, otherwise we'll
      end up using the local /sys that only by sheer luck will have the same
      tracepoint ID -> real event association.
      
      Example:
      
        # uname -a
        Linux felicio.ghostprotocols.net 3.4.0-rc5+ #1 SMP Sat May 19 15:27:11 BRT 2012 x86_64 x86_64 x86_64 GNU/Linux
        # perf record -e sched:sched_switch usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.015 MB perf.data (~648 samples) ]
        # cat /t/events/sched/sched_switch/id
        279
        # perf evlist -v
        sched:sched_switch: sample_freq=1, type: 2, config: 279, size: 80, sample_type: 1159, read_format: 7, disabled: 1, inherit: 1, mmap: 1, comm: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
        #
      
      So on the above machine the sched:sched_switch has tracepoint id 279, but on
      the machine were we'll analyse it it has a different id:
      
        $ cat /t/events/sched/sched_switch/id
        56
        $ perf evlist -i /tmp/perf.data
        kmem:mm_balancedirty_writeout
        $ cat /t/events/kmem/mm_balancedirty_writeout/id
        279
      
      With this fix:
      
        $ perf evlist -i /tmp/perf.data
        sched:sched_switch
      Reported-by: NDmitry Antipov <dmitry.antipov@linaro.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-auwks8fpuhmrdpiefs55o5oz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      cb9dd49e
  6. 11 6月, 2012 2 次提交
    • S
      perf stat: Fix default output file · fc3e4d07
      Stephane Eranian 提交于
      The following commit:
      
      commit 56f3bae7
      Author: Jim Cromie <jim.cromie@gmail.com>
      Date:   Wed Sep 7 17:14:00 2011 -0600
      
          perf stat: Add --log-fd <N> option to redirect stderr elsewhere
      
      introduced a bug in the way perf stat outputs the results by default,
      i.e., without the --log-fd or --output option. It would default to
      writing to file descriptor 0, i.e., stdin. Writing to stdin is allowed
      and is equivalent to writing to stdout. However, there is a major
      difference for any script that was already capturing the output of perf
      stat via redirection:
      
          perf stat >/tmp/log .... or perf stat 2>/tmp/log ....
      
      They would not capture anything anymore. They would have to do:
          perf stat 0>/tmp/log ...
      
      This breaks compatibility with existing scripts and does not look very
      natural.
      
      This patch fixes the problem by looking at output_fd only when it was
      modified by user (> 0). It also checks that the value if positive.
      Passing --log-fd 0 is ignored.
      
      I would also argue that defaulting to stderr for the results is not the
      right thing to do, though this patch does not address this specific
      issue.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jim Cromie <jim.cromie@gmail.com>
      Link: http://lkml.kernel.org/r/20120515111111.GA9870@quadSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fc3e4d07
    • D
      perf tools: Fix endianity swapping for adds_features bitmask · 80c0120a
      David Ahern 提交于
      Based on Jiri's latest attempt:
      https://lkml.org/lkml/2012/5/16/61
      
      Basically, adds_features should be byte swapped assuming unsigned
      longs are either 8-bytes (u64) or 4-bytes (u32).
      
          Fixes 32-bit ppc dumping 64-bit x86 feature data:
           ========
           captured on: Sun May 20 19:23:23 2012
           hostname : nxos-vdc-dev3
           os release : 3.4.0-rc7+
           perf version : 3.4.rc4.137.g978da3
           arch : x86_64
           nrcpus online : 16
           nrcpus avail : 16
           cpudesc : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
           cpuid : GenuineIntel,6,26,5
           total memory : 24680324 kB
          ...
      
      Verified 64-bit x86 can still dump feature data for 32-bit ppc.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Reviewed-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/4FBBB539.5010805@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      80c0120a
  7. 08 6月, 2012 1 次提交
  8. 07 6月, 2012 1 次提交
    • S
      tracing: Have tracing_off() actually turn tracing off · f2bf1f6f
      Steven Rostedt 提交于
      A recent update to have tracing_on/off() only affect the ftrace ring
      buffers instead of all ring buffers had a cut and paste error.
      The tracing_off() did the exact same thing as tracing_on() and
      would not actually turn off tracing. Unfortunately, tracing_off()
      is more important to be working than tracing_on() as this is a key
      development tool, as it lets the developer turn off tracing as soon
      as a problem is discovered. It is also used by panic and oops code.
      
      This bug also breaks the 'echo func:traceoff > set_ftrace_filter'
      
      Cc: <stable@vger.kernel.org> # 3.4
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f2bf1f6f
  9. 06 6月, 2012 12 次提交
  10. 05 6月, 2012 8 次提交
  11. 04 6月, 2012 5 次提交
  12. 03 6月, 2012 6 次提交
    • L
      Linux 3.5-rc1 · f8f5701b
      Linus Torvalds 提交于
      f8f5701b
    • L
      Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · 912afc36
      Linus Torvalds 提交于
      Pull device-mapper updates from Alasdair G Kergon:
       "Improve multipath's retrying mechanism in some defined circumstances
        and provide a simple reserve/release mechanism for userspace tools to
        access thin provisioning metadata while the pool is in use."
      
      * tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm thin: provide userspace access to pool metadata
        dm thin: use slab mempools
        dm mpath: allow ioctls to trigger pg init
        dm mpath: delay retry of bypassed pg
        dm mpath: reduce size of struct multipath
      912afc36
    • J
      dm thin: provide userspace access to pool metadata · cc8394d8
      Joe Thornber 提交于
      This patch implements two new messages that can be sent to the thin
      pool target allowing it to take a snapshot of the _metadata_.  This,
      read-only snapshot can be accessed by userland, concurrently with the
      live target.
      
      Only one metadata snapshot can be held at a time.  The pool's status
      line will give the block location for the current msnap.
      
      Since version 0.1.5 of the userland thin provisioning tools, the
      thin_dump program displays the msnap as follows:
      
          thin_dump -m <msnap root> <metadata dev>
      
      Available here: https://github.com/jthornber/thin-provisioning-tools
      
      Now that userland can access the metadata we can do various things
      that have traditionally been kernel side tasks:
      
           i) Incremental backups.
      
           By using metadata snapshots we can work out what blocks have
           changed over time.  Combined with data snapshots we can ensure
           the data doesn't change while we back it up.
      
           A short proof of concept script can be found here:
      
           https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
      
           ii) Migration of thin devices from one pool to another.
      
           iii) Merging snapshots back into an external origin.
      
           iv) Asyncronous replication.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      cc8394d8
    • M
      dm thin: use slab mempools · a24c2569
      Mike Snitzer 提交于
      Use dedicated caches prefixed with a "dm_" name rather than relying on
      kmalloc mempools backed by generic slab caches so the memory usage of
      thin provisioning (and any leaks) can be accounted for independently.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a24c2569
    • M
      dm mpath: allow ioctls to trigger pg init · 35991652
      Mikulas Patocka 提交于
      After the failure of a group of paths, any alternative paths that
      need initialising do not become available until further I/O is sent to
      the device.  Until this has happened, ioctls return -EAGAIN.
      
      With this patch, new paths are made available in response to an ioctl
      too.  The processing of the ioctl gets delayed until this has happened.
      
      Instead of returning an error, we submit a work item to kmultipathd
      (that will potentially activate the new path) and retry in ten
      milliseconds.
      
      Note that the patch doesn't retry an ioctl if the ioctl itself fails due
      to a path failure.  Such retries should be handled intelligently by the
      code that generated the ioctl in the first place, noting that some SCSI
      commands should not be retried because they are not idempotent (XOR write
      commands).  For commands that could be retried, there is a danger that
      if the device rejected the SCSI command, the path could be errorneously
      marked as failed, and the request would be retried on another path which
      might fail too.  It can be determined if the failure happens on the
      device or on the SCSI controller, but there is no guarantee that all
      SCSI drivers set these flags correctly.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      35991652
    • M
      dm mpath: delay retry of bypassed pg · f220fd4e
      Mike Christie 提交于
      If I/O needs retrying and only bypassed priority groups are available,
      set the pg_init_delay_retry flag to wait before retrying.
      
      If, for example, the reason for the bypass is that the controller is
      getting reset or there is a firmware upgrade happening, retrying right
      away would cause a flood of log messages and retries for what could be a
      few seconds or even several minutes.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f220fd4e