1. 03 12月, 2008 4 次提交
    • S
      ftrace: print real return in dumpstack for function graph · 7ee991fb
      Steven Rostedt 提交于
      Impact: better dumpstack output
      
      I noticed in my crash dumps and even in the stack tracer that a
      lot of functions listed in the stack trace are simply
      return_to_handler which is ftrace graphs way to insert its own
      call into the return of a function.
      
      But we lose out where the actually function was called from.
      
      This patch adds in hooks to the dumpstack mechanism that detects
      this and finds the real function to print. Both are printed to
      let the user know that a hook is still in place.
      
      This does give a funny side effect in the stack tracer output:
      
              Depth   Size      Location    (80 entries)
              -----   ----      --------
        0)     4144      48   save_stack_trace+0x2f/0x4d
        1)     4096     128   ftrace_call+0x5/0x2b
        2)     3968      16   mempool_alloc_slab+0x16/0x18
        3)     3952     384   return_to_handler+0x0/0x73
        4)     3568    -240   stack_trace_call+0x11d/0x209
        5)     3808     144   return_to_handler+0x0/0x73
        6)     3664    -128   mempool_alloc+0x4d/0xfe
        7)     3792     128   return_to_handler+0x0/0x73
        8)     3664     -32   scsi_sg_alloc+0x48/0x4a [scsi_mod]
      
      As you can see, the real functions are now negative. This is due
      to them not being found inside the stack.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ee991fb
    • S
      ftrace: add ftrace_graph_stop() · 14a866c5
      Steven Rostedt 提交于
      Impact: new ftrace_graph_stop function
      
      While developing more features of function graph, I hit a bug that
      caused the WARN_ON to trigger in the prepare_ftrace_return function.
      Well, it was hard for me to find out that was happening because the
      bug would not print, it would just cause a hard lockup or reboot.
      The reason is that it is not safe to call printk from this function.
      
      Looking further, I also found that it calls unregister_ftrace_graph,
      which grabs a mutex and calls kstop machine. This would definitely
      lock the box up if it were to trigger.
      
      This patch adds a fast and safe ftrace_graph_stop() which will
      stop the function tracer. Then it is safe to call the WARN ON.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      14a866c5
    • S
      ftrace: have function graph use mcount caller address · bb4304c7
      Steven Rostedt 提交于
      Impact: consistency change for function graph
      
      This patch makes function graph record the mcount caller address
      the same way the function tracer does.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bb4304c7
    • S
      ftrace: clean up function graph asm · 347fdd9d
      Steven Rostedt 提交于
      Impact: clean up
      
      There exists macros for x86 asm to handle x86_64 and i386.
      This patch updates function graph asm to use them.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      347fdd9d
  2. 02 12月, 2008 1 次提交
    • F
      tracing/function-graph-tracer: support for x86-64 · 48d68b20
      Frederic Weisbecker 提交于
      Impact: extend and enable the function graph tracer to 64-bit x86
      
      This patch implements the support for function graph tracer under x86-64.
      Both static and dynamic tracing are supported.
      
      This causes some small CPP conditional asm on arch/x86/kernel/ftrace.c I
      wanted to use probe_kernel_read/write to make the return address
      saving/patching code more generic but it causes tracing recursion.
      
      That would be perhaps useful to implement a notrace version of these
      function for other archs ports.
      
      Note that arch/x86/process_64.c is not traced, as in X86-32. I first
      thought __switch_to() was responsible of crashes during tracing because I
      believed current task were changed inside but that's actually not the
      case (actually yes, but not the "current" pointer).
      
      So I will have to investigate to find the functions that harm here, to
      enable tracing of the other functions inside (but there is no issue at
      this time, while process_64.c stays out of -pg flags).
      
      A little possible race condition is fixed inside this patch too. When the
      tracer allocate a return stack dynamically, the current depth is not
      initialized before but after. An interrupt could occur at this time and,
      after seeing that the return stack is allocated, the tracer could try to
      trace it with a random uninitialized depth. It's a prevention, even if I
      hadn't problems with it.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      48d68b20
  3. 01 12月, 2008 5 次提交
  4. 27 11月, 2008 1 次提交
    • J
      x86: always define DECLARE_PCI_UNMAP* macros · b627c8b1
      Joerg Roedel 提交于
      Impact: fix boot crash on AMD IOMMU if CONFIG_GART_IOMMU is off
      
      Currently these macros evaluate to a no-op except the kernel is compiled
      with GART or Calgary support. But we also need these macros when we have
      SWIOTLB, VT-d or AMD IOMMU in the kernel. Since we always compile at
      least with SWIOTLB we can define these macros always.
      
      This patch is also for stable backport for the same reason the SWIOTLB
      default selection patch is.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b627c8b1
  5. 26 11月, 2008 14 次提交
    • A
      tracing: add "power-tracer": C/P state tracer to help power optimization · f3f47a67
      Arjan van de Ven 提交于
      Impact: new "power-tracer" ftrace plugin
      
      This patch adds a C/P-state ftrace plugin that will generate
      detailed statistics about the C/P-states that are being used,
      so that we can look at detailed decisions that the C/P-state
      code is making, rather than the too high level "average"
      that we have today.
      
      An example way of using this is:
      
       mount -t debugfs none /sys/kernel/debug
       echo cstate > /sys/kernel/debug/tracing/current_tracer
       echo 1 > /sys/kernel/debug/tracing/tracing_enabled
       sleep 1
       echo 0 > /sys/kernel/debug/tracing/tracing_enabled
       cat /sys/kernel/debug/tracing/trace | perl scripts/trace/cstate.pl > out.svg
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f3f47a67
    • S
      ftrace: use code patching for ftrace graph tracer · 5a45cfe1
      Steven Rostedt 提交于
      Impact: more efficient code for ftrace graph tracer
      
      This patch uses the dynamic patching, when available, to patch
      the function graph code into the kernel.
      
      This patch will ease the way for letting both function tracing
      and function graph tracing run together.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5a45cfe1
    • A
      x86: fixup config space size of CPU functions for AMD family 11h · ffd565a8
      Andreas Herrmann 提交于
      Impact: extend allowed configuration space access on 11h CPUs from 256 to 4K
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ffd565a8
    • I
      tracing: function graph tracer, fix · c2324b69
      Ingo Molnar 提交于
      fix return-tracer => graph-tracer namespace rename fallout.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c2324b69
    • F
      tracing/function-return-tracer: set a more human readable output · 287b6e68
      Frederic Weisbecker 提交于
      Impact: feature
      
      This patch sets a C-like output for the function graph tracing.
      For this aim, we now call two handler for each function: one on the entry
      and one other on return. This way we can draw a well-ordered call stack.
      
      The pid of the previous trace is loosely stored to be compared against
      the one of the current trace to see if there were a context switch.
      
      Without this little feature, the call tree would seem broken at
      some locations.
      We could use the sched_tracer to capture these sched_events but this
      way of processing is much more simpler.
      
      2 spaces have been chosen for indentation to fit the screen while deep
      calls. The time of execution in nanosecs is printed just after closed
      braces, it seems more easy this way to find the corresponding function.
      If the time was printed as a first column, it would be not so easy to
      find the corresponding function if it is called on a deep depth.
      
      I plan to output the return value but on 32 bits CPU, the return value
      can be 32 or 64, and its difficult to guess on which case we are.
      I don't know what would be the better solution on X86-32: only print
      eax (low-part) or even edx (high-part).
      
      Actually it's thee same problem when a function return a 8 bits value, the
      high part of eax could contain junk values...
      
      Here is an example of trace:
      
      sys_read() {
        fget_light() {
        } 526
        vfs_read() {
          rw_verify_area() {
            security_file_permission() {
              cap_file_permission() {
              } 519
            } 1564
          } 2640
          do_sync_read() {
            pipe_read() {
              __might_sleep() {
              } 511
              pipe_wait() {
                prepare_to_wait() {
                } 760
                deactivate_task() {
                  dequeue_task() {
                    dequeue_task_fair() {
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 504
                        } 1587
                        clear_buddies() {
                        } 512
                        add_cfs_task_weight() {
                        } 519
                        update_min_vruntime() {
                        } 511
                      } 5602
                      dequeue_entity() {
                        update_curr() {
                          update_min_vruntime() {
                          } 496
                        } 1631
                        clear_buddies() {
                        } 496
                        update_min_vruntime() {
                        } 527
                      } 4580
                      hrtick_update() {
                        hrtick_start_fair() {
                        } 488
                      } 1489
                    } 13700
                  } 14949
                } 16016
                msecs_to_jiffies() {
                } 496
                put_prev_task_fair() {
                } 504
                pick_next_task_fair() {
                } 489
                pick_next_task_rt() {
                } 496
                pick_next_task_fair() {
                } 489
                pick_next_task_idle() {
                } 489
      
      ------------8<---------- thread 4 ------------8<----------
      
      finish_task_switch() {
      } 1203
      do_softirq() {
        __do_softirq() {
          __local_bh_disable() {
          } 669
          rcu_process_callbacks() {
            __rcu_process_callbacks() {
              cpu_quiet() {
                rcu_start_batch() {
                } 503
              } 1647
            } 3128
            __rcu_process_callbacks() {
            } 542
          } 5362
          _local_bh_enable() {
          } 587
        } 8880
      } 9986
      kthread_should_stop() {
      } 669
      deactivate_task() {
        dequeue_task() {
          dequeue_task_fair() {
            dequeue_entity() {
              update_curr() {
                calc_delta_mine() {
                } 511
                update_min_vruntime() {
                } 511
              } 2813
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      287b6e68
    • F
      tracing/function-return-tracer: change the name into function-graph-tracer · fb52607a
      Frederic Weisbecker 提交于
      Impact: cleanup
      
      This patch changes the name of the "return function tracer" into
      function-graph-tracer which is a more suitable name for a tracing
      which makes one able to retrieve the ordered call stack during
      the code flow.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fb52607a
    • A
      [CPUFREQ] powernow-k8: ignore out-of-range PstateStatus value · a266d9f1
      Andreas Herrmann 提交于
      A workaround for AMD CPU family 11h erratum 311 might cause that the
      P-state Status Register shows a "current P-state" which is larger than
      the "current P-state limit" in P-state Current Limit Register. For the
      wrong P-state value there is no ACPI _PSS object defined and
      powernow-k8/cpufreq can't determine the proper CPU frequency for that
      state.
      
      As a consequence this can cause a panic during boot (potentially with
      all recent kernel versions -- at least I have reproduced it with
      various 2.6.27 kernels and with the current .28 series), as an
      example:
      
      powernow-k8: Found 1 AMD Turion(tm)X2 Ultra DualCore Mobile ZM-82 processors (2 \
      )
      powernow-k8:    0 : pstate 0 (2200 MHz)
      powernow-k8:    1 : pstate 1 (1100 MHz)
      powernow-k8:    2 : pstate 2 (600 MHz)
      BUG: unable to handle kernel paging request at ffff88086e7528b8
      IP: [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
      PGD 202063 PUD 0
      Oops: 0002 [#1] SMP
      last sysfs file:
      CPU 1
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.28-rc3-dirty #16
      RIP: 0010:[<ffffffff80486361>]  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0\
      f
      Synaptics claims to have extended capabilities, but I'm not able to read them.<6\
      6
      RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffff88006e7528c0
      RDX: 00000000ffffffff RSI: ffff88006e54af00 RDI: ffffffff808f056c
      RBP: 00000000fffee697 R08: 0000000000000003 R09: ffff88006e73f080
      R10: 0000000000000001 R11: 00000000002191c0 R12: ffff88006fb83c10
      R13: 00000000ffffffff R14: 0000000000000001 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff88006fb50740(0000) knlGS:0000000000000000
      Unable to initialize Synaptics hardware.
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: ffff88086e7528b8 CR3: 0000000000201000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 1, threadinfo ffff88006fb82000, task ffff88006fb816d0)
      Stack:
       ffff88006e74da50 0000000000000000 ffff88006e54af00 ffffffff804863c7
       ffff88006e74da50 0000000000000000 00000000ffffffff 0000000000000000
       ffff88006fb83c10 ffffffff8024b46c ffffffff808f0560 ffff88006fb83c10
      Call Trace:
       [<ffffffff804863c7>] ? cpufreq_stat_notifier_trans+0x51/0x83
       [<ffffffff8024b46c>] ? notifier_call_chain+0x29/0x4c
       [<ffffffff8024b561>] ? __srcu_notifier_call_chain+0x46/0x61
       [<ffffffff8048496d>] ? cpufreq_notify_transition+0x93/0xa9
       [<ffffffff8021ab8d>] ? powernowk8_target+0x1e8/0x5f3
       [<ffffffff80486687>] ? cpufreq_governor_performance+0x1b/0x20
       [<ffffffff80484886>] ? __cpufreq_governor+0x71/0xa8
       [<ffffffff80484b21>] ? __cpufreq_set_policy+0x101/0x13e
       [<ffffffff80485bcd>] ? cpufreq_add_dev+0x3f0/0x4cd
       [<ffffffff8048577a>] ? handle_update+0x0/0x8
       [<ffffffff803c2062>] ? sysdev_driver_register+0xb6/0x10d
       [<ffffffff8056592c>] ? powernowk8_init+0x0/0x7e
       [<ffffffff8048604c>] ? cpufreq_register_driver+0x8f/0x140
       [<ffffffff80209056>] ? _stext+0x56/0x14f
       [<ffffffff802c2234>] ? proc_register+0x122/0x17d
       [<ffffffff802c23a0>] ? create_proc_entry+0x73/0x8a
       [<ffffffff8025c259>] ? register_irq_proc+0x92/0xaa
       [<ffffffff8025c2c8>] ? init_irq_proc+0x57/0x69
       [<ffffffff807fc85f>] ? kernel_init+0x116/0x169
       [<ffffffff8020cc79>] ? child_rip+0xa/0x11
       [<ffffffff807fc749>] ? kernel_init+0x0/0x169
       [<ffffffff8020cc6f>] ? child_rip+0x0/0x11
      Code: 05 c5 83 36 00 48 c7 c2 48 5d 86 80 48 8b 04 d8 48 8b 40 08 48 8b 34 02 48\
      
      RIP  [<ffffffff80486361>] cpufreq_stats_update+0x4a/0x5f
       RSP <ffff88006fb83b20>
      CR2: ffff88086e7528b8
      ---[ end trace 0678bac75e67a2f7 ]---
      Kernel panic - not syncing: Attempted to kill init!
      
      In short, aftereffect of the wrong P-state is that
      cpufreq_stats_update() uses "-1" as index for some array in
      
      cpufreq_stats_update (unsigned int cpu)
      {
      ...
           if (stat->time_in_state)
                      stat->time_in_state[stat->last_index] =
                              cputime64_add(stat->time_in_state[stat->last_index],
                                            cputime_sub(cur_time, stat->last_time));
      ...
      }
      
      Fortunately, the wrong P-state value is returned only if the core is
      in P-state 0. This fix solves the problem by detecting the
      out-of-range P-state, ignoring it, and using "0" instead.
      
      Cc: Mark Langsdorf <mark.langsdorf@amd.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NDave Jones <davej@redhat.com>
      a266d9f1
    • M
      x86, bts, ftrace: a BTS ftrace plug-in prototype · 1e9b51c2
      Markus Metzger 提交于
      Impact: add new ftrace plugin
      
      A prototype for a BTS ftrace plug-in.
      
      The tracer collects branch trace in a cyclic buffer for each cpu.
      
      The tracer is not configurable and the trace for each snapshot is
      appended when doing cat /debug/tracing/trace.
      
      This is a proof of concept that will be extended with future patches
      to become a (hopefully) useful tool.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1e9b51c2
    • M
      x86, bts, ptrace: move BTS buffer allocation from ds.c into ptrace.c · 6abb11ae
      Markus Metzger 提交于
      Impact: restructure DS memory allocation to be done by the usage site of DS
      
      Require pre-allocated buffers in ds.h.
      
      Move the BTS buffer allocation for ptrace into ptrace.c.
      The pointer to the allocated buffer is stored in the traced task's
      task_struct together with the handle returned by ds_request_bts().
      
      Removes memory accounting code.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6abb11ae
    • M
      x86, bts: base in-kernel ds interface on handles · ca0002a1
      Markus Metzger 提交于
      Impact: generalize the DS code to shared buffers
      
      Change the in-kernel ds.h interface to identify the tracer via a
      handle returned on ds_request_~().
      
      Tracers used to be identified via their task_struct.
      
      The changes are required to allow DS to be shared between different
      tasks, which is needed for perfmon2 and for ftrace.
      
      For ptrace, the handle is stored in the traced task's task_struct.
      This should probably go into a (arch-specific) ptrace context some
      time.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ca0002a1
    • M
      x86, bts: fix wrmsr and spinlock over kmalloc · de90add3
      Markus Metzger 提交于
      Impact: fix sleeping-with-spinlock-held bugs/crashes
      
      - Turn a wrmsr to write the DS_AREA MSR into a wrmsrl.
      - Use irqsave variants of spinlocks.
      - Do not allocate memory while holding spinlocks.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      de90add3
    • M
      x86, pebs: fix PEBS record size configuration · c4858ffc
      Markus Metzger 提交于
      Impact: fix DS hw enablement on 64-bit x86
      
      Fix the PEBS record size in the DS configuration.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c4858ffc
    • M
      x86, bts: turn macro into static inline function · e5e8ca63
      Markus Metzger 提交于
      Impact: cleanup
      
      Replace a macro with a static inline function.
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5e8ca63
    • M
      x86, bts: exclude ds.c from build when disabled · 292c669c
      Markus Metzger 提交于
      Impact: cleanup
      
      Move the CONFIG guard from the .c file into the makefile.
      Reported-by: NAndi Kleen <andi-suse@firstfloor.org>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      292c669c
  6. 25 11月, 2008 2 次提交
  7. 23 11月, 2008 6 次提交
    • I
      xen: pin correct PGD on suspend · 86bbc2c2
      Ian Campbell 提交于
      Impact: fix Xen guest boot failure
      
      commit eefb47f6 ("xen: use
      spin_lock_nest_lock when pinning a pagetable") changed xen_pgd_walk to
      walk over mm->pgd rather than taking pgd as an argument.
      
      This breaks xen_mm_(un)pin_all() because it makes init_mm.pgd readonly
      instead of the pgd we are interested in and therefore the pin subsequently
      fails.
      
      (XEN) mm.c:2280:d15 Bad type (saw 00000000e8000001 != exp 0000000060000000) for mfn bc464 (pfn 21ca7)
      (XEN) mm.c:2665:d15 Error while pinning mfn bc464
      
      [   14.586913] 1 multicall(s) failed: cpu 0
      [   14.586926] Pid: 14, comm: kstop/0 Not tainted 2.6.28-rc5-x86_32p-xenU-00172-gee2f6cc7 #200
      [   14.586940] Call Trace:
      [   14.586955]  [<c030c17a>] ? printk+0x18/0x1e
      [   14.586972]  [<c0103df3>] xen_mc_flush+0x163/0x1d0
      [   14.586986]  [<c0104bc1>] __xen_pgd_pin+0xa1/0x110
      [   14.587000]  [<c015a330>] ? stop_cpu+0x0/0xf0
      [   14.587015]  [<c0104d7b>] xen_mm_pin_all+0x4b/0x70
      [   14.587029]  [<c022bcb9>] xen_suspend+0x39/0xe0
      [   14.587042]  [<c015a330>] ? stop_cpu+0x0/0xf0
      [   14.587054]  [<c015a3cd>] stop_cpu+0x9d/0xf0
      [   14.587067]  [<c01417cd>] run_workqueue+0x8d/0x150
      [   14.587080]  [<c030e4b3>] ? _spin_unlock_irqrestore+0x23/0x40
      [   14.587094]  [<c014558a>] ? prepare_to_wait+0x3a/0x70
      [   14.587107]  [<c0141918>] worker_thread+0x88/0xf0
      [   14.587120]  [<c01453c0>] ? autoremove_wake_function+0x0/0x50
      [   14.587133]  [<c0141890>] ? worker_thread+0x0/0xf0
      [   14.587146]  [<c014509c>] kthread+0x3c/0x70
      [   14.587157]  [<c0145060>] ? kthread+0x0/0x70
      [   14.587170]  [<c0109d1b>] kernel_thread_helper+0x7/0x10
      [   14.587181]   call  1/3: op=14 arg=[c0415000] result=0
      [   14.587192]   call  2/3: op=14 arg=[e1ca2000] result=0
      [   14.587204]   call  3/3: op=26 arg=[c1808860] result=-22
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      86bbc2c2
    • T
      x86: revert irq number limitation · a1967d64
      Thomas Gleixner 提交于
      Impact: fix MSIx not enough irq numbers available regression
      
      The manual revert of the sparse_irq patches missed to bring the number
      of possible irqs back to the .27 status. This resulted in a regression
      when two multichannel network cards were placed in a system with only
      one IO_APIC - causing the networking driver to not have the right
      IRQ and the device not coming up.
      
      Remove the dynamic allocation logic leftovers and simply return
      NR_IRQS in probe_nr_irqs() for now.
      
         Fixes: http://lkml.org/lkml/2008/11/19/354Reported-by: NJesper Dangaard Brouer <hawk@diku.dk>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: NJesper Dangaard Brouer <hawk@diku.dk>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a1967d64
    • T
      tracing/stack-tracer: introduce CONFIG_USER_STACKTRACE_SUPPORT · 8d26487f
      Török Edwin 提交于
      Impact: cleanup
      
      User stack tracing is just implemented for x86, but it is not x86 specific.
      
      Introduce a generic config flag, that is currently enabled only for x86.
      When other arches implement it, they will have to
      SELECT USER_STACKTRACE_SUPPORT.
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d26487f
    • T
      tracing/stack-tracer: fix style issues · 8d7c6a96
      Török Edwin 提交于
      Impact: cleanup
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d7c6a96
    • T
      tracing: add support for userspace stacktraces in tracing/iter_ctrl · 02b67518
      Török Edwin 提交于
      Impact: add new (default-off) tracing visualization feature
      
      Usage example:
      
       mount -t debugfs nodev /sys/kernel/debug
       cd /sys/kernel/debug/tracing
       echo userstacktrace >iter_ctrl
       echo sched_switch >current_tracer
       echo 1 >tracing_enabled
       .... run application ...
       echo 0 >tracing_enabled
      
      Then read one of 'trace','latency_trace','trace_pipe'.
      
      To get the best output you can compile your userspace programs with
      frame pointers (at least glibc + the app you are tracing).
      Signed-off-by: NTörök Edwin <edwintorok@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      02b67518
    • F
      tracing/function-return-tracer: store return stack into task_struct and allocate it dynamically · f201ae23
      Frederic Weisbecker 提交于
      Impact: use deeper function tracing depth safely
      
      Some tests showed that function return tracing needed a more deeper depth
      of function calls. But it could be unsafe to store these return addresses
      to the stack.
      
      So these arrays will now be allocated dynamically into task_struct of current
      only when the tracer is activated.
      
      Typical scheme when tracer is activated:
      - allocate a return stack for each task in global list.
      - fork: allocate the return stack for the newly created task
      - exit: free return stack of current
      - idle init: same as fork
      
      I chose a default depth of 50. I don't have overruns anymore.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f201ae23
  8. 21 11月, 2008 1 次提交
    • M
      x86: Fix interrupt leak due to migration · 0ca4b6b0
      Matthew Wilcox 提交于
      When we migrate an interrupt from one CPU to another, we set the
      move_in_progress flag and clean up the vectors later once they're not
      being used.  If you're unlucky and call destroy_irq() before the vectors
      become un-used, the move_in_progress flag is never cleared, which causes
      the interrupt to become unusable.
      
      This was discovered by Jesse Brandeburg for whom it manifested as an
      MSI-X device refusing to use MSI-X mode when the driver was unloaded
      and reloaded repeatedly.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ca4b6b0
  9. 20 11月, 2008 3 次提交
    • R
      x86: fixing __cpuinit/__init tangle, xsave_cntxt_init() · bfe085f6
      Rakib Mullick 提交于
      Annotate xsave_cntxt_init() as "can be called outside of __init".
      Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bfe085f6
    • R
      x86: fix __cpuinit/__init tangle in init_thread_xstate() · 9bc646f1
      Rakib Mullick 提交于
      Impact:	fix incorrect __init annotation
      
      This patch removes the following section mismatch warning. A patch set
      was send previously (http://lkml.org/lkml/2008/11/10/407). But
      introduce some other problem, reported by Rufus
      (http://lkml.org/lkml/2008/11/11/46). Then Ingo Molnar suggest that,
      it's best to remove __init from xsave_cntxt_init(void). Which is the
      second patch in this series. Now, this one removes the following
      warning.
      
      WARNING: arch/x86/kernel/built-in.o(.cpuinit.text+0x2237): Section
      mismatch in reference from the function cpu_init() to the function
      .init.text:init_thread_xstate()
      The function __cpuinit cpu_init() references
      a function __init init_thread_xstate().
      If init_thread_xstate is only used by cpu_init then
      annotate init_thread_xstate with a matching annotation.
      Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9bc646f1
    • U
      reintroduce accept4 · de11defe
      Ulrich Drepper 提交于
      Introduce a new accept4() system call.  The addition of this system call
      matches analogous changes in 2.6.27 (dup3(), evenfd2(), signalfd4(),
      inotify_init1(), epoll_create1(), pipe2()) which added new system calls
      that differed from analogous traditional system calls in adding a flags
      argument that can be used to access additional functionality.
      
      The accept4() system call is exactly the same as accept(), except that
      it adds a flags bit-mask argument.  Two flags are initially implemented.
      (Most of the new system calls in 2.6.27 also had both of these flags.)
      
      SOCK_CLOEXEC causes the close-on-exec (FD_CLOEXEC) flag to be enabled
      for the new file descriptor returned by accept4().  This is a useful
      security feature to avoid leaking information in a multithreaded
      program where one thread is doing an accept() at the same time as
      another thread is doing a fork() plus exec().  More details here:
      http://udrepper.livejournal.com/20407.html "Secure File Descriptor Handling",
      Ulrich Drepper).
      
      The other flag is SOCK_NONBLOCK, which causes the O_NONBLOCK flag
      to be enabled on the new open file description created by accept4().
      (This flag is merely a convenience, saving the use of additional calls
      fcntl(F_GETFL) and fcntl (F_SETFL) to achieve the same result.
      
      Here's a test program.  Works on x86-32.  Should work on x86-64, but
      I (mtk) don't have a system to hand to test with.
      
      It tests accept4() with each of the four possible combinations of
      SOCK_CLOEXEC and SOCK_NONBLOCK set/clear in 'flags', and verifies
      that the appropriate flags are set on the file descriptor/open file
      description returned by accept4().
      
      I tested Ulrich's patch in this thread by applying against 2.6.28-rc2,
      and it passes according to my test program.
      
      /* test_accept4.c
      
        Copyright (C) 2008, Linux Foundation, written by Michael Kerrisk
             <mtk.manpages@gmail.com>
      
        Licensed under the GNU GPLv2 or later.
      */
      #define _GNU_SOURCE
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <sys/socket.h>
      #include <netinet/in.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <string.h>
      
      #define PORT_NUM 33333
      
      #define die(msg) do { perror(msg); exit(EXIT_FAILURE); } while (0)
      
      /**********************************************************************/
      
      /* The following is what we need until glibc gets a wrapper for
        accept4() */
      
      /* Flags for socket(), socketpair(), accept4() */
      #ifndef SOCK_CLOEXEC
      #define SOCK_CLOEXEC    O_CLOEXEC
      #endif
      #ifndef SOCK_NONBLOCK
      #define SOCK_NONBLOCK   O_NONBLOCK
      #endif
      
      #ifdef __x86_64__
      #define SYS_accept4 288
      #elif __i386__
      #define USE_SOCKETCALL 1
      #define SYS_ACCEPT4 18
      #else
      #error "Sorry -- don't know the syscall # on this architecture"
      #endif
      
      static int
      accept4(int fd, struct sockaddr *sockaddr, socklen_t *addrlen, int flags)
      {
         printf("Calling accept4(): flags = %x", flags);
         if (flags != 0) {
             printf(" (");
             if (flags & SOCK_CLOEXEC)
                 printf("SOCK_CLOEXEC");
             if ((flags & SOCK_CLOEXEC) && (flags & SOCK_NONBLOCK))
                 printf(" ");
             if (flags & SOCK_NONBLOCK)
                 printf("SOCK_NONBLOCK");
             printf(")");
         }
         printf("\n");
      
      #if USE_SOCKETCALL
         long args[6];
      
         args[0] = fd;
         args[1] = (long) sockaddr;
         args[2] = (long) addrlen;
         args[3] = flags;
      
         return syscall(SYS_socketcall, SYS_ACCEPT4, args);
      #else
         return syscall(SYS_accept4, fd, sockaddr, addrlen, flags);
      #endif
      }
      
      /**********************************************************************/
      
      static int
      do_test(int lfd, struct sockaddr_in *conn_addr,
             int closeonexec_flag, int nonblock_flag)
      {
         int connfd, acceptfd;
         int fdf, flf, fdf_pass, flf_pass;
         struct sockaddr_in claddr;
         socklen_t addrlen;
      
         printf("=======================================\n");
      
         connfd = socket(AF_INET, SOCK_STREAM, 0);
         if (connfd == -1)
             die("socket");
         if (connect(connfd, (struct sockaddr *) conn_addr,
                     sizeof(struct sockaddr_in)) == -1)
             die("connect");
      
         addrlen = sizeof(struct sockaddr_in);
         acceptfd = accept4(lfd, (struct sockaddr *) &claddr, &addrlen,
                            closeonexec_flag | nonblock_flag);
         if (acceptfd == -1) {
             perror("accept4()");
             close(connfd);
             return 0;
         }
      
         fdf = fcntl(acceptfd, F_GETFD);
         if (fdf == -1)
             die("fcntl:F_GETFD");
         fdf_pass = ((fdf & FD_CLOEXEC) != 0) ==
                    ((closeonexec_flag & SOCK_CLOEXEC) != 0);
         printf("Close-on-exec flag is %sset (%s); ",
                 (fdf & FD_CLOEXEC) ? "" : "not ",
                 fdf_pass ? "OK" : "failed");
      
         flf = fcntl(acceptfd, F_GETFL);
         if (flf == -1)
             die("fcntl:F_GETFD");
         flf_pass = ((flf & O_NONBLOCK) != 0) ==
                    ((nonblock_flag & SOCK_NONBLOCK) !=0);
         printf("nonblock flag is %sset (%s)\n",
                 (flf & O_NONBLOCK) ? "" : "not ",
                 flf_pass ? "OK" : "failed");
      
         close(acceptfd);
         close(connfd);
      
         printf("Test result: %s\n", (fdf_pass && flf_pass) ? "PASS" : "FAIL");
         return fdf_pass && flf_pass;
      }
      
      static int
      create_listening_socket(int port_num)
      {
         struct sockaddr_in svaddr;
         int lfd;
         int optval;
      
         memset(&svaddr, 0, sizeof(struct sockaddr_in));
         svaddr.sin_family = AF_INET;
         svaddr.sin_addr.s_addr = htonl(INADDR_ANY);
         svaddr.sin_port = htons(port_num);
      
         lfd = socket(AF_INET, SOCK_STREAM, 0);
         if (lfd == -1)
             die("socket");
      
         optval = 1;
         if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &optval,
                        sizeof(optval)) == -1)
             die("setsockopt");
      
         if (bind(lfd, (struct sockaddr *) &svaddr,
                  sizeof(struct sockaddr_in)) == -1)
             die("bind");
      
         if (listen(lfd, 5) == -1)
             die("listen");
      
         return lfd;
      }
      
      int
      main(int argc, char *argv[])
      {
         struct sockaddr_in conn_addr;
         int lfd;
         int port_num;
         int passed;
      
         passed = 1;
      
         port_num = (argc > 1) ? atoi(argv[1]) : PORT_NUM;
      
         memset(&conn_addr, 0, sizeof(struct sockaddr_in));
         conn_addr.sin_family = AF_INET;
         conn_addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
         conn_addr.sin_port = htons(port_num);
      
         lfd = create_listening_socket(port_num);
      
         if (!do_test(lfd, &conn_addr, 0, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, 0))
             passed = 0;
         if (!do_test(lfd, &conn_addr, 0, SOCK_NONBLOCK))
             passed = 0;
         if (!do_test(lfd, &conn_addr, SOCK_CLOEXEC, SOCK_NONBLOCK))
             passed = 0;
      
         close(lfd);
      
         exit(passed ? EXIT_SUCCESS : EXIT_FAILURE);
      }
      
      [mtk.manpages@gmail.com: rewrote changelog, updated test program]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Tested-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: NMichael Kerrisk <mtk.manpages@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de11defe
  10. 19 11月, 2008 2 次提交
  11. 18 11月, 2008 1 次提交
    • P
      x86: more general identifier for Phoenix BIOS · 0af40a4b
      Philipp Kohlbecher 提交于
      Impact: widen the reach of the low-memory-protect DMI quirk
      
      Phoenix BIOSes variously identify their vendor as "Phoenix Technologies,
      LTD" or "Phoenix Technologies LTD" (without the comma.)
      
      This patch makes the identification string in the bad_bios_dmi_table
      more general (following a suggestion by Ingo Molnar), so that both
      versions are handled.
      
      Again, the patched file compiles cleanly and the patch has been tested
      successfully on my machine.
      Signed-off-by: NPhilipp Kohlbecher <xt28@gmx.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0af40a4b