1. 16 11月, 2008 1 次提交
  2. 13 11月, 2008 1 次提交
  3. 12 11月, 2008 5 次提交
    • I
      tracing: branch tracer, fix vdso crash · 2b7d0390
      Ingo Molnar 提交于
      Impact: fix bootup crash
      
      the branch tracer missed arch/x86/vdso/vclock_gettime.c from
      disabling tracing, which caused such bootup crashes:
      
        [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
      
      also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
      creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
      instrumentation on a per file basis.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b7d0390
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
    • F
      tracing/fastboot: move boot tracer structs and funcs into their own header. · 3f5ec136
      Frederic Weisbecker 提交于
      Impact: Cleanups on the boot tracer and ftrace
      
      This patch bring some cleanups about the boot tracer headers. The
      functions and structures of this tracer have nothing related to ftrace
      and should have so their own header file.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f5ec136
    • S
      ring-buffer: buffer record on/off switch · a3583244
      Steven Rostedt 提交于
      Impact: enable/disable ring buffer recording API added
      
      Several kernel developers have requested that there be a way to stop
      recording into the ring buffers with a simple switch that can also
      be enabled from userspace. This patch addes a new kernel API to the
      ring buffers called:
      
       tracing_on()
       tracing_off()
      
      When tracing_off() is called, all ring buffers will not be able to record
      into their buffers.
      
      tracing_on() will enable the ring buffers again.
      
      These two act like an on/off switch. That is, there is no counting of the
      number of times tracing_off or tracing_on has been called.
      
      A new file is added to the debugfs/tracing directory called
      
        tracing_on
      
      This allows for userspace applications to also flip the switch.
      
        echo 0 > debugfs/tracing/tracing_on
      
      disables the tracing.
      
        echo 1 > /debugfs/tracing/tracing_on
      
      enables it.
      
      Note, this does not disable or enable any tracers. It only sets or clears
      a flag that needs to be set in order for the ring buffers to write to
      their buffers. It is a global flag, and affects all ring buffers.
      
      The buffers start out with tracing_on enabled.
      
      There are now three flags that control recording into the buffers:
      
       tracing_on: which affects all ring buffer tracers.
      
       buffer->record_disabled: which affects an allocated buffer, which may be set
           if an anomaly is detected, and tracing is disabled.
      
       cpu_buffer->record_disabled: which is set by tracing_stop() or if an
           anomaly is detected. tracing_start can not reenable this if
           an anomaly occurred.
      
      The userspace debugfs/tracing/tracing_enabled is implemented with
      tracing_stop() but the user space code can not enable it if the kernel
      called tracing_stop().
      
      Userspace can enable the tracing_on even if the kernel disabled it.
      It is just a switch used to stop tracing if a condition was hit.
      tracing_on is not for protecting critical areas in the kernel nor is
      it for stopping tracing if an anomaly occurred. This is because userspace
      can reenable it at any time.
      
      Side effect: With this patch, I discovered a dead variable in ftrace.c
        called tracing_on. This patch removes it.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      a3583244
    • A
      telephony: trivial: fix up email address · 0906dd9d
      Alan Cox 提交于
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0906dd9d
  4. 11 11月, 2008 4 次提交
    • F
      tracing, x86: add low level support for ftrace return tracing · caf4b323
      Frederic Weisbecker 提交于
      Impact: add infrastructure for function-return tracing
      
      Add low level support for ftrace return tracing.
      
      This plug-in stores return addresses on the thread_info structure of
      the current task.
      
      The index of the current return address is initialized when the task
      is the first one (init) and when a process forks (the child). It is
      not needed when a task does a sys_execve because after this syscall,
      it still needs to return on the kernel functions it called.
      
      Note that the code of return_to_handler has been suggested by Steven
      Rostedt as almost all of the ideas of improvements in this V3.
      
      For purpose of security, arch/x86/kernel/process_32.c is not traced
      because __switch_to() changes the current task during its execution.
      That could cause inconsistency in the stored return address of this
      function even if I didn't have any crash after testing with tracing on
      this function enabled.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      caf4b323
    • O
      fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock · ad474cac
      Oleg Nesterov 提交于
      Impact: fix hang/crash on ia64 under high load
      
      This is ugly, but the simplest patch by far.
      
      Unlike other similar routines, account_group_exec_runtime() could be
      called "implicitly" from within scheduler after exit_notify(). This
      means we can race with the parent doing release_task(), we can't just
      check ->signal != NULL.
      
      Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
      before __cleanup_signal() to make sure ->signal can't be freed under
      task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
      about the case when tsk changes cpu/rq under us, this should be OK.
      
      Thanks to Ingo who nacked my previous buggy patch.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Reported-by: NDoug Chapman <doug.chapman@hp.com>
      ad474cac
    • M
      ssb: Fix DMA-API compilation for non-PCI systems · fd0fcf5c
      Michael Buesch 提交于
      This fixes compilation of the SSB DMA-API code on non-PCI platforms.
      Signed-off-by: NMichael Buesch <mb@bu3sch.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd0fcf5c
    • T
      libata: revert convert-to-block-tagging patches · 8a8bc223
      Tejun Heo 提交于
      This patch reverts the following three commits which convert libata to
      use block layer tagging.
      
       43a49cbd
       e013e13b
       2fca5ccf
      
      Although using block layer tagging is the right direction, due to the
      tight coupling among tag number, data structure allocation and
      hardware command slot allocation, libata doesn't work correctly with
      the current conversion.
      
      The biggest problem is guaranteeing that tag 0 is always used for
      non-NCQ commands.  Due to the way blk-tag is implemented and how SCSI
      starts and finishes requests, such guarantee can't be made.  I'm not
      sure whether this would actually break any low level driver but it
      doesn't look like a good idea to break such assumption given the
      frailty of ATA controllers.
      
      So, for the time being, keep using the old dumb in-libata qc
      allocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jens Axobe <jens.axboe@oracle.com>
      Cc: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a8bc223
  5. 10 11月, 2008 2 次提交
  6. 09 11月, 2008 1 次提交
  7. 07 11月, 2008 9 次提交
  8. 06 11月, 2008 6 次提交
    • R
      cpumask: introduce new API, without changing anything · 2d3854a3
      Rusty Russell 提交于
      Impact: introduce new APIs
      
      We want to deprecate cpumasks on the stack, as we are headed for
      gynormous numbers of CPUs.  Eventually, we want to head towards an
      undefined 'struct cpumask' so they can never be declared on stack.
      
      1) New cpumask functions which take pointers instead of copies.
         (cpus_* -> cpumask_*)
      
      2) Several new helpers to reduce requirements for temporary cpumasks
         (cpumask_first_and, cpumask_next_and, cpumask_any_and)
      
      3) Helpers for declaring cpumasks on or offstack for large NR_CPUS
         (cpumask_var_t, alloc_cpumask_var and free_cpumask_var)
      
      4) 'struct cpumask' for explicitness and to mark new-style code.
      
      5) Make iterator functions stop at nr_cpu_ids (a runtime constant),
         not NR_CPUS for time efficiency and for smaller dynamic allocations
         in future.
      
      6) cpumask_copy() so we can allocate less than a full cpumask eventually
         (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask'
         definition eventually.
      
      7) work_on_cpu() helper for doing task on a CPU, rather than saving old
         cpumask for current thread and manipulating it.
      
      8) smp_call_function_many() which is smp_call_function_mask() except
         taking a cpumask pointer.
      
      Note that this patch simply introduces the new functions and leaves
      the obsolescent ones in place.  This is to simplify the transition
      patches.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2d3854a3
    • A
      Add round_jiffies_up and related routines · 9c133c46
      Alan Stern 提交于
      This patch (as1158b) adds round_jiffies_up() and friends.  These
      routines work like the analogous round_jiffies() functions, except
      that they will never round down.
      
      The new routines will be useful for timeouts where we don't care
      exactly when the timer expires, provided it doesn't expire too soon.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      9c133c46
    • J
      bio: define __BIOVEC_PHYS_MERGEABLE · f92131c3
      Jeremy Fitzhardinge 提交于
      Define __BIOVEC_PHYS_MERGEABLE as the default implementation of
      BIOVEC_PHYS_MERGEABLE, so that its available for reuse within an
      arch-specific definition of BIOVEC_PHYS_MERGEABLE.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      f92131c3
    • S
      ftrace: soft tracing stop and start · 0f048701
      Steven Rostedt 提交于
      Impact: add way to quickly start stop tracing from the kernel
      
      This patch adds a soft stop and start to the trace. This simply
      disables function tracing via the ftrace_disabled flag, and
      disables the trace buffers to prevent recording. The tracing
      code may still be executed, but the trace will not be recorded.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0f048701
    • S
      ftrace: add quick function trace stop · 60a7ecf4
      Steven Rostedt 提交于
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      It's main purpose is to help other tracers/debuggers start and stop tracing
      fuctions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60a7ecf4
    • I
      sched: re-tune balancing · 9fcd18c9
      Ingo Molnar 提交于
      Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
      
      Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
      balancing defaults on NUMA and SMP systems.
      
      Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
      why we would not want to have wakeup affinity across nodes as well.
      (we already do this in the standard NUMA template.)
      
      lat_ctx on a NUMA box is particularly happy about this change:
      
      before:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.60
       |   2 5.70
      
      after:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.65
       |   2 2.07
      
      a 2.75x speedup.
      
      pipe-test is similarly happy about it too:
      
       |  phoenix:~/sched-tests> ./pipe-test
       |   18.26 usecs/loop.
       |   14.70 usecs/loop.
       |   14.38 usecs/loop.
       |   10.55 usecs/loop.              # +WAKE_AFFINE on domain0+domain1
       |   8.63 usecs/loop.
       |   8.59 usecs/loop.
       |   9.03 usecs/loop.
       |   8.94 usecs/loop.
       |   8.96 usecs/loop.
       |   8.63 usecs/loop.
      
      Also:
      
       - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
       - enable SD_WAKE_BALANCE on SMP domains
      
      Sysbench+postgresql improves all around the board, quite significantly:
      
                 .28-rc3-11474e2c  .28-rc3-11474e2c-tune
      -------------------------------------------------
          1:             571              688    +17.08%
          2:            1236             1206    -2.55%
          4:            2381             2642    +9.89%
          8:            4958             5164    +3.99%
         16:            9580             9574    -0.07%
         32:            7128             8118    +12.20%
         64:            7342             8266    +11.18%
        128:            7342             8064    +8.95%
        256:            7519             7884    +4.62%
        512:            7350             7731    +4.93%
      -------------------------------------------------
        SUM:           55412            59341    +6.62%
      
      So it's a win both for the runup portion, the peak area and the tail.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9fcd18c9
  9. 05 11月, 2008 3 次提交
    • E
      [MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4) · 467622ef
      Eric W. Biederman 提交于
      For "unlock" cycles to 16bit devices in 8bit compatibility mode we need
      to use the byte addresses 0xaaa and 0x555. These effectively match
      the word address 0x555 and 0x2aa, except the latter has its low bit set.
      
      Most chips don't care about the value of the 'A-1' pin in x8 mode,
      but some -- like the ST M29W320D -- do. So we need to be careful to
      set it where appropriate.
      
      cfi_send_gen_cmd is only ever passed addresses where the low byte
      is 0x00, 0x55 or 0xaa. Of those, only addresses ending 0xaa are
      affected by this patch, by masking in the extra low bit when the device
      is known to be in compatibility mode.
      
      [dwmw2: Do it only when (cmd_ofs & 0xff) == 0xaa]
      v4: Fix  stupid typo in cfi_build_cmd_addr that failed to compile
          I'm writing this patch way to late at night.
      v3: Bring all of the work back into cfi_build_cmd_addr
          including calling of map_bankwidth(map) and cfi_interleave(cfi)
          So every caller doesn't need to.
      v2: Only modified the address if we our device_type is larger than our
          bus width.
      
      Cc: stable@kernel.org
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      467622ef
    • P
      net: fix packet socket delivery in rx irq handler · 9b22ea56
      Patrick McHardy 提交于
      The changes to deliver hardware accelerated VLAN packets to packet
      sockets (commit bc1d0411) caused a warning for non-NAPI drivers.
      The __vlan_hwaccel_rx() function is called directly from the drivers
      RX function, for non-NAPI drivers that means its still in RX IRQ
      context:
      
      [   27.779463] ------------[ cut here ]------------
      [   27.779509] WARNING: at kernel/softirq.c:136 local_bh_enable+0x37/0x81()
      ...
      [   27.782520]  [<c0264755>] netif_nit_deliver+0x5b/0x75
      [   27.782590]  [<c02bba83>] __vlan_hwaccel_rx+0x79/0x162
      [   27.782664]  [<f8851c1d>] atl1_intr+0x9a9/0xa7c [atl1]
      [   27.782738]  [<c0155b17>] handle_IRQ_event+0x23/0x51
      [   27.782808]  [<c015692e>] handle_edge_irq+0xc2/0x102
      [   27.782878]  [<c0105fd5>] do_IRQ+0x4d/0x64
      
      Split hardware accelerated VLAN reception into two parts to fix this:
      
      - __vlan_hwaccel_rx just stores the VLAN TCI and performs the VLAN
        device lookup, then calls netif_receive_skb()/netif_rx()
      
      - vlan_hwaccel_do_receive(), which is invoked by netif_receive_skb()
        in softirq context, performs the real reception and delivery to
        packet sockets.
      Reported-and-tested-by: NRamon Casellas <ramon.casellas@cttc.es>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b22ea56
    • F
      tracing/fastboot: Enable boot tracing only during initcalls · 71566a0d
      Frederic Weisbecker 提交于
      Impact: modify boot tracer
      
      We used to disable the initcall tracing at a specified time (IE: end
      of builtin initcalls). But we don't need it anymore. It will be
      stopped when initcalls are finished.
      
      However we want two things:
      
      _Start this tracing only after pre-smp initcalls are finished.
      
      _Since we are planning to trace sched_switches at the same time, we
      want to enable them only during the initcall execution.
      
      For this purpose, this patch introduce two functions to enable/disable
      the sched_switch tracing during boot.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      71566a0d
  10. 04 11月, 2008 3 次提交
  11. 03 11月, 2008 3 次提交
  12. 31 10月, 2008 2 次提交