1. 12 11月, 2008 2 次提交
    • I
      tracing: branch tracer, fix vdso crash · 2b7d0390
      Ingo Molnar 提交于
      Impact: fix bootup crash
      
      the branch tracer missed arch/x86/vdso/vclock_gettime.c from
      disabling tracing, which caused such bootup crashes:
      
        [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
      
      also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
      creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
      instrumentation on a per file basis.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b7d0390
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
  2. 11 11月, 2008 12 次提交
  3. 09 11月, 2008 5 次提交
    • K
      powerpc: Updated Freescale PPC related defconfigs · ea37194d
      Kumar Gala 提交于
      unset CONFIG_PCI_LEGACY in the defconfigs as none of them enable
      ISDN drivers which seem to be the only place we are using pci_find_device
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      ea37194d
    • L
      powerpc: Update QE/CPM2 usb_ctlr structures for USB support · 2b487065
      Li Yang 提交于
      Fixes following build error:
      
        CC      drivers/usb/gadget/fsl_qe_udc.o
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_eprx_stall_change':
      drivers/usb/gadget/fsl_qe_udc.c:156: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c:163: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_eptx_stall_change':
      drivers/usb/gadget/fsl_qe_udc.c:173: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c:180: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_eprx_nack':
      drivers/usb/gadget/fsl_qe_udc.c:201: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c:201: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_eprx_normal':
      drivers/usb/gadget/fsl_qe_udc.c:218: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c:218: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_ep_reset':
      drivers/usb/gadget/fsl_qe_udc.c:325: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c:342: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'qe_ep_register_init':
      drivers/usb/gadget/fsl_qe_udc.c:515: error: 'struct usb_ctlr' has no member named 'usb_usep'
      drivers/usb/gadget/fsl_qe_udc.c: In function 'ch9getstatus':
      drivers/usb/gadget/fsl_qe_udc.c:1981: error: 'struct usb_ctlr' has no member named 'usb_usep'
      make[2]: *** [drivers/usb/gadget/fsl_qe_udc.o] Error 1
      Signed-off-by: NLi Yang <leoli@freescale.com>
      Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      2b487065
    • M
      powerpc/86xx: Correct SOC bus-frequency in GE Fanuc SBC610 DTS · 33d2d78b
      Martyn Welch 提交于
      This patch corrects the bus-frequency value provided in the SBC610's dts.
      Signed-off-by: NMartyn Welch <martyn.welch@gefanuc.com>
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      33d2d78b
    • K
      powerpc/fsl-booke: Fix synchronization bug w/local tlb invalidates · b41d6fee
      Kumar Gala 提交于
      The implemetation of _tlbil_pid() on Freescale Book-E cores needs
      an msync & isync after we flash invalidate the TLBs.  This was causing
      the following oops reported by Sebastian Andrzej Siewior:
      
        VFS: Mounted root (nfs filesystem) readonly.
        Freeing unused kernel memory: 148k init
        BUG: sleeping function called from invalid context at /home/bigeasy/git/linux-2.6-powerpc/mm/mmap.c:234
        in_atomic():1, irqs_disabled():0
        Call Trace:
        [df189df0] [c0007160] show_stack+0x48/0x148 (unreliable)
        [df189e30] [c0029480] __might_sleep+0xf0/0x100
        [df189e40] [c0070ac0] remove_vma+0x28/0x98
        [df189e50] [c0070c1c] exit_mmap+0xec/0x128
        [df189e80] [c002d2f4] mmput+0x54/0xec
        [df189ea0] [c0030b6c] exit_mm+0x10c/0x120
        [df189ed0] [c003288c] do_exit+0x1ac/0x6e8
        [df189f20] [c0032e48] do_group_exit+0x80/0xac
        [df189f40] [c000e9dc] ret_from_syscall+0x0/0x3c
        BUG: scheduling while atomic: udevd/956/0x10000002
        Modules linked in:
        Call Trace:
        [df189df0] [c0007160] show_stack+0x48/0x148 (unreliable)
        [df189e30] [c002ac88] __schedule_bug+0x58/0x6c
        [df189e40] [c023e6cc] schedule+0xa8/0x4a8
        [df189e90] [c002ad6c] __cond_resched+0x38/0x64
        [df189ea0] [c023ebc8] _cond_resched+0x3c/0x58
        [df189eb0] [c0030e70] put_files_struct+0x90/0xec
        [df189ed0] [c00328a8] do_exit+0x1c8/0x6e8
        [df189f20] [c0032e48] do_group_exit+0x80/0xac
        [df189f40] [c000e9dc] ret_from_syscall+0x0/0x3c
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      b41d6fee
    • I
      sched: optimize sched_clock() a bit · 7cbaef9c
      Ingo Molnar 提交于
      sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
      variant of __cycles_2_ns().
      
      Most of the time sched_clock() is called with irqs disabled already.
      The few places that call it with irqs enabled need to be updated.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7cbaef9c
  4. 08 11月, 2008 3 次提交
  5. 07 11月, 2008 7 次提交
  6. 06 11月, 2008 10 次提交
    • E
      Revert "x86: default to reboot via ACPI" · 8d00450d
      Eduardo Habkost 提交于
      This reverts commit c7ffa6c2.
      
      the assumptio of this change was that this would not break
      any existing machine. Andrey Borzenkov reported troubles with
      the ACPI reboot method: the system would hang on reboot, necessiating
      a power cycle. Probably more systems are affected as well.
      
      Also, there are patches queued up for v2.6.29 to disable virtualization
      on emergency_restart() - which was the original motivation of
      this change.
      Reported-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Bisected-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Signed-off-by: NEduardo Habkost <ehabkost@redhat.com>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d00450d
    • H
      x86: align DirectMap in /proc/meminfo · b9c3bfc2
      Hugh Dickins 提交于
      Impact: right-align /proc/meminfo consistent with other fields
      
      When the split-LRU patches added Inactive(anon) and Inactive(file) lines
      to /proc/meminfo, all counts were moved two columns rightwards to fit in.
      Now move x86's DirectMap lines two columns rightwards to line up.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b9c3bfc2
    • J
      AMD IOMMU: fix lazy IO/TLB flushing in unmap path · 80be308d
      Joerg Roedel 提交于
      Lazy flushing needs to take care of the unmap path too which is not yet
      implemented and leads to stale IO/TLB entries. This is fixed by this
      patch.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      80be308d
    • S
      x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR · d6f0f39b
      Suresh Siddha 提交于
      Impact: fix rare x2apic hang
      
      On x86, x2apic mode accesses for sending IPI's don't have serializing
      semantics. If the IPI receivner refers(in lock-free fashion) to some
      memory setup by the sender, the need for smp_mb() before sending the
      IPI becomes critical in x2apic mode.
      
      Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6f0f39b
    • Y
      x86: remove VISWS and PARAVIRT around NR_IRQS puzzle · 7db282fa
      Yinghai Lu 提交于
      Impact: fix warning message when PARAVIRT is set in config
      
      Remove stale #ifdef components from our IRQ sizing logic.
      x86/Voyager is the only holdout.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7db282fa
    • B
      x86: mention ACPI in top-level Kconfig menu · da85f865
      Bjorn Helgaas 提交于
      Impact: clarify menuconfig text
      
      Mention ACPI in the top-level menu to give a clue as to where
      it lives. This matches what ia64 does.
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      da85f865
    • S
      ftrace: add quick function trace stop · 60a7ecf4
      Steven Rostedt 提交于
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      It's main purpose is to help other tracers/debuggers start and stop tracing
      fuctions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60a7ecf4
    • Y
      x86: size NR_IRQS on 32-bit systems the same way as 64-bit · 1b489768
      Yinghai Lu 提交于
      Impact: make NR_IRQS big enough for system with lots of apic/pins
      
      If lots of IO_APIC's are there (or can be there), size the same way
      as 64-bit, depending on MAX_IO_APICS and NR_CPUS.
      
      This fixes the boot problem reported by Ben Hutchings on a 32-bit
      server with 5 IO-APICs and 240 IO-APIC pins.
      Signed-off-by: NYinghai <yinghai@kernel.org>
      Tested-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1b489768
    • B
      x86: don't allow nr_irqs > NR_IRQS · c78d0cf2
      Ben Hutchings 提交于
      Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins
      
      On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can
      return a value larger than NR_IRQS. This will lead to probe_irq_on()
      overrunning the irq_desc array.
      
      I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a
      Supermicro dual Xeon system.  NR_IRQS is 224 but probe_nr_irqs() detects
      5 IOAPICs and returns 240.  Here are the log messages:
      
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
      Tue Nov  4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24])
      Tue Nov  4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48])
      Tue Nov  4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72])
      Tue Nov  4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96])
      Tue Nov  4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
      Tue Nov  4 16:53:47 2008 Enabling APIC mode:  Flat.  Using 5 I/O APICs
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Acked-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c78d0cf2
    • I
      sched: re-tune balancing · 9fcd18c9
      Ingo Molnar 提交于
      Impact: improve wakeup affinity on NUMA systems, tweak SMP systems
      
      Given the fixes+tweaks to the wakeup-buddy code, re-tweak the domain
      balancing defaults on NUMA and SMP systems.
      
      Turn on SD_WAKE_AFFINE which was off on x86 NUMA - there's no reason
      why we would not want to have wakeup affinity across nodes as well.
      (we already do this in the standard NUMA template.)
      
      lat_ctx on a NUMA box is particularly happy about this change:
      
      before:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.60
       |   2 5.70
      
      after:
      
       |   phoenix:~/l> ./lat_ctx -s 0 2
       |   "size=0k ovr=2.65
       |   2 2.07
      
      a 2.75x speedup.
      
      pipe-test is similarly happy about it too:
      
       |  phoenix:~/sched-tests> ./pipe-test
       |   18.26 usecs/loop.
       |   14.70 usecs/loop.
       |   14.38 usecs/loop.
       |   10.55 usecs/loop.              # +WAKE_AFFINE on domain0+domain1
       |   8.63 usecs/loop.
       |   8.59 usecs/loop.
       |   9.03 usecs/loop.
       |   8.94 usecs/loop.
       |   8.96 usecs/loop.
       |   8.63 usecs/loop.
      
      Also:
      
       - disable SD_BALANCE_NEWIDLE on NUMA and SMP domains (keep it for siblings)
       - enable SD_WAKE_BALANCE on SMP domains
      
      Sysbench+postgresql improves all around the board, quite significantly:
      
                 .28-rc3-11474e2c  .28-rc3-11474e2c-tune
      -------------------------------------------------
          1:             571              688    +17.08%
          2:            1236             1206    -2.55%
          4:            2381             2642    +9.89%
          8:            4958             5164    +3.99%
         16:            9580             9574    -0.07%
         32:            7128             8118    +12.20%
         64:            7342             8266    +11.18%
        128:            7342             8064    +8.95%
        256:            7519             7884    +4.62%
        512:            7350             7731    +4.93%
      -------------------------------------------------
        SUM:           55412            59341    +6.62%
      
      So it's a win both for the runup portion, the peak area and the tail.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9fcd18c9
  7. 05 11月, 2008 1 次提交