1. 21 11月, 2008 1 次提交
    • M
      x86: Fix interrupt leak due to migration · 0ca4b6b0
      Matthew Wilcox 提交于
      When we migrate an interrupt from one CPU to another, we set the
      move_in_progress flag and clean up the vectors later once they're not
      being used.  If you're unlucky and call destroy_irq() before the vectors
      become un-used, the move_in_progress flag is never cleared, which causes
      the interrupt to become unusable.
      
      This was discovered by Jesse Brandeburg for whom it manifested as an
      MSI-X device refusing to use MSI-X mode when the driver was unloaded
      and reloaded repeatedly.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ca4b6b0
  2. 19 11月, 2008 1 次提交
  3. 18 11月, 2008 8 次提交
    • P
      x86: more general identifier for Phoenix BIOS · 0af40a4b
      Philipp Kohlbecher 提交于
      Impact: widen the reach of the low-memory-protect DMI quirk
      
      Phoenix BIOSes variously identify their vendor as "Phoenix Technologies,
      LTD" or "Phoenix Technologies LTD" (without the comma.)
      
      This patch makes the identification string in the bad_bios_dmi_table
      more general (following a suggestion by Ingo Molnar), so that both
      versions are handled.
      
      Again, the patched file compiles cleanly and the patch has been tested
      successfully on my machine.
      Signed-off-by: NPhilipp Kohlbecher <xt28@gmx.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0af40a4b
    • J
      AMD IOMMU: check for next_bit also in unmapped area · 8501c45c
      Joerg Roedel 提交于
      Impact: fix possible use of stale IO/TLB entries
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      8501c45c
    • J
      AMD IOMMU: fix fullflush comparison length · 695b5676
      Joerg Roedel 提交于
      Impact: fix comparison length for 'fullflush'
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      695b5676
    • J
      AMD IOMMU: enable device isolation per default · 3ce1f93c
      Joerg Roedel 提交于
      Impact: makes device isolation the default for AMD IOMMU
      
      Some device drivers showed double-free bugs of DMA memory while testing
      them with AMD IOMMU. If all devices share the same protection domain
      this can lead to data corruption and data loss. Prevent this by putting
      each device into its own protection domain per default.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      3ce1f93c
    • J
      AMD IOMMU: add parameter to disable device isolation · e5e1f606
      Joerg Roedel 提交于
      Impact: add a new AMD IOMMU kernel command line parameter
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      e5e1f606
    • I
      x86, PEBS/DS: fix code flow in ds_request() · 10db4ef7
      Ingo Molnar 提交于
      this compiler warning:
      
        arch/x86/kernel/ds.c: In function 'ds_request':
        arch/x86/kernel/ds.c:368: warning: 'context' may be used uninitialized in this function
      
      Shows that the code flow in ds_request() is buggy - it goes into
      the unlock+release-context path even when the context is not allocated
      yet.
      
      First allocate the context, then do the other checks.
      
      Also, take care with GFP allocations under the ds_lock spinlock.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10db4ef7
    • F
      tracing/function-return-tracer: add the overrun field · 0231022c
      Frederic Weisbecker 提交于
      Impact: help to find the better depth of trace
      
      We decided to arbitrary define the depth of function return trace as
      "20". Perhaps this is not enough. To help finding an optimal depth, we
      measure now the overrun: the number of functions that have been missed
      for the current thread. By default this is not displayed, we have to
      do set a particular flag on the return tracer: echo overrun >
      /debug/tracing/trace_options And the overrun will be printed on the
      right.
      
      As the trace shows below, the current 20 depth is not enough.
      
      update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
      update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
      do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
      tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
      tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
      vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
      vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
      set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
      con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
      release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
      release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
      con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
      con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
      n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
      irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
      smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
      ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
      ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
      ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
      hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0231022c
    • V
      x86: add rdtsc barrier to TSC sync check · 93ce99e8
      Venki Pallipadi 提交于
      Impact: fix incorrectly marked unstable TSC clock
      
      Patch (commit 0d12cdd5 "sched: improve sched_clock() performance") has
      a regression on one of the test systems here.
      
      With the patch, I see:
      
       checking TSC synchronization [CPU#0 -> CPU#1]:
       Measured 28 cycles TSC warp between CPUs, turning off TSC clock.
       Marking TSC unstable due to check_tsc_sync_source failed
      
      Whereas, without the patch syncs pass fine on all CPUs:
      
       checking TSC synchronization [CPU#0 -> CPU#1]: passed.
      
      Due to this, TSC is marked unstable, when it is not actually unstable.
      This is because syncs in check_tsc_wrap() goes away due to this commit.
      
      As per the discussion on this thread, correct way to fix this is to add
      explicit syncs as below?
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      93ce99e8
  4. 16 11月, 2008 6 次提交
    • Y
      x86: fix es7000 compiling · d3c6aa1e
      Yinghai Lu 提交于
      Impact: fix es7000 build
      
        CC      arch/x86/kernel/es7000_32.o
      arch/x86/kernel/es7000_32.c: In function find_unisys_acpi_oem_table:
      arch/x86/kernel/es7000_32.c:255: error: implicit declaration of function acpi_get_table_with_size
      arch/x86/kernel/es7000_32.c:261: error: implicit declaration of function early_acpi_os_unmap_memory
      arch/x86/kernel/es7000_32.c: In function unmap_unisys_acpi_oem_table:
      arch/x86/kernel/es7000_32.c:277: error: implicit declaration of function __acpi_unmap_table
      make[1]: *** [arch/x86/kernel/es7000_32.o] Error 1
      
      we applied one patch out of order...
      
      | commit a73aaedd
      | Author: Yinghai Lu <yhlu.kernel@gmail.com>
      | Date:   Sun Sep 14 02:33:14 2008 -0700
      |
      |    x86: check dsdt before find oem table for es7000, v2
      |
      |    v2: use __acpi_unmap_table()
      
      that patch need:
      
      	x86: use early_ioremap in __acpi_map_table
      	x86: always explicitly map acpi memory
      	acpi: remove final __acpi_map_table mapping before setting acpi_gbl_permanent_mmap
      	acpi/x86: introduce __apci_map_table, v4
      
      submitted to the ACPI tree but not upstream yet.
      
      fix it until those patches applied, need to revert this one
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d3c6aa1e
    • M
      x86, bts: fix unlock problem in ds.c · d1f1e9c0
      Markus Metzger 提交于
      Fix a problem where ds_request() returned an error without releasing the
      ds lock.
      Reported-by: NStephane Eranian <eranian@gmail.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d1f1e9c0
    • F
      tracing/function-return-tracer: support for dynamic ftrace on function return tracer · e7d3737e
      Frederic Weisbecker 提交于
      This patch adds the support for dynamic tracing on the function return tracer.
      The whole difference with normal dynamic function tracing is that we don't need
      to hook on a particular callback. The only pro that we want is to nop or set
      dynamically the calls to ftrace_caller (which is ftrace_return_caller here).
      
      Some security checks ensure that we are not trying to launch dynamic tracing for
      return tracing while normal function tracing is already running.
      
      An example of trace with getnstimeofday set as a filter:
      
      ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
      ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e7d3737e
    • F
      tracing/function-return-tracer: add a barrier to ensure return stack index is incremented in memory · b01c7466
      Frederic Weisbecker 提交于
      Impact: fix possible race condition in ftrace function return tracer
      
      This fixes a possible race condition if index incrementation
      is not immediately flushed in memory.
      
      Thanks for Andi Kleen and Steven Rostedt for pointing out this issue
      and give me this solution.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b01c7466
    • S
      ftrace: pass module struct to arch dynamic ftrace functions · 31e88909
      Steven Rostedt 提交于
      Impact: allow archs more flexibility on dynamic ftrace implementations
      
      Dynamic ftrace has largly been developed on x86. Since x86 does not
      have the same limitations as other architectures, the ftrace interaction
      between the generic code and the architecture specific code was not
      flexible enough to handle some of the issues that other architectures
      have.
      
      Most notably, module trampolines. Due to the limited branch distance
      that archs make in calling kernel core code from modules, the module
      load code must create a trampoline to jump to what will make the
      larger jump into core kernel code.
      
      The problem arises when this happens to a call to mcount. Ftrace checks
      all code before modifying it and makes sure the current code is what
      it expects. Right now, there is not enough information to handle modifying
      module trampolines.
      
      This patch changes the API between generic dynamic ftrace code and
      the arch dependent code. There is now two functions for modifying code:
      
        ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into
             a nop, where the original text is calling addr. (mod is the
             module struct if called by module init)
      
        ftrace_make_caller(rec, addr) - convert the code rec->ip that should
             be a nop into a caller to addr.
      
      The record "rec" now has a new field called "arch" where the architecture
      can add any special attributes to each call site record.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      31e88909
    • D
      Revert "x86: blacklist DMAR on Intel G31/G33 chipsets" · 52168e60
      David Woodhouse 提交于
      This reverts commit e51af663, which was
      wrongly hoovered up and submitted about a month after a better fix had
      already been merged.
      
      The better fix is commit cbda1ba8
      ("PCI/iommu: blacklist DMAR on Intel G31/G33 chipsets"), where we do
      this blacklisting based on the DMI identification for the offending
      motherboard, since sometimes this chipset (or at least a chipset with
      the same PCI ID) apparently _does_ actually have an IOMMU.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52168e60
  5. 13 11月, 2008 3 次提交
  6. 12 11月, 2008 4 次提交
    • I
      tracing: branch tracer, fix vdso crash · 2b7d0390
      Ingo Molnar 提交于
      Impact: fix bootup crash
      
      the branch tracer missed arch/x86/vdso/vclock_gettime.c from
      disabling tracing, which caused such bootup crashes:
      
        [  201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077
      
      also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by
      creating DISABLE_UNLIKELY_PROFILE facility for code to turn off
      instrumentation on a per file basis.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2b7d0390
    • S
      tracing: profile likely and unlikely annotations · 1f0d69a9
      Steven Rostedt 提交于
      Impact: new unlikely/likely profiler
      
      Andrew Morton recently suggested having an in-kernel way to profile
      likely and unlikely macros. This patch achieves that goal.
      
      When configured, every(*) likely and unlikely macro gets a counter attached
      to it. When the condition is hit, the hit and misses of that condition
      are recorded. These numbers can later be retrieved by:
      
        /debugfs/tracing/profile_likely    - All likely markers
        /debugfs/tracing/profile_unlikely  - All unlikely markers.
      
      # cat /debug/tracing/profile_unlikely | head
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
          2167        0   0 do_arch_prctl                  process_64.c         832
             0        0   0 do_arch_prctl                  process_64.c         804
          2670        0   0 IS_ERR                         err.h                34
         71230     5693   7 __switch_to                    process_64.c         673
         76919        0   0 __switch_to                    process_64.c         639
         43184    33743  43 __switch_to                    process_64.c         624
         12740    64181  83 __switch_to                    process_64.c         594
         12740    64174  83 __switch_to                    process_64.c         590
      
      # cat /debug/tracing/profile_unlikely | \
        awk '{ if ($3 > 25) print $0; }' |head -20
         44963    35259  43 __switch_to                    process_64.c         624
         12762    67454  84 __switch_to                    process_64.c         594
         12762    67447  84 __switch_to                    process_64.c         590
          1478      595  28 syscall_get_error              syscall.h            51
             0     2821 100 syscall_trace_leave            ptrace.c             1567
             0        1 100 native_smp_prepare_cpus        smpboot.c            1237
         86338   265881  75 calc_delta_fair                sched_fair.c         408
        210410   108540  34 calc_delta_mine                sched.c              1267
             0    54550 100 sched_info_queued              sched_stats.h        222
         51899    66435  56 pick_next_task_fair            sched_fair.c         1422
             6       10  62 yield_task_fair                sched_fair.c         982
          7325     2692  26 rt_policy                      sched.c              144
             0     1270 100 pre_schedule_rt                sched_rt.c           1261
          1268    48073  97 pick_next_task_rt              sched_rt.c           884
             0    45181 100 sched_info_dequeued            sched_stats.h        177
             0       15 100 sched_move_task                sched.c              8700
             0       15 100 sched_move_task                sched.c              8690
         53167    33217  38 schedule                       sched.c              4457
             0    80208 100 sched_info_switch              sched_stats.h        270
         30585    49631  61 context_switch                 sched.c              2619
      
      # cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
         39900    36577  47 pick_next_task                 sched.c              4397
         20824    15233  42 switch_mm                      mmu_context_64.h     18
             0        7 100 __cancel_work_timer            workqueue.c          560
           617    66484  99 clocksource_adjust             timekeeping.c        456
             0   346340 100 audit_syscall_exit             auditsc.c            1570
            38   347350  99 audit_get_context              auditsc.c            732
             0   345244 100 audit_syscall_entry            auditsc.c            1541
            38     1017  96 audit_free                     auditsc.c            1446
             0     1090 100 audit_alloc                    auditsc.c            862
          2618     1090  29 audit_alloc                    auditsc.c            858
             0        6 100 move_masked_irq                migration.c          9
             1      198  99 probe_sched_wakeup             trace_sched_switch.c 58
             2        2  50 probe_wakeup                   trace_sched_wakeup.c 227
             0        2 100 probe_wakeup_sched_switch      trace_sched_wakeup.c 144
          4514     2090  31 __grab_cache_page              filemap.c            2149
         12882   228786  94 mapping_unevictable            pagemap.h            50
             4       11  73 __flush_cpu_slab               slub.c               1466
        627757   330451  34 slab_free                      slub.c               1731
          2959    61245  95 dentry_lru_del_init            dcache.c             153
           946     1217  56 load_elf_binary                binfmt_elf.c         904
           102       82  44 disk_put_part                  genhd.h              206
             1        1  50 dst_gc_task                    dst.c                82
             0       19 100 tcp_mss_split_point            tcp_output.c         1126
      
      As you can see by the above, there's a bit of work to do in rethinking
      the use of some unlikelys and likelys. Note: the unlikely case had 71 hits
      that were more than 25%.
      
      Note:  After submitting my first version of this patch, Andrew Morton
        showed me a version written by Daniel Walker, where I picked up
        the following ideas from:
      
        1)  Using __builtin_constant_p to avoid profiling fixed values.
        2)  Using __FILE__ instead of instruction pointers.
        3)  Using the preprocessor to stop all profiling of likely
             annotations from vsyscall_64.c.
      
      Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar
      for their feed back on this patch.
      
      (*) Not ever unlikely is recorded, those that are used by vsyscalls
       (a few of them) had to have profiling disabled.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f0d69a9
    • B
      ACPI: pci_link: remove acpi_irq_balance_set() interface · 32836259
      Bjorn Helgaas 提交于
      This removes the acpi_irq_balance_set() interface from the PCI
      interrupt link driver.
      
      x86 used acpi_irq_balance_set() to tell the PCI interrupt link
      driver to configure links to minimize IRQ sharing.  But the link
      driver can easily figure out whether to turn on IRQ balancing
      based on the IRQ model (PIC/IOAPIC/etc), so we can get rid of
      that external interface.
      
      It's better for the driver to figure this out at init-time.  If
      we set it externally via the x86 code, the interface reduces
      modularity, and we depend on the fact that acpi_process_madt()
      happens before we process the kernel command line.
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      32836259
    • R
      x86: KVM guest: fix section mismatch warning in kvmclock.c · a29a2af3
      Rakib Mullick 提交于
      WARNING: arch/x86/kernel/built-in.o(.text+0x1722c): Section mismatch
      in reference from the function kvm_setup_secondary_clock() to the
      function .devinit.text:setup_secondary_APIC_clock()
      The function kvm_setup_secondary_clock() references
      the function __devinit setup_secondary_APIC_clock().
      This is often because kvm_setup_secondary_clock lacks a __devinit
      annotation or the annotation of setup_secondary_APIC_clock is wrong.
      Signed-off-by: NMd.Rakib H. Mullick <rakib.mullick@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      a29a2af3
  7. 11 11月, 2008 6 次提交
    • I
      tracing: function return tracer, build fix · 19b3e967
      Ingo Molnar 提交于
      fix:
      
       arch/x86/kernel/ftrace.c: In function 'ftrace_return_to_handler':
       arch/x86/kernel/ftrace.c:112: error: implicit declaration of function 'cpu_clock'
      
      cpu_clock() is implicitly included via a number of ways, but its real
      location is sched.h. (Build failure is triggerable if enough other
      kernel components are turned off.)
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      19b3e967
    • I
      tracing, x86: function return tracer, fix assembly constraints · 867f7fb3
      Ingo Molnar 提交于
      fix:
      
       arch/x86/kernel/ftrace.c: Assembler messages:
       arch/x86/kernel/ftrace.c:140: Error: missing ')'
       arch/x86/kernel/ftrace.c:140: Error: junk `(%ebp))' after expression
       arch/x86/kernel/ftrace.c:141: Error: missing ')'
       arch/x86/kernel/ftrace.c:141: Error: junk `(%ebp))' after expression
      
      the [parent_replaced] is used in an =rm fashion, so that constraint
      is correct in isolation - but [parent_old] aliases register %0 and uses
      it in an addressing mode that is only valid with registers - so change
      the constraint from =rm to =r.
      
      This fixes the build failure.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      867f7fb3
    • F
      tracing, x86: add low level support for ftrace return tracing · caf4b323
      Frederic Weisbecker 提交于
      Impact: add infrastructure for function-return tracing
      
      Add low level support for ftrace return tracing.
      
      This plug-in stores return addresses on the thread_info structure of
      the current task.
      
      The index of the current return address is initialized when the task
      is the first one (init) and when a process forks (the child). It is
      not needed when a task does a sys_execve because after this syscall,
      it still needs to return on the kernel functions it called.
      
      Note that the code of return_to_handler has been suggested by Steven
      Rostedt as almost all of the ideas of improvements in this V3.
      
      For purpose of security, arch/x86/kernel/process_32.c is not traced
      because __switch_to() changes the current task during its execution.
      That could cause inconsistency in the stored return address of this
      function even if I didn't have any crash after testing with tracing on
      this function enabled.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      caf4b323
    • M
      x86: HPET: enter hpet_interrupt_handler with interrupts disabled · 5ceb1a04
      Matt Fleming 提交于
      Some functions that may be called from this handler require that
      interrupts are disabled. Also, combining IRQF_DISABLED and
      IRQF_SHARED does not reliably disable interrupts in a handler, so
      remove IRQF_SHARED from the irq flags (this irq is not shared anyway).
      Signed-off-by: NMatt Fleming <mjf@gentoo.org>
      Cc: mingo@elte.hu
      Cc: venkatesh.pallipadi@intel.com
      Cc: "Will Newton" <will.newton@gmail.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ceb1a04
    • M
      x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP · 89d77a1e
      Matt Fleming 提交于
      In hpet_next_event() we check that the value we just wrote to
      HPET_Tn_CMP(timer) has reached the chip. Currently, we're checking that
      the value we wrote to HPET_Tn_CMP(timer) is in HPET_T0_CMP, which, if
      timer is anything other than timer 0, is likely to fail.
      Signed-off-by: NMatt Fleming <mjf@gentoo.org>
      Cc: mingo@elte.hu
      Cc: venkatesh.pallipadi@intel.com
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      89d77a1e
    • M
      x86: HPET: convert WARN_ON to WARN_ON_ONCE · 1de5b085
      Matt Fleming 提交于
      It is possible to flood the console with call traces if the WARN_ON
      condition is true because of the frequency with which this function is
      called.
      Signed-off-by: NMatt Fleming <mjf@gentoo.org>
      Cc: mingo@elte.hu
      Cc: venkatesh.pallipadi@intel.com
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      1de5b085
  8. 09 11月, 2008 1 次提交
    • I
      sched: optimize sched_clock() a bit · 7cbaef9c
      Ingo Molnar 提交于
      sched_clock() uses cycles_2_ns() needlessly - which is an irq-disabling
      variant of __cycles_2_ns().
      
      Most of the time sched_clock() is called with irqs disabled already.
      The few places that call it with irqs enabled need to be updated.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7cbaef9c
  9. 06 11月, 2008 5 次提交
    • E
      Revert "x86: default to reboot via ACPI" · 8d00450d
      Eduardo Habkost 提交于
      This reverts commit c7ffa6c2.
      
      the assumptio of this change was that this would not break
      any existing machine. Andrey Borzenkov reported troubles with
      the ACPI reboot method: the system would hang on reboot, necessiating
      a power cycle. Probably more systems are affected as well.
      
      Also, there are patches queued up for v2.6.29 to disable virtualization
      on emergency_restart() - which was the original motivation of
      this change.
      Reported-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Bisected-by: NAndrey Borzenkov <arvidjaar@mail.ru>
      Signed-off-by: NEduardo Habkost <ehabkost@redhat.com>
      Acked-by: NAvi Kivity <avi@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d00450d
    • J
      AMD IOMMU: fix lazy IO/TLB flushing in unmap path · 80be308d
      Joerg Roedel 提交于
      Lazy flushing needs to take care of the unmap path too which is not yet
      implemented and leads to stale IO/TLB entries. This is fixed by this
      patch.
      Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
      80be308d
    • S
      x86: add smp_mb() before sending INVALIDATE_TLB_VECTOR · d6f0f39b
      Suresh Siddha 提交于
      Impact: fix rare x2apic hang
      
      On x86, x2apic mode accesses for sending IPI's don't have serializing
      semantics. If the IPI receivner refers(in lock-free fashion) to some
      memory setup by the sender, the need for smp_mb() before sending the
      IPI becomes critical in x2apic mode.
      
      Add the smp_mb() in native_flush_tlb_others() before sending the IPI.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d6f0f39b
    • S
      ftrace: add quick function trace stop · 60a7ecf4
      Steven Rostedt 提交于
      Impact: quick start and stop of function tracer
      
      This patch adds a way to disable the function tracer quickly without
      the need to run kstop_machine. It adds a new variable called
      function_trace_stop which will stop the calls to functions from mcount
      when set.  This is just an on/off switch and does not handle recursion
      like preempt_disable().
      
      It's main purpose is to help other tracers/debuggers start and stop tracing
      fuctions without the need to call kstop_machine.
      
      The config option HAVE_FUNCTION_TRACE_MCOUNT_TEST is added for archs
      that implement the testing of the function_trace_stop in the mcount
      arch dependent code. Otherwise, the test is done in the C code.
      
      x86 is the only arch at the moment that supports this.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60a7ecf4
    • B
      x86: don't allow nr_irqs > NR_IRQS · c78d0cf2
      Ben Hutchings 提交于
      Impact: fix boot hang on 32-bit systems with more than 224 IO-APIC pins
      
      On some 32-bit systems with a lot of IO-APICs probe_nr_irqs() can
      return a value larger than NR_IRQS. This will lead to probe_irq_on()
      overrunning the irq_desc array.
      
      I hit this when running net-next-2.6 (close to 2.6.28-rc3) on a
      Supermicro dual Xeon system.  NR_IRQS is 224 but probe_nr_irqs() detects
      5 IOAPICs and returns 240.  Here are the log messages:
      
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x01] address[0xfec00000] gsi_base[0])
      Tue Nov  4 16:53:47 2008 IOAPIC[0]: apic_id 1, version 32, address 0xfec00000, GSI 0-23
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x02] address[0xfec81000] gsi_base[24])
      Tue Nov  4 16:53:47 2008 IOAPIC[1]: apic_id 2, version 32, address 0xfec81000, GSI 24-47
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x03] address[0xfec81400] gsi_base[48])
      Tue Nov  4 16:53:47 2008 IOAPIC[2]: apic_id 3, version 32, address 0xfec81400, GSI 48-71
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x04] address[0xfec82000] gsi_base[72])
      Tue Nov  4 16:53:47 2008 IOAPIC[3]: apic_id 4, version 32, address 0xfec82000, GSI 72-95
      Tue Nov  4 16:53:47 2008 ACPI: IOAPIC (id[0x05] address[0xfec82400] gsi_base[96])
      Tue Nov  4 16:53:47 2008 IOAPIC[4]: apic_id 5, version 32, address 0xfec82400, GSI 96-119
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 high edge)
      Tue Nov  4 16:53:47 2008 ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
      Tue Nov  4 16:53:47 2008 Enabling APIC mode:  Flat.  Using 5 I/O APICs
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Acked-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c78d0cf2
  10. 04 11月, 2008 1 次提交
  11. 02 11月, 2008 1 次提交
    • L
      x86: Clean up late e820 resource allocation · 1f987577
      Linus Torvalds 提交于
      This makes the late e820 resources use 'insert_resource_expand_to_fit()'
      instead of doing a 'reserve_region_with_split()', and also avoids
      marking them as IORESOURCE_BUSY.
      
      This results in us being perfectly happy to use pre-existing PCI
      resources even if they were marked as being in a reserved region, while
      still avoiding any _new_ allocations in the reserved regions.  It also
      makes for a simpler and more accurate resource tree.
      
      Example resource allocation from Jonathan Corbet, who has firmware that
      has an e820 reserved entry that covered a big range (e0000000-fed003ff),
      and that had various PCI resources in it set up by firmware.
      
      With old kernels, the reserved range would force us to re-allocate all
      pre-existing PCI resources, and his reserved range would end up looking
      like this:
      
      	e0000000-fed003ff : reserved
      	  fec00000-fec00fff : IOAPIC 0
      	  fed00000-fed003ff : HPET 0
      
      where only the pre-allocated special regions (IOAPIC and HPET) were kept
      around.
      
      With 2.6.28-rc2, which uses 'reserve_region_with_split()', Jonathan's
      resource tree looked like this:
      
      	e0000000-fe7fffff : reserved
      	fe800000-fe8fffff : PCI Bus 0000:01
      	 fe800000-fe8fffff : reserved
      	fe900000-fe9d9aff : reserved
      	fe9d9b00-fe9d9bff : 0000:00:1f.3
      	 fe9d9b00-fe9d9bff : reserved
      	fe9d9c00-fe9d9fff : 0000:00:1a.7
      	 fe9d9c00-fe9d9fff : reserved
      	fe9da000-fe9dafff : 0000:00:03.3
      	 fe9da000-fe9dafff : reserved
      	fe9db000-fe9dbfff : 0000:00:19.0
      	 fe9db000-fe9dbfff : reserved
      	fe9dc000-fe9dffff : 0000:00:1b.0
      	 fe9dc000-fe9dffff : reserved
      	fe9e0000-fe9fffff : 0000:00:19.0
      	 fe9e0000-fe9fffff : reserved
      	fea00000-fea7ffff : 0000:00:02.0
      	 fea00000-fea7ffff : reserved
      	fea80000-feafffff : 0000:00:02.1
      	 fea80000-feafffff : reserved
      	feb00000-febfffff : 0000:00:02.0
      	 feb00000-febfffff : reserved
      	fec00000-fed003ff : reserved
      	 fec00000-fec00fff : IOAPIC 0
      	 fed00000-fed003ff : HPET 0
      
      and because the reserved entry had been split and moved into the
      individual resources, and because it used the IORESOURCE_BUSY flag, the
      drivers that actually wanted to _use_ those resources couldn't actually
      attach to them:
      
      	e1000e 0000:00:19.0: BAR 0: can't reserve mem region [0xfe9e0000-0xfe9fffff]
      	HDA Intel 0000:00:1b.0: BAR 0: can't reserve mem region [0xfe9dc000-0xfe9dffff]
      
      with this patch, the resource tree instead becomes
      
      	e0000000-fed003ff : reserved
      	  fe800000-fe8fffff : PCI Bus 0000:01
      	  fe9d9b00-fe9d9bff : 0000:00:1f.3
      	  fe9d9c00-fe9d9fff : 0000:00:1a.7
      	    fe9d9c00-fe9d9fff : ehci_hcd
      	  fe9da000-fe9dafff : 0000:00:03.3
      	  fe9db000-fe9dbfff : 0000:00:19.0
      	    fe9db000-fe9dbfff : e1000e
      	  fe9dc000-fe9dffff : 0000:00:1b.0
      	    fe9dc000-fe9dffff : ICH HD audio
      	  fe9e0000-fe9fffff : 0000:00:19.0
      	    fe9e0000-fe9fffff : e1000e
      	  fea00000-fea7ffff : 0000:00:02.0
      	  fea80000-feafffff : 0000:00:02.1
      	  feb00000-febfffff : 0000:00:02.0
      	  fec00000-fec00fff : IOAPIC 0
      	  fed00000-fed003ff : HPET 0
      
      ie the one reserved region now ends up surrounding all the PCI resources
      that were allocated inside of it by firmware, and because it is not
      marked BUSY, drivers have no problem attaching to the pre-allocated
      resources.
      Reported-and-tested-by: NJonathan Corbet <corbet@lwn.net>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robert Hancock <hancockr@shaw.ca>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f987577
  12. 31 10月, 2008 3 次提交
    • S
      ftrace: nmi safe code clean ups · a26a2a27
      Steven Rostedt 提交于
      Impact: cleanup
      
      This patch cleans up the NMI safe code for dynamic ftrace as suggested
      by Andrew Morton.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a26a2a27
    • I
      x86: build fix · b342797c
      Ingo Molnar 提交于
      Impact: build fix on certain UP configs
      
      fix:
      
       arch/x86/kernel/cpu/common.c: In function 'cpu_init':
       arch/x86/kernel/cpu/common.c:1141: error: 'boot_cpu_id' undeclared (first use in this function)
       arch/x86/kernel/cpu/common.c:1141: error: (Each undeclared identifier is reported only once
       arch/x86/kernel/cpu/common.c:1141: error: for each function it appears in.)
      
      Pull in asm/smp.h on UP, so that we get the definition of
      boot_cpu_id.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b342797c
    • I
      x86: cpu_index build fix · 1c4acdb4
      Ingo Molnar 提交于
      fix:
      
       arch/x86/kernel/cpu/common.c: In function 'early_identify_cpu':
       arch/x86/kernel/cpu/common.c:553: error: 'struct cpuinfo_x86' has no member named 'cpu_index'
      
      as cpu_index is only available on SMP.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1c4acdb4