1. 31 Dec, 2006 2 commits
  2. 24 Dec, 2006 1 commit
    • [PATCH] arch/i386/pci/mmconfig.c tlb flush fix · 8d1c4819
      Committed by OGAWA Hirofumi
      We use the fixmap for accessing pci config space in pci_mmcfg_read/write().
      The problem is in pci_exp_set_dev_base(). It is caching a last
      accessed address to avoid calling set_fixmap_nocache() whenever
      pci_mmcfg_read/write() is used.
      
        static inline void pci_exp_set_dev_base(int bus, int devfn)
        {
      	u32 dev_base = base | (bus << 20) | (devfn << 12);
      	if (dev_base != mmcfg_last_accessed_device) {
      		mmcfg_last_accessed_device = dev_base;
      		set_fixmap_nocache(FIX_PCIE_MCFG, dev_base);
      	}
        }
      
                  cpu0                                        cpu1
        ---------------------------------------------------------------------------
          pci_mmcfg_read("device-A")
              pci_exp_set_dev_base()
                  set_fixmap_nocache()
                                                    pci_mmcfg_read("device-B")
                                                        pci_exp_set_dev_base()
                                                            set_fixmap_nocache()
          pci_mmcfg_read("device-B")
              pci_exp_set_dev_base()
                  /* doesn't flush tlb */
      
      But if the CPUs access the devices in the above order, the second
      pci_mmcfg_read() on cpu0 doesn't flush the TLB, because
      "mmcfg_last_accessed_device" is already device-B.  So the second
      pci_mmcfg_read() on cpu0 accesses device-A via a stale TLB entry.
      This problem was the cause of several strange behaviors.
      
      This patch fixes the situation by adding a "mmcfg_last_accessed_cpu" check.
      
      [ Alternatively, we could make a per-cpu mapping area or something. Not
        that it's probably worth it, but if we wanted to avoid all locking and
        instead just disable preemption, that would be the way to go. --Linus ]
      Signed-off-by: OGAWA Hirofumi <hogawa@miraclelinux.com>
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
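The race described in this commit can be reproduced in a small user-space model. The sketch below is only an illustration, not the kernel code: `tlb_entry`, `model_set_fixmap()`, `access_device_buggy()` and `access_device_fixed()` are hypothetical names, and the per-CPU array stands in for each CPU's cached translation of the fixmap slot. The only detail taken from the commit message is that the fix adds a check on the CPU that performed the last remap.

```c
#include <stdint.h>

#define NR_CPUS 2

/* Per-CPU model of the TLB entry cached for the shared fixmap slot. */
static uint32_t tlb_entry[NR_CPUS];

/* Shared cache of the last remap, as in pci_exp_set_dev_base(). */
static uint32_t last_accessed_device;
static int last_accessed_cpu = -1;

/* Reset the model to its initial state. */
static void model_reset(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        tlb_entry[cpu] = 0;
    last_accessed_device = 0;
    last_accessed_cpu = -1;
}

/* Models set_fixmap_nocache(): only the calling CPU's TLB is refreshed. */
static void model_set_fixmap(int cpu, uint32_t dev_base)
{
    tlb_entry[cpu] = dev_base;
}

/* Buggy variant: skips the remap whenever the device matches the cache. */
static uint32_t access_device_buggy(int cpu, uint32_t dev_base)
{
    if (dev_base != last_accessed_device) {
        last_accessed_device = dev_base;
        model_set_fixmap(cpu, dev_base);
    }
    return tlb_entry[cpu]; /* device actually reached through this CPU's TLB */
}

/* Fixed variant: also remaps when another CPU did the last remap. */
static uint32_t access_device_fixed(int cpu, uint32_t dev_base)
{
    if (dev_base != last_accessed_device || cpu != last_accessed_cpu) {
        last_accessed_device = dev_base;
        last_accessed_cpu = cpu;
        model_set_fixmap(cpu, dev_base);
    }
    return tlb_entry[cpu];
}
```

Replaying the cpu0/cpu1 sequence from the diagram against both variants shows the buggy one reaching device-A on the third access, while the fixed one reaches device-B.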
  3. 23 Dec, 2006 5 commits
    • [PATCH] sched: fix bad missed wakeups in the i386, x86_64, ia64, ACPI and APM idle code · 0888f06a
      Committed by Ingo Molnar
      Fernando Lopez-Lezcano reported frequent scheduling latencies and audio
      xruns starting at the 2.6.18-rt kernel, and those problems persisted all
      until current -rt kernels. The latencies were serious and unjustified by
      system load, often in the milliseconds range.
      
      After a patient and heroic multi-month effort of Fernando, where he
      tested dozens of kernels, tried various configs, boot options,
      test-patches of mine and provided latency traces of those incidents, the
      following 'smoking gun' trace was captured by him:
      
                       _------=> CPU#
                      / _-----=> irqs-off
                     | / _----=> need-resched
                     || / _---=> hardirq/softirq
                     ||| / _--=> preempt-depth
                     |||| /
                     |||||     delay
         cmd     pid ||||| time  |   caller
            \   /    |||||   \   |   /
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (try_to_wake_up)
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup <<...>-5856> (37 0)
        IRQ_19-1479  1D..1    0us : __trace_start_sched_wakeup (c01262ba 0 0)
        IRQ_19-1479  1D..1    0us : resched_task (try_to_wake_up)
        IRQ_19-1479  1D..1    0us : __spin_unlock_irqrestore (try_to_wake_up)
        ...
        <idle>-0     1...1   11us!: default_idle (cpu_idle)
        ...
        <idle>-0     0Dn.1  602us : smp_apic_timer_interrupt (c0103baf 1 0)
        ...
         <...>-5856  0D..2  618us : __switch_to (__schedule)
         <...>-5856  0D..2  618us : __schedule <<idle>-0> (20 162)
         <...>-5856  0D..2  619us : __spin_unlock_irq (__schedule)
         <...>-5856  0...1  619us : trace_stop_sched_switched (__schedule)
         <...>-5856  0D..1  619us : trace_stop_sched_switched <<...>-5856> (37 0)
      
      what is visible in this trace is that CPU#1 ran try_to_wake_up() for
      PID:5856, placed PID:5856 on CPU#0's runqueue and ran resched_task()
      for CPU#0. But it decided not to send an IPI to that CPU, due to
      TS_POLLING. CPU#0 never woke up after its NEED_RESCHED bit was set,
      and only rescheduled to PID:5856 upon the next lapic timer IRQ. The
      result was a 600+ usecs latency and a missed wakeup!
      
      the bug turned out to be an idle-wakeup bug introduced into the mainline
      kernel this summer via an optimization in the x86_64 tree:
      
          commit 495ab9c0
          Author: Andi Kleen <ak@suse.de>
          Date:   Mon Jun 26 13:59:11 2006 +0200
      
          [PATCH] i386/x86-64/ia64: Move polling flag into thread_info_status
      
          During some profiling I noticed that default_idle causes a lot of
          memory traffic. I think that is caused by the atomic operations
          to clear/set the polling flag in thread_info. There is actually
          no reason to make this atomic - only the idle thread does it
          to itself, other CPUs only read it. So I moved it into ti->status.
      
      the problem is this type of change:
      
              if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
      -               clear_thread_flag(TIF_POLLING_NRFLAG);
      +               current_thread_info()->status &= ~TS_POLLING;
                      smp_mb__after_clear_bit();
                      while (!need_resched()) {
                              local_irq_disable();
      
      this changes clear_thread_flag() to an explicit clearing of TS_POLLING.
      clear_thread_flag() is defined as:
      
              clear_bit(flag, &ti->flags);
      
      and clear_bit() is a LOCK-ed atomic instruction on all x86 platforms:
      
        static inline void clear_bit(int nr, volatile unsigned long * addr)
        {
                __asm__ __volatile__( LOCK_PREFIX
                        "btrl %1,%0"
      
      hence smp_mb__after_clear_bit() is defined as a simple compile barrier:
      
        #define smp_mb__after_clear_bit()       barrier()
      
      but the explicit TS_POLLING clearing introduced by the patch:
      
      +               current_thread_info()->status &= ~TS_POLLING;
      
      is not an atomic op! So the clearing of the TS_POLLING bit is freely
      reorderable with the reading of the NEED_RESCHED bit - and both now
      reside in different memory addresses.
      
      CPU idle wakeup very much depends on ordered memory ops, the clearing of
      the TS_POLLING flag must always be done before we test need_resched()
      and hit the idle instruction(s). [Symmetrically, the wakeup code needs
      to set NEED_RESCHED before it tests the TS_POLLING flag, so memory
      ordering is paramount.]
      
      Fernando's dual-core Athlon64 system has a sufficiently advanced memory
      ordering model so that it triggered this scenario very often.
      
      ( And it also turned out that the reason why these latencies never
        triggered on my testsystems is that i routinely use idle=poll, which
        was the only idle variant not affected by this bug. )
      
      The fix is to change the smp_mb__after_clear_bit() to an smp_mb(), to
      act as an absolute barrier between the TS_POLLING write and the
      NEED_RESCHED read. This affects almost all idling methods (default,
      ACPI, APM), on all 3 x86 architectures: i386, x86_64, ia64.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Tested-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
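The ordering requirement in this commit (write one flag, full barrier, then read the other flag, on both sides) can be sketched in user space with C11 atomics. This is a hedged illustration rather than the kernel code: `ts_polling`, `need_resched_flag`, `idle_may_halt()` and `wake_needs_ipi()` are hypothetical stand-ins, and `atomic_thread_fence(memory_order_seq_cst)` plays the role that smp_mb() plays in the fix.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool ts_polling = true;   /* stands in for TS_POLLING */
static atomic_bool need_resched_flag;   /* stands in for NEED_RESCHED */

/* Put both flags back into the "CPU is polling, nothing pending" state. */
static void reset_flags(void)
{
    atomic_store(&ts_polling, true);
    atomic_store(&need_resched_flag, false);
}

/* Idle side: returns true if no wakeup is pending, so halting is safe. */
static bool idle_may_halt(void)
{
    atomic_store_explicit(&ts_polling, false, memory_order_relaxed);
    /* Full barrier: the store above must not be reordered with the load below. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&need_resched_flag, memory_order_relaxed);
}

/* Waker side: returns true if the target is not polling and needs an IPI. */
static bool wake_needs_ipi(void)
{
    atomic_store_explicit(&need_resched_flag, true, memory_order_relaxed);
    /* Symmetric full barrier on the wakeup path. */
    atomic_thread_fence(memory_order_seq_cst);
    return !atomic_load_explicit(&ts_polling, memory_order_relaxed);
}
```

With both fences in place, the combination "idle side halts" and "waker skips the IPI" cannot happen, which is exactly the missed wakeup seen in the trace. A compiler-only barrier in place of the fence would not give that guarantee, because the hardware may still let the load pass the earlier store.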
    • [PATCH] ptrace: Fix EFL_OFFSET value according to i386 pda changes · 8701ea95
      Committed by Jeremy Fitzhardinge
      The PDA patches introduced a bug in ptrace: it reads eflags from the wrong
      place on the target's stack, but writes it back to the correct place.  The
      result is a corrupted eflags, which is most visible when it turns interrupts
      off unexpectedly.
      
      This patch fixes this by making the ptrace code a little less fragile.  It
      changes [gs]et_stack_long to take a straightforward byte offset into struct
      pt_regs, rather than requiring all callers to do a sizeof(struct pt_regs)
      offset adjustment.  This means that the eflag's offset (EFL_OFFSET) on the
      target stack can be simply computed with offsetof().
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Frederik Deweerdt <deweerdt@free.fr>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
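The idea of addressing a saved register by a plain byte offset computed with offsetof() can be shown with a small stand-alone sketch. Everything here is an assumption made for illustration: `struct regs_sketch` is a simplified frame and not the real i386 `struct pt_regs` layout, and `get_reg_long()` / `put_reg_long()` are hypothetical analogues of the [gs]et_stack_long helpers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified register frame; NOT the real i386 layout. */
struct regs_sketch {
    long ebx, ecx, edx;
    long eip;
    long eflags;
    long esp;
};

/* The offset is derived from the struct itself, so it tracks layout changes. */
#define EFL_OFFSET_SKETCH offsetof(struct regs_sketch, eflags)

/* Read one long at a byte offset into the saved frame. */
static long get_reg_long(const struct regs_sketch *regs, size_t offset)
{
    long value;
    memcpy(&value, (const char *)regs + offset, sizeof(value));
    return value;
}

/* Write one long at a byte offset into the saved frame. */
static void put_reg_long(struct regs_sketch *regs, size_t offset, long value)
{
    memcpy((char *)regs + offset, &value, sizeof(value));
}
```

Because the read and the write share one offset expression, they cannot drift apart when fields are added to the frame, which is the class of bug this commit describes.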
    • [PATCH] memory hotplug: fix compile error for i386 with NUMA config · 7c7e9425
      Committed by Yasunori Goto
      Fix a compile error when memory hotplug is configured with NUMA on i386.
      
      The compile error was caused by missing definitions of arch_add_memory(),
      remove_memory(), and memory_add_physaddr_to_nid().
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@cs.washington.edu>
      Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] microcode: fix mc_cpu_notifier section warning · be31f9cb
      Committed by Jean Delvare
      Structure mc_cpu_notifier references a __cpuinit function, but isn't
      declared __cpuinitdata itself:
      
      WARNING: arch/i386/kernel/microcode.o - Section mismatch: reference
      to .init.text: from .data after 'mc_cpu_notifier' (at offset 0x118)
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] compile error of register_memory() · 5c95da9f
      Committed by Yasunori Goto
      register_memory() ends up defined twice in 2.6.20-rc1.  In 2.6.19 it was
      a static function in arch/i386/kernel/setup.c, but in 2.6.20-rc1 it was
      moved to arch/i386/kernel/e820.c.  A function of the same name is also
      defined in drivers/base/memory.c, so the build fails with a duplicate
      definition error when the memory hotplug option is enabled.
      Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 21 Dec, 2006 5 commits
    • [PATCH] x86_64: fix boot time hang in detect_calgary() · 136f1e7a
      Committed by Ingo Molnar
      if CONFIG_CALGARY_IOMMU is built into the kernel via
      CONFIG_CALGARY_IOMMU_ENABLED_BY_DEFAULT, or is enabled via the
      iommu=calgary boot option, then the detect_calgary() function runs to
      detect the presence of a Calgary IOMMU.
      
      detect_calgary() first searches the BIOS EBDA area for a "rio_table_hdr"
      BIOS table. It has this parsing algorithm for the EBDA:
      
      	while (offset) {
      		...
      		/* The next offset is stored in the 1st word. 0 means no more */
       		offset = *((unsigned short *)(ptr + offset));
      	}
      
      got that? Let's repeat it slowly: we've got a BIOS-supplied data
      structure, plus Linux kernel code that will only break out of an
      infinite parsing loop once the BIOS gives a zero offset. Ok?
      
      Translation: what an excellent opportunity for BIOS writers to lock up
      the Linux boot process in an utterly hard to debug place! Indeed the
      BIOS jumped on that opportunity on my box, which has the following EBDA
      chaining layout:
      
        384, 65282, 65535, 65535, 65535, 65535, 65535, 65535 ...
      
      see the pattern? So my definitely non-Calgary system happily locks up
      in detect_calgary()!
      
      the patch below fixes the boot hang by trusting the BIOS-supplied data
      structure a bit less: the parser always has to make forward progress,
      and if it doesn't, we break out of the loop and i get the expected kernel
      message:
      
        Calgary: Unable to locate Rio Grande Table in EBDA - bailing!
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
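The "parser always has to make forward progress" rule from this commit can be sketched as a stand-alone chain walker. This is a minimal illustration under stated assumptions, not the kernel patch: `read_u16()` and `walk_offset_chain()` are hypothetical names, the buffer is an ordinary byte array rather than the EBDA, and the bounds check is an extra precaution the commit message does not mention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value at a byte offset, in host byte order. */
static uint16_t read_u16(const uint8_t *buf, size_t offset)
{
    uint16_t v;
    memcpy(&v, buf + offset, sizeof(v));
    return v;
}

/*
 * Walk a chain of 16-bit "next" offsets and return how many entries
 * were visited. The walk stops when the next offset is 0, when it does
 * not move strictly forward, or when it would read past the buffer.
 */
static size_t walk_offset_chain(const uint8_t *buf, size_t len, uint16_t start)
{
    size_t visited = 0;
    uint16_t prev = 0;
    uint16_t offset = start;

    while (offset > prev && (size_t)offset + sizeof(uint16_t) <= len) {
        visited++;      /* a real parser would inspect the entry here */
        prev = offset;
        offset = read_u16(buf, offset);
    }
    return visited;
}
```

Since each step must land on a strictly larger 16-bit offset, the loop is bounded no matter what the firmware puts in the table. A chain such as 384, 65282, 65535, 65535, ... ends as soon as the offset repeats.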
    • [PATCH] x86_64: fix boot hang caused by CALGARY_IOMMU_ENABLED_BY_DEFAULT · a9622f62
      Committed by Ingo Molnar
      one of my boxes didn't boot the 2.6.20-rc1-rt0 kernel rpm, it hung during
      early bootup. After an hour or two of happy debugging i narrowed it down
      to the CALGARY_IOMMU_ENABLED_BY_DEFAULT option, which was freshly added
      to 2.6.20 via the x86_64 tree and /enabled by default/.
      
      commit bff6547b claims:
      
          [PATCH] Calgary: allow compiling Calgary in but not using it by default
      
          This patch makes it possible to compile Calgary in but not use it by
          default. In this mode, use 'iommu=calgary' to activate it.
      
      but the change does not actually practice it:
      
       config CALGARY_IOMMU_ENABLED_BY_DEFAULT
              bool "Should Calgary be enabled by default?"
              default y
              depends on CALGARY_IOMMU
              help
                Should Calgary be enabled by default? if you choose 'y', Calgary
                will be used (if it exists). If you choose 'n', Calgary will not be
                used even if it exists. If you choose 'n' and would like to use
                Calgary anyway, pass 'iommu=calgary' on the kernel command line.
                If unsure, say Y.
      
      it's both 'default y', and says "If unsure, say Y". Clearly not a typo.
      
      disabling this option makes my box boot again. The patch below fixes the
      Kconfig entry. Grumble.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • ACPI: replace kmalloc+memset with kzalloc · 36bcbec7
      Committed by Burman Yan
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Len Brown <len.brown@intel.com>
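This commit has no message body, so the shape of the change is sketched here with a user-space analogy only. The assumption is that malloc() followed by memset() stands in for kmalloc+memset and that calloc() stands in for kzalloc; `struct sketch_entry` and both helper names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_entry {
    int id;
    char name[16];
};

/* Before: allocate, then clear the memory in a second step. */
static struct sketch_entry *alloc_entry_two_step(void)
{
    struct sketch_entry *e = malloc(sizeof(*e));
    if (e)
        memset(e, 0, sizeof(*e));
    return e;
}

/* After: a single call that returns zeroed memory (or NULL on failure). */
static struct sketch_entry *alloc_entry_zeroed(void)
{
    return calloc(1, sizeof(struct sketch_entry));
}
```

Both helpers return the same zero-filled object; the single-call form is shorter and removes the chance of forgetting the clear or passing it the wrong size.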
    • PCI: Fix multiple problems with VIA hardware · 1597cacb
      Committed by Alan Cox
      This patch is designed to fix:
      - Disk eating corruptor on KT7 after resume from RAM
      - VIA IRQ handling
      - VIA fixups for bus lockups after resume from RAM
      
      The core of this is to add a table of resume fixups run at resume time.
      We need to do this for a variety of boards and features, but particularly
      we need to do this to get various critical VIA fixups done on resume.
      
      The second part of the problem is to handle VIA IRQ number rules which
      are a bit odd and need special handling for PIC interrupts. Various
      patches broke various boxes and while this one may not be perfect
      (hopefully it is) it ensures the workaround is applied to the right
      devices only.
      
      From: Jean Delvare <khali@linux-fr.org>
      
      Now that PCI quirks are replayed on software resume, we can safely
      re-enable the Asus SMBus unhiding quirk even when software suspend support
      is enabled.
      
      [akpm@osdl.org: fix const warning]
      Signed-off-by: Alan Cox <alan@redhat.com>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • PCI: Only check the HT capability bits in mpic.c · beb7cc82
      Committed by Michael Ellerman
      Only compare the exact HT capability bits against HT_CAPTYPE_IRQ;
      this is a little paranoid, but doesn't hurt.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  5. 20 Dec, 2006 11 commits
  6. 19 Dec, 2006 5 commits
  7. 18 Dec, 2006 11 commits