1. 16 Oct, 2008 · 17 commits
    • sparseirq: move kstat_irqs from kstat to irq_desc - fix · 8c464a4b
      Yinghai Lu committed
      Fix non-sparseirq architectures.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: sparse_irq needs spin_lock in allocations · e89eb438
      Yinghai Lu committed
      Suresh Siddha noticed that we should have a spinlock around the allocation.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sparseirq: fix lockdep · e955b539
      Ingo Molnar committed
      -tip testing found this lockdep splat:
      
      [    0.000000] Initializing CPU#0
      [    0.000000] found new irq_desc for irq 0
      [    0.000000] INFO: trying to register non-static key.
      [    0.000000] the code is fine but needs lockdep annotation.
      [    0.000000] turning off the locking correctness validator.
      [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.27-rc3-tip-00191-g98ccb89-dirty #1
      [    0.000000]  [<c0153c22>] register_lock_class+0x3d2/0x400
      [    0.000000]  [<c0104d87>] ? mcount_call+0x5/0xa
      [    0.000000]  [<c0154f3a>] __lock_acquire+0x22a/0x5d0
      [    0.000000]  [<c0104d87>] ? mcount_call+0x5/0xa
      [    0.000000]  [<c0155351>] lock_acquire+0x71/0xa0
      [    0.000000]  [<c016d61f>] ? set_irq_chip+0x3f/0x90
      [    0.000000]  [<c070f148>] _spin_lock_irqsave+0x58/0x90
      [    0.000000]  [<c016d61f>] ? set_irq_chip+0x3f/0x90
      [    0.000000]  [<c016d61f>] set_irq_chip+0x3f/0x90
      [    0.000000]  [<c016d7e0>] ? handle_level_irq+0x0/0xe0
      [    0.000000]  [<c016da1a>] set_irq_chip_and_handler_name+0x1a/0x40
      [    0.000000]  [<c0a396c1>] init_ISA_irqs+0x51/0xa0
      [    0.000000]  [<c0a4a365>] pre_intr_init_hook+0x25/0x30
      [    0.000000]  [<c0a39723>] native_init_IRQ+0x13/0x370
      [    0.000000]  [<c015569c>] ? lock_release+0xcc/0x1d0
      [    0.000000]  [<c0104d87>] ? mcount_call+0x5/0xa
      [    0.000000]  [<c070dc22>] ? __mutex_unlock_slowpath+0x92/0x110
      [    0.000000]  [<c070dcad>] ? mutex_unlock+0xd/0x10
      [    0.000000]  [<c0135f62>] ? cpu_maps_update_done+0x12/0x20
      [    0.000000]  [<c06c6743>] ? register_cpu_notifier+0x23/0x30
      [    0.000000]  [<c011e8ae>] init_IRQ+0xe/0x10
      [    0.000000]  [<c0a357a5>] start_kernel+0x1c5/0x340
      [    0.000000]  [<c0a35280>] ? unknown_bootoption+0x0/0x210
      [    0.000000]  [<c0a3506b>] i386_start_kernel+0x6b/0x80
      [    0.000000]  =======================
      [    0.000000] found new irq_desc for irq 1
      [    0.000000] found new irq_desc for irq 2
      [    0.000000] found new irq_desc for irq 3
      
      this:
      
       static void init_one_irq_desc(struct irq_desc *desc)
       {
               memcpy(desc, &irq_desc_init, sizeof(struct irq_desc));
       #ifdef CONFIG_TRACE_IRQFLAGS
               lockdep_set_class(&desc->lock, &irq_desc_lock_class);
       #endif
       }
      
      should be unconditional.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
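The change described above can be sketched in user space as follows. This is only an illustration under stated assumptions: `struct sketch_lock`, the stand-in `lockdep_set_class()` and the field layout of `struct irq_desc` are hypothetical stubs, not the kernel definitions. The only point shown is that the lock class is set without the `CONFIG_TRACE_IRQFLAGS` guard.

```c
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct lock_class_key { int dummy; };
struct sketch_lock { const struct lock_class_key *class_key; };
struct irq_desc {
    unsigned int irq;
    struct sketch_lock lock;
};

struct lock_class_key irq_desc_lock_class;
static const struct irq_desc irq_desc_init = { .irq = (unsigned int)-1 };

/* Stub that only records the key; the kernel helper does much more. */
void lockdep_set_class(struct sketch_lock *lock, struct lock_class_key *key)
{
    lock->class_key = key;
}

/* The lock class is assigned unconditionally, with no #ifdef around it. */
void init_one_irq_desc(struct irq_desc *desc)
{
    memcpy(desc, &irq_desc_init, sizeof(struct irq_desc));
    lockdep_set_class(&desc->lock, &irq_desc_lock_class);
}
```

With this shape, every freshly initialized descriptor carries the lock class key regardless of the tracing configuration.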
    • x86: remove irqbalance in kernel for 32 bit · 8b8e8c1b
      Yinghai Lu committed
      This has been deprecated for years; the user-space irqbalanced utility
      works better with NUMA, has configurable policies, etc.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmai.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: separate sparse_irqs from sparse_irqs_free · 67fb283e
      Yinghai Lu committed
      so that later code does not need to compare with -1U
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86_64: rename irq_desc/irq_desc_alloc · cb5bc832
      Yinghai Lu committed
      change names:

                irq_desc() ==> irq_desc_alloc
              __irq_desc() ==> irq_desc

      Also split a few of the uses in lowlevel x86 code.

      v2: need to check if desc is null in smp_irq_move_cleanup
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic: add irq_desc in function in parameter · 46926b67
      Yinghai Lu committed
      So we can remove some duplicated calls to irq_desc

      v2: make sure irq_desc in init/main.c is not used without generic_hardirqs
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: remove >= nr_irqs checking with config_have_sparse_irq · 7d94f7ca
      Yinghai Lu committed
      remove irq limit checks - nr_irqs is dynamic and we expand anytime.

      v2: fix the check on the result of irq_cfg_without_new, so MSI can be used again
      v3: use irq_desc_without_new to check that the irq is valid
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: replace loop with nr_irqs with for_each_irq_desc · 2c6927a3
      Yinghai Lu committed
      There are a handful of loops that go from 0 to nr_irqs and use
      get_irq_desc() on them. These would allocate all the irq_desc
      entries, regardless of the need for them.
      
      Use the smarter for_each_irq_desc() iterator that will only iterate
      over the present ones.
      
      v2: make sure arch without GENERIC_HARDIRQS work too
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
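The difference between walking 0..nr_irqs and walking only the present descriptors can be sketched as below. This is a user-space model, not the kernel macro: `sketch_for_each_irq_desc`, `struct sketch_irq_desc` and the linked list of present entries are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical descriptor; present entries are chained in a list. */
struct sketch_irq_desc {
    unsigned int irq;
    struct sketch_irq_desc *next;
};

static struct sketch_irq_desc *sketch_present_head;

/* Iterate only over descriptors that exist; nothing is allocated here. */
#define sketch_for_each_irq_desc(desc) \
    for ((desc) = sketch_present_head; (desc) != NULL; (desc) = (desc)->next)

/* Register an already-allocated descriptor as present. */
void sketch_add_present(struct sketch_irq_desc *desc)
{
    desc->next = sketch_present_head;
    sketch_present_head = desc;
}

/* Count present descriptors without touching absent irq numbers. */
unsigned int sketch_count_present(void)
{
    struct sketch_irq_desc *desc;
    unsigned int count = 0;

    sketch_for_each_irq_desc(desc)
        count++;
    return count;
}
```

If only irqs 0, 14 and 200 have descriptors, the loop body runs three times, whereas a 0..nr_irqs loop calling an allocating accessor would create an entry for every number in between.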
    • irq: add irq_desc_without_new · 9059d8fa
      Yinghai Lu committed
      add an irq_desc accessor that will not allocate any sparse entry
      but returns failure if there's no entry present.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
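A minimal sketch of the two accessor flavours follows, assuming a small fixed table in place of the kernel's sparse storage. The names `sketch_irq_desc_alloc` and `sketch_irq_desc_without_new` and the table size are hypothetical; only the contract (allocate on demand versus fail when absent) is taken from the commit message.

```c
#include <stddef.h>

#define SKETCH_MAX_DESC 16   /* hypothetical capacity */

struct sketch_irq_desc {
    unsigned int irq;
    int present;
};

static struct sketch_irq_desc sketch_table[SKETCH_MAX_DESC];

/* Allocating accessor: returns the entry, creating it if needed. */
struct sketch_irq_desc *sketch_irq_desc_alloc(unsigned int irq)
{
    struct sketch_irq_desc *free_slot = NULL;
    size_t i;

    for (i = 0; i < SKETCH_MAX_DESC; i++) {
        if (sketch_table[i].present && sketch_table[i].irq == irq)
            return &sketch_table[i];
        if (!sketch_table[i].present && free_slot == NULL)
            free_slot = &sketch_table[i];
    }
    if (free_slot != NULL) {
        free_slot->irq = irq;
        free_slot->present = 1;
    }
    return free_slot;   /* NULL when the table is full */
}

/* Non-allocating accessor: NULL means "no entry present". */
struct sketch_irq_desc *sketch_irq_desc_without_new(unsigned int irq)
{
    size_t i;

    for (i = 0; i < SKETCH_MAX_DESC; i++)
        if (sketch_table[i].present && sketch_table[i].irq == irq)
            return &sketch_table[i];
    return NULL;
}
```

A caller that only wants to validate an irq number would use the non-allocating variant and treat `NULL` as "invalid", so that probing never grows the table.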
    • x86: move kstat_irqs from kstat to irq_desc · 7f95ec9e
      Yinghai Lu committed
      based on Eric's patch ...

      Mold it together with dyn_array for irq_desc; this will allocate kstat_irqs for
      nr_irq_desc altogether if needed. -- at that point nr_cpus is known already.

      v2: make sure systems without generic_hardirqs work; they don't have irq_desc
      v3: fix merging
      v4: [mingo@elte.hu] fix typo
      
      [ mingo@elte.hu ] irq: build fix
      
      fix:
      
       arch/x86/xen/spinlock.c: In function 'xen_spin_lock_slow':
       arch/x86/xen/spinlock.c:90: error: 'struct kernel_stat' has no member named 'irqs'
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: sparse irqs, fix IRQ auto-probe crash · 3bf52a4d
      Ingo Molnar committed
      fix:
      
      [   10.631533] calling  yenta_socket_init+0x0/0x20
      [   10.631533] Yenta: CardBus bridge found at 0000:15:00.0 [17aa:2012]
      [   10.631533] Yenta: Using INTVAL to route CSC interrupts to PCI
      [   10.631533] Yenta: Routing CardBus interrupts to PCI
      [   10.631533] Yenta TI: socket 0000:15:00.0, mfunc 0x01d01002, devctl 0x64
      [   10.731599] BUG: unable to handle kernel NULL pointer dereference at 00000040
      [   10.731838] IP: [<c0c95b5f>] _spin_lock_irq+0xf/0x20
      [   10.732221] *pde = 00000000
      [   10.732741] Oops: 0002 [#1] SMP
      [   10.733453]
      [   10.734253] Pid: 1, comm: swapper Tainted: G        W (2.6.27-rc3-tip-00173-gd7eaa4f-dirty #1)
      [   10.735188] EIP: 0060:[<c0c95b5f>] EFLAGS: 00010002 CPU: 0
      [   10.735523] EIP is at _spin_lock_irq+0xf/0x20
      [   10.735523] EAX: 00000040 EBX: 00000000 ECX: f6e04c90 EDX: 00000100
      [   10.735523] ESI: 000000df EDI: f6e04c90 EBP: f7867df0 ESP: f7867df0
      [   10.735523]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [   10.735523] Process swapper (pid: 1, ti=f7867000 task=f7870000 task.ti=f7867000)
      [   10.735523] Stack: f7867e04 c0155fbd 00000000 00000000 f6e04c90 f7867e5c c0c6e319 c0f6a074
      [   10.735523]        f6e04c90 000017aa 00002012 c112b648 f791f240 c112b5e0 f7867e44 c010440b
      [   10.735523]        f791f240 f791f29c c112b8ec f791f240 00000000 f7867e5c c048f893 03c0b648
      [   10.735523] Call Trace:
      [   10.735523]  [<c0155fbd>] ? probe_irq_on+0x3d/0x140
      [   10.735523]  [<c0c6e319>] ? yenta_probe+0x529/0x640
      [   10.735523]  [<c010440b>] ? mcount_call+0x5/0xa
      [   10.735523]  [<c048f893>] ? pci_match_device+0xa3/0xb0
      [   10.735523]  [<c048fc1e>] ? pci_device_probe+0x5e/0x80
      [   10.735523]  [<c0515423>] ? driver_probe_device+0x83/0x180
      [   10.735523]  [<c0515594>] ? __driver_attach+0x74/0x80
      [   10.735523]  [<c0514b69>] ? bus_for_each_dev+0x49/0x70
      [   10.735523]  [<c051528e>] ? driver_attach+0x1e/0x20
      [   10.735523]  [<c0515520>] ? __driver_attach+0x0/0x80
      [   10.735523]  [<c05150d3>] ? bus_add_driver+0x1a3/0x220
      [   10.735523]  [<c048fb60>] ? pci_device_remove+0x0/0x40
      [   10.735523]  [<c05157f4>] ? driver_register+0x54/0x130
      [   10.735523]  [<c048fe2f>] ? __pci_register_driver+0x4f/0x90
      [   10.735523]  [<c11e9419>] ? yenta_socket_init+0x19/0x20
      [   10.735523]  [<c0101125>] ? do_one_initcall+0x35/0x160
      [   10.735523]  [<c11e9400>] ? yenta_socket_init+0x0/0x20
      [   10.735523]  [<c01391a6>] ? __queue_work+0x36/0x50
      [   10.735523]  [<c013922d>] ? queue_work_on+0x3d/0x50
      [   10.735523]  [<c11a2758>] ? kernel_init+0x148/0x210
      [   10.735523]  [<c11a2610>] ? kernel_init+0x0/0x210
      [   10.735523]  [<c01043f3>] ? kernel_thread_helper+0x7/0x10
      [   10.735523]  =======================
      [   10.735523] Code: 10 38 f2 74 06 f3 90 8a 10 eb f6 5d 89 c8 c3 8d b6 00 00 00 00 8d bc 27 00 00 00 00 55 89 e5 e8 a4 e8 46 ff fa ba 00 01 00 00 90 <66> 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 90 55 89 e5 53
      
      as auto-probing wants to iterate over existing irqs.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic: sparse irqs: use irq_desc() together with dyn_array, instead of irq_desc[] · 08678b08
      Yinghai Lu committed
      add CONFIG_HAVE_SPARSE_IRQ to use a condensed array.
      Get rid of irq_desc[] array assumptions.

      Preallocate 32 irq_desc, and irq_desc() will try to get more.

      ( No change in functionality is expected anywhere, except the odd build
        failure where we missed a code site or where a crossing commit introduces
        new irq_desc[] usage. )

      v2: according to Eric, change get_irq_desc() to irq_desc()
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
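The "preallocate 32, then get more" behaviour can be modelled roughly as below. This is a user-space sketch under assumptions: the pool, the `preallocated` flag, the list of known entries and the use of `calloc()` are all hypothetical and stand in for whatever allocator the kernel code uses.

```c
#include <stdlib.h>

#define SKETCH_NR_PREALLOC 32   /* pool size taken from the commit text */

struct sketch_irq_desc {
    unsigned int irq;
    int preallocated;              /* 1 if the entry came from the pool */
    struct sketch_irq_desc *next;
};

static struct sketch_irq_desc sketch_pool[SKETCH_NR_PREALLOC];
static unsigned int sketch_pool_used;
static struct sketch_irq_desc *sketch_known;   /* all entries handed out */

/* Look up the descriptor for irq; create one if it does not exist yet. */
struct sketch_irq_desc *sketch_irq_desc(unsigned int irq)
{
    struct sketch_irq_desc *desc;

    for (desc = sketch_known; desc != NULL; desc = desc->next)
        if (desc->irq == irq)
            return desc;

    if (sketch_pool_used < SKETCH_NR_PREALLOC) {
        /* Fast path: take the next preallocated slot. */
        desc = &sketch_pool[sketch_pool_used++];
        desc->preallocated = 1;
    } else {
        /* Pool exhausted: try to get more memory. */
        desc = calloc(1, sizeof(*desc));
        if (desc == NULL)
            return NULL;
    }
    desc->irq = irq;
    desc->next = sketch_known;
    sketch_known = desc;
    return desc;
}
```

In this model the first 32 distinct irq numbers are served from static storage and only later ones need a dynamic allocation.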
    • irq: make irqs in kernel stat use per_cpu_dyn_array · d17a55de
      Yinghai Lu committed
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: sparse irqs, export nr_irqs · fa42d10d
      Ingo Molnar committed
      fix:

        Building modules, stage 2.
        MODPOST 458 modules
        ERROR: "nr_irqs" [drivers/serial/8250.ko] undefined!
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: make irq_desc to use dyn_array · d60458b2
      Yinghai Lu committed
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • irq: introduce nr_irqs · 85c0f909
      Yinghai Lu committed
      at this point nr_irqs is equal to NR_IRQS

      convert a few easy users from NR_IRQS to dynamic nr_irqs.

      v2: according to Eric, we need to take care of archs without generic_hardirqs
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 14 Oct, 2008 · 3 commits
  3. 10 Oct, 2008 · 3 commits
  4. 09 Oct, 2008 · 1 commit
    • sched debug: add name to sched_domain sysctl entries · a5d8c348
      Ingo Molnar committed
      add /proc/sys/kernel/sched_domain/cpu0/domain0/name, to make
      it easier to see which specific scheduler domain remained at
      that entry.
      
      Since we process the scheduler domain tree and
      simplify it, it's not always immediately clear during debugging
      which domain came from where.
      
      depends on CONFIG_SCHED_DEBUG=y.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 08 Oct, 2008 · 1 commit
  6. 07 Oct, 2008 · 1 commit
  7. 06 Oct, 2008 · 1 commit
  8. 04 Oct, 2008 · 2 commits
    • sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rq · f6121f4f
      Dario Faggioli committed
      While working on the new version of the code for SCHED_SPORADIC I
      noticed something strange in the present throttling mechanism. More
      specifically in the throttling timer handler in sched_rt.c
      (do_sched_rt_period_timer()) and in rt_rq_enqueue().
      
      The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only
      asks for rescheduling if the runqueue has a sched_entity associated to
      it (i.e., rt_rq->rt_se != NULL).
      Now, if the runqueue is the root rq (which has a rt_se = NULL)
      rescheduling does not take place, and it is delayed to some undefined
      instant in the future.
      
      This implies some random bandwidth usage by the RT tasks under throttling.
      For instance, setting rt_runtime_us/rt_period_us = 950ms/1000ms, an RT
      task will get less than 95%. In our tests we got something varying
      between 70% and 95%.
      Using smaller time values, e.g., 95ms/100ms, things are even worse, and
      I can see values going down to 20-25%!
      
      The tests we performed are simply running 'yes' as a SCHED_FIFO task,
      and checking the CPU usage with top, but we can investigate thoroughly
      if you think it is needed.
      
      Things go much better, for us, with the attached patch... Don't know if
      it is the best approach, but it solved the issue for us.
      Signed-off-by: Dario Faggioli <raistlin@linux.it>
      Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
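The reported behaviour and one possible shape of the remedy can be modelled as below. This is a simplified user-space sketch, not the attached patch: the structures, the `resched_requested` flag and both function names are hypothetical, and the real code also deals with priorities and entity enqueueing.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the scheduler types. */
struct sketch_rt_se { int on_rq; };

struct sketch_rt_rq {
    struct sketch_rt_se *rt_se;    /* NULL for the root runqueue */
    unsigned int rt_nr_running;
    int resched_requested;         /* models "ask for rescheduling" */
};

/* Behaviour described as the problem: no rt_se means no reschedule. */
void sketch_rt_rq_enqueue_before(struct sketch_rt_rq *rt_rq)
{
    if (rt_rq->rt_se != NULL && rt_rq->rt_nr_running)
        rt_rq->resched_requested = 1;
}

/* Sketch of the intent: runnable RT tasks trigger a reschedule even
 * when the runqueue is the root one and has no entity. */
void sketch_rt_rq_enqueue_after(struct sketch_rt_rq *rt_rq)
{
    if (rt_rq->rt_nr_running) {
        if (rt_rq->rt_se != NULL)
            rt_rq->rt_se->on_rq = 1;
        rt_rq->resched_requested = 1;
    }
}
```

In the first variant an unthrottled root runqueue waits for some later, unrelated reschedule, which matches the observation that a task entitled to 950ms/1000ms (95%) received noticeably less.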
    • clockevents: check broadcast tick device not the clock events device · 07454bff
      Thomas Gleixner committed
      Impact: jiffies increment too fast.

      Hugh Dickins noted that with NOHZ=n and HIGHRES=n jiffies get
      incremented too fast. The reason is a wrong check in the broadcast
      enter/exit code, which keeps the local apic timer in periodic mode
      when the switch happens.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  9. 03 Oct, 2008 · 4 commits
  10. 02 Oct, 2008 · 1 commit
    • genirq: record trigger type · 0c5d1eb7
      David Brownell committed
      Genirq hasn't previously recorded the trigger type used by any given IRQ,
      although some irq_chip support has done so.  That data can be useful when
      troubleshooting.  This patch records it in the relevant irq_desc.status
      bits, and improves consistency between the two driver-visible calls
      affected:
      
       - Make set_irq_type() usage match request_irq() usage:
          * IRQ_TYPE_NONE should be a NOP; succeed, so irq_chip methods
            won't have to handle that case any more (many do it wrong).
          * IRQ_TYPE_PROBE is ignored; any buggy out-of-tree callers
            might need to switch over to the real IRQ probing code.
          * emit the same diagnostics (from shared utility code)
      
       - Their kerneldoc now reflects usage:
          * request_irq() flags include IRQF_TRIGGER_* to specify
            active edge(s)/level ... docs previously omitted that
          * set_irq_type() is declared in <linux/irq.h> so callers
            should use the (bit-equivalent) IRQ_TYPE_* symbols there
      
      Also: adds a warning about shared IRQs that don't end up using the
      requested trigger mode; and fix an unrelated "sparse" warning.
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
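The set_irq_type() rules listed above can be illustrated with a small user-space model. Everything prefixed `SKETCH_` or `sketch_` is hypothetical, including the bit values, which are chosen for the example rather than copied from the kernel headers; the `chip_calls` counter stands in for invoking the irq_chip method.

```c
/* Hypothetical trigger-type bits for this sketch only. */
#define SKETCH_TYPE_NONE          0x00u
#define SKETCH_TYPE_EDGE_RISING   0x01u
#define SKETCH_TYPE_EDGE_FALLING  0x02u
#define SKETCH_TYPE_LEVEL_HIGH    0x04u
#define SKETCH_TYPE_LEVEL_LOW     0x08u
#define SKETCH_TYPE_SENSE_MASK    0x0fu
#define SKETCH_TYPE_PROBE         0x10u

struct sketch_irq_desc {
    unsigned int status;     /* low bits record the trigger type */
    unsigned int chip_calls; /* how often the chip method was reached */
};

/* Returns 0 on success, following the rules in the commit message. */
int sketch_set_irq_type(struct sketch_irq_desc *desc, unsigned int type)
{
    /* The PROBE flag is ignored: only the sense bits are kept. */
    type &= SKETCH_TYPE_SENSE_MASK;

    /* NONE is a no-op that succeeds, so chips never see it. */
    if (type == SKETCH_TYPE_NONE)
        return 0;

    desc->chip_calls++;   /* stand-in for the irq_chip set_type call */

    /* Record the trigger type in the descriptor's status bits. */
    desc->status &= ~SKETCH_TYPE_SENSE_MASK;
    desc->status |= type;
    return 0;
}
```

Recording the type in the descriptor means a later diagnostic can read it back from `status` instead of asking each chip implementation.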
  11. 30 Sep, 2008 · 2 commits
  12. 29 Sep, 2008 · 4 commits
    • mm owner: fix race between swapoff and exit · 31a78f23
      Balbir Singh committed
      There's a race between mm->owner assignment and swapoff, more easily
      seen when task slab poisoning is turned on.  The condition occurs when
      try_to_unuse() runs in parallel with an exiting task.  A similar race
      can occur with callers of get_task_mm(), such as /proc/<pid>/<mmstats>
      or ptrace or page migration.
      
      CPU0                                    CPU1
                                              try_to_unuse
                                              looks at mm = task0->mm
                                              increments mm->mm_users
      task 0 exits
      mm->owner needs to be updated, but no
      new owner is found (mm_users > 1, but
      no other task has task->mm = task0->mm)
      mm_update_next_owner() leaves
                                              mmput(mm) decrements mm->mm_users
      task0 freed
                                              dereferencing mm->owner fails
      
      The fix is to notify the subsystem via mm_owner_changed callback(),
      if no new owner is found, by specifying the new task as NULL.
      
      Jiri Slaby:
      mm->owner was set to NULL prior to calling cgroup_mm_owner_callbacks(), but
      must be set after that, so as not to pass NULL as old owner causing oops.
      
      Daisuke Nishimura:
      mm_update_next_owner() may set mm->owner to NULL, but mem_cgroup_from_task()
      and its callers need to take account of this situation to avoid oops.
      
      Hugh Dickins:
      Lockdep warning and hang below exec_mmap() when testing these patches.
      exit_mm() up_reads mmap_sem before calling mm_update_next_owner(),
      so exec_mmap() now needs to do the same.  And with that repositioning,
      there's now no point in mm_need_new_owner() allowing for NULL mm.
      Reported-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hrtimer: prevent migration of per CPU hrtimers · ccc7dadf
      Thomas Gleixner committed
      Impact: per CPU hrtimers can be migrated from a dead CPU

      The hrtimer code has no knowledge about per CPU timers, but we need to
      prevent the migration of such timers and warn when such a timer is
      active at migration time.

      Explicitly mark the timers as per CPU and use a more understandable
      mode descriptor for the interrupt-safe unlocked callback mode, which
      is used by hrtimer_sleeper and the scheduler code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hrtimer: mark migration state · b00c1a99
      Thomas Gleixner committed
      Impact: during migration active hrtimers can be seen as inactive

      The migration code removes the hrtimers from the queues of the dead
      CPU and sets the state temporarily to INACTIVE. The enqueue code sets it
      to ACTIVE/PENDING again.

      Prevent the wrong state from being seen by using a separate migration
      state bit.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
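The idea of a separate migration state bit can be sketched as follows. The state values, `struct sketch_hrtimer` and the helper names are hypothetical and do not reproduce the kernel's definitions; the sketch only shows why an observer no longer sees INACTIVE in the window between dequeue and enqueue.

```c
/* Hypothetical state bits for this sketch only. */
#define SKETCH_STATE_INACTIVE 0x00UL
#define SKETCH_STATE_ENQUEUED 0x01UL
#define SKETCH_STATE_MIGRATE  0x08UL

struct sketch_hrtimer {
    unsigned long state;
};

/* A timer counts as active whenever any state bit is set. */
int sketch_hrtimer_active(const struct sketch_hrtimer *timer)
{
    return timer->state != SKETCH_STATE_INACTIVE;
}

/* Removal from the dead CPU's queue: mark as migrating, not inactive. */
void sketch_migrate_dequeue(struct sketch_hrtimer *timer)
{
    timer->state = SKETCH_STATE_MIGRATE;
}

/* Enqueue on the live CPU: clear the migration bit, mark as enqueued. */
void sketch_migrate_enqueue(struct sketch_hrtimer *timer)
{
    timer->state &= ~SKETCH_STATE_MIGRATE;
    timer->state |= SKETCH_STATE_ENQUEUED;
}
```

Between the two calls `sketch_hrtimer_active()` keeps returning non-zero, which is the property the commit message asks for.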
    • hrtimer: fix migration of CB_IRQSAFE_NO_SOFTIRQ hrtimers · 41e1022e
      Thomas Gleixner committed
      Impact: Stale timers after a CPU went offline.

      commit 37bb6cb4
             hrtimer: unlock hrtimer_wakeup

      changed the hrtimer sleeper callback mode to CB_IRQSAFE_NO_SOFTIRQ due
      to locking problems. A result of this change is that when enqueue is
      called for an already expired hrtimer the callback function is no
      longer called directly from the enqueue code. The normal callers have
      been fixed in the code, but the migration code which moves hrtimers
      from a dead CPU to a live CPU was not made aware of this.

      This can be fixed by checking the timer state after the call to
      enqueue in the migration code.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>