1. 08 Dec, 2006 1 commit
  2. 23 Nov, 2006 1 commit
  3. 17 Nov, 2006 1 commit
    • [PATCH] some irq_chip variables point to NULL · b86432b4
      Committed by Zhang, Yanmin
      I got an oops when booting 2.6.19-rc5-mm1 on my ia64 machine.
      
      Below is the log.
      
      Oops 11012296146944 [1]
      Modules linked in: binfmt_misc dm_mirror dm_multipath dm_mod thermal processor fan container button sg eepro100 e100 mii
      
      Pid: 0, CPU 0, comm:              swapper
      psr : 0000121008022038 ifs : 800000000000040b ip  : [<a0000001000e1411>]    Not tainted
      ip is at __do_IRQ+0x371/0x3e0
      unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
      rnat: 656960155aa56aa5 bsps: a00000010058b890 pr  : 656960155aa55a65
      ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
      csd : 0000000000000000 ssd : 0000000000000000
      b0  : a0000001000e1390 b6  : a0000001005beac0 b7  : e00000007f01aa00
      f6  : 000000000000000000000 f7  : 0ffe69090000000000000
      f8  : 1000a9090000000000000 f9  : 0ffff8000000000000000
      f10 : 1000a908ffffff6f70000 f11 : 1003e0000000000000909
      r1  : a000000100fbbff0 r2  : 0000000000010002 r3  : 0000000000010001
      r8  : fffffffffffbffff r9  : a000000100bd8060 r10 : a000000100dd83b8
      r11 : fffffffffffeffff r12 : a000000100bcbbb0 r13 : a000000100bc4000
      r14 : 0000000000010000 r15 : 0000000000010000 r16 : a000000100c01aa8
      r17 : a000000100d2c350 r18 : 0000000000000000 r19 : a000000100d2c300
      r20 : a000000100c01a88 r21 : 0000000080010100 r22 : a000000100c01ac0
      r23 : a0000001000108e0 r24 : e000000477980004 r25 : 0000000000000000
      r26 : 0000000000000000 r27 : e00000000913400c r28 : e0000004799ee51c
      r29 : e0000004778b87f0 r30 : a000000100d2c300 r31 : a00000010005c7e0
      
      Call Trace:
       [<a000000100014600>] show_stack+0x40/0xa0
                                      sp=a000000100bcb760 bsp=a000000100bc4f40
       [<a000000100014f00>] show_regs+0x840/0x880
                                      sp=a000000100bcb930 bsp=a000000100bc4ee8
       [<a000000100037fb0>] die+0x250/0x320
                                      sp=a000000100bcb930 bsp=a000000100bc4ea0
       [<a00000010005e5f0>] ia64_do_page_fault+0x8d0/0xa20
                                      sp=a000000100bcb950 bsp=a000000100bc4e50
       [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290
                                      sp=a000000100bcb9e0 bsp=a000000100bc4e50
       [<a0000001000e1410>] __do_IRQ+0x370/0x3e0
                                      sp=a000000100bcbbb0 bsp=a000000100bc4df0
       [<a000000100011f50>] ia64_handle_irq+0x170/0x220
                                      sp=a000000100bcbbb0 bsp=a000000100bc4dc0
       [<a00000010000caa0>] ia64_leave_kernel+0x0/0x290
                                      sp=a000000100bcbbb0 bsp=a000000100bc4dc0
       [<a000000100012390>] ia64_pal_call_static+0x90/0xc0
                                      sp=a000000100bcbd80 bsp=a000000100bc4d78
       [<a000000100015630>] default_idle+0x90/0x160
                                      sp=a000000100bcbd80 bsp=a000000100bc4d58
       [<a000000100014290>] cpu_idle+0x1f0/0x440
                                      sp=a000000100bcbe20 bsp=a000000100bc4d18
       [<a000000100009980>] rest_init+0xc0/0xe0
                                      sp=a000000100bcbe20 bsp=a000000100bc4d00
       [<a0000001009f8ea0>] start_kernel+0x6a0/0x6c0
                                      sp=a000000100bcbe20 bsp=a000000100bc4ca0
       [<a0000001000089f0>] __end_ivt_text+0x6d0/0x6f0
                                      sp=a000000100bcbe30 bsp=a000000100bc4c00
       <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      The root cause is that some irq_chip variables, especially ia64_msi_chip,
      initialize their member end to NULL. __do_IRQ doesn't check whether
      irq_chip->end is NULL and simply calls it after processing the interrupt.
      
      As irq_chip->end is called in many places, I fix it by reinitializing
      irq_chip->end to dummy_irq_chip.end, i.e., a noop function.
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
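The idea behind commit b86432b4 can be sketched in user space. This is only an illustrative model, not the kernel patch: `struct irq_chip_model`, `noop_end()`, `irq_chip_fixup()` and `finish_irq()` are hypothetical names, with `noop_end()` playing the role the message gives to `dummy_irq_chip.end`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct irq_chip (hypothetical subset). */
struct irq_chip_model {
    const char *name;
    void (*ack)(unsigned int irq);
    void (*end)(unsigned int irq);
};

/* Counts calls so the effect of the no-op can be observed. */
static unsigned int noop_calls;

/* No-op callback, playing the role of dummy_irq_chip.end. */
static void noop_end(unsigned int irq)
{
    (void)irq;
    noop_calls++;
}

/* Replace a NULL ->end with the no-op so callers never need a NULL check. */
static void irq_chip_fixup(struct irq_chip_model *chip)
{
    assert(chip != NULL);
    if (chip->end == NULL)
        chip->end = noop_end;
}

/* Models the tail of __do_IRQ(): ->end is called unconditionally. */
static void finish_irq(struct irq_chip_model *chip, unsigned int irq)
{
    chip->end(irq);
}
```

Fixing the pointer once at initialization keeps every call site unchanged, which matches the reasoning in the message that `irq_chip->end` is called in many places.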
  4. 15 Nov, 2006 1 commit
  5. 13 Nov, 2006 1 commit
    • [PATCH] Fix misrouted interrupts deadlocks · f72fa707
      Committed by Pavel Emelianov
      While testing a kernel on a machine with the "irqpoll" option I caught the
      following lockup:
      
              __do_IRQ()
                 spin_lock(&desc->lock);
                 desc->chip->ack(); /* IRQ is ACKed */
              note_interrupt()
              misrouted_irq()
              handle_IRQ_event()
                 if (...)
                    local_irq_enable_in_hardirq();
              /* interrupts are enabled from now */
              ...
              __do_IRQ() /* same IRQ we've started from */
                 spin_lock(&desc->lock); /* LOCKUP */
      
      Looking at misrouted_irq() code I've found that a potential deadlock like
      this can also take place:
      
      CPU 1:
      __do_IRQ()
         spin_lock(&desc->lock); /* irq = A */
      misrouted_irq()
         for (i = 1; i < NR_IRQS; i++) {
            spin_lock(&desc->lock); /* irq = B */
            if (desc->status & IRQ_INPROGRESS) {
      
      CPU 2:
      __do_IRQ()
         spin_lock(&desc->lock); /* irq = B */
      misrouted_irq()
         for (i = 1; i < NR_IRQS; i++) {
            spin_lock(&desc->lock); /* irq = A */
            if (desc->status & IRQ_INPROGRESS) {
      
      As the second lock on both CPUs is taken before checking whether that irq is
      being handled on another processor, this may cause a deadlock.  This issue
      is only theoretical.
      
      I propose the attached patch to fix both problems: while handling a
      misrouted IRQ, the active desc->lock can be released.
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
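The locking rule proposed in commit f72fa707 can be illustrated with a single-threaded toy model. Everything here is an assumption made for illustration: the `*_model` names are hypothetical, a boolean stands in for `desc->lock`, and skipping the caller's own irq in the poll loop is a choice of this sketch rather than something stated in the message. An assertion fires where a real spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS 4

/* Toy descriptor: "locked" stands in for desc->lock. */
struct irq_desc_model {
    bool locked;
    bool in_progress;
    unsigned int polled;   /* how often the poller ran this irq's handlers */
};

static void desc_lock(struct irq_desc_model *d)
{
    assert(!d->locked);    /* taking a held lock would spin forever */
    d->locked = true;
}

static void desc_unlock(struct irq_desc_model *d)
{
    assert(d->locked);
    d->locked = false;
}

/* Poll every other irq; the caller must not hold any descriptor lock. */
static int misrouted_irq_model(struct irq_desc_model *descs, int self)
{
    int i, polled = 0;

    for (i = 1; i < MODEL_NR_IRQS; i++) {
        if (i == self)
            continue;               /* already being handled by the caller */
        desc_lock(&descs[i]);
        if (!descs[i].in_progress) {
            descs[i].polled++;
            polled++;
        }
        desc_unlock(&descs[i]);
    }
    return polled;
}

/* Models the part of __do_IRQ() that normally runs with desc->lock held. */
static int do_irq_model(struct irq_desc_model *descs, int irq)
{
    int polled;

    desc_lock(&descs[irq]);
    descs[irq].in_progress = true;

    desc_unlock(&descs[irq]);       /* release before polling other irqs */
    polled = misrouted_irq_model(descs, irq);
    desc_lock(&descs[irq]);

    descs[irq].in_progress = false;
    desc_unlock(&descs[irq]);
    return polled;
}
```

Because no descriptor lock is held while other descriptors are locked, neither the recursive lockup nor the two-CPU A/B ordering problem described in the message can arise in this model.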
  6. 17 Oct, 2006 1 commit
  7. 12 Oct, 2006 1 commit
    • [PATCH] bitmap: parse input from kernel and user buffers · 01a3ee2b
      Committed by Reinette Chatre
      lib/bitmap.c:bitmap_parse() is a library function that receives a user
      buffer as input.  This seems to have originated from the way the write_proc
      function of the /proc filesystem operates.
      
      This has been reworked to not use kmalloc and eliminates a lot of
      get_user() overhead by performing one access_ok before using __get_user().
      
      We need to test if we are in kernel or user space (is_user) and access the
      buffer differently.  We cannot use __get_user() to access kernel addresses
      in all cases, for example in architectures with separate address space for
      kernel and user.
      
      This function will be useful for other uses as well; for example, taking
      input for /sysfs instead of /proc, so it was changed to accept kernel
      buffers.  We have this use for the Linux UWB project, as part of the
      upcoming bandwidth allocator code.
      
      Only a few routines used this function and they were changed too.
      Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com>
      Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Joe Korty <joe.korty@ccur.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 07 Oct, 2006 1 commit
  9. 05 Oct, 2006 3 commits
    • IRQ: Maintain regs pointer globally rather than passing to IRQ handlers · 7d12e780
      Committed by David Howells
      Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
      of passing regs around manually through all ~1800 interrupt handlers in the
      Linux kernel.
      
      The regs pointer is used in few places, but it potentially costs both stack
      space and code to pass it around.  On the FRV arch, removing the regs parameter
      from all the genirq function results in a 20% speed up of the IRQ exit path
      (ie: from leaving timer_interrupt() to leaving do_IRQ()).
      
      Where appropriate, an arch may override the generic storage facility and do
      something different with the variable.  On FRV, for instance, the address is
      maintained in GR28 at all times inside the kernel as part of general exception
      handling.
      
      Having looked over the code, it appears that the parameter may be handed down
      through up to twenty or so layers of functions.  Consider a USB character
      device attached to a USB hub, attached to a USB controller that posts its
      interrupts through a cascaded auxiliary interrupt controller.  A character
      device driver may want to pass regs to the sysrq handler through the input
      layer which adds another few layers of parameter passing.
      
      I've built this code with allyesconfig for x86_64 and i386.  I've runtested the
      main part of the code on FRV and i386, though I can't test most of the drivers.
      I've also done partial conversion for powerpc and MIPS - these at least compile
      with minimal configurations.
      
      This will affect all archs.  Mostly the changes should be relatively easy.
      Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
      
      	struct pt_regs *old_regs = set_irq_regs(regs);
      
      And put the old one back at the end:
      
      	set_irq_regs(old_regs);
      
      Don't pass regs through to generic_handle_irq() or __do_IRQ().
      
      In timer_interrupt(), this sort of change will be necessary:
      
      	-	update_process_times(user_mode(regs));
      	-	profile_tick(CPU_PROFILING, regs);
      	+	update_process_times(user_mode(get_irq_regs()));
      	+	profile_tick(CPU_PROFILING);
      
      I'd like to move update_process_times()'s use of get_irq_regs() into itself,
      except that i386, alone of the archs, uses something other than user_mode().
      
      Some notes on the interrupt handling in the drivers:
      
       (*) input_dev() is now gone entirely.  The regs pointer is no longer stored in
           the input_dev struct.
      
       (*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking.  It does
           something different depending on whether it's been supplied with a regs
           pointer or not.
      
       (*) Various IRQ handler function pointers have been moved to type
           irq_handler_t.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      (cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
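The save/restore pattern described in commit 7d12e780 can be modelled outside the kernel as follows. This is a sketch under stated assumptions: the `*_model` names are hypothetical stand-ins for `set_irq_regs()`, `get_irq_regs()` and `do_IRQ()`, and a single global replaces the per-CPU variable.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for struct pt_regs (hypothetical fields). */
struct pt_regs_model {
    unsigned long ip;
    int user;
};

/* Per-CPU in the kernel; one global is enough for a single-threaded model. */
static struct pt_regs_model *irq_regs_slot;

static struct pt_regs_model *get_irq_regs_model(void)
{
    return irq_regs_slot;
}

/* Install new_regs and hand back the previous pointer for later restore. */
static struct pt_regs_model *set_irq_regs_model(struct pt_regs_model *new_regs)
{
    struct pt_regs_model *old = irq_regs_slot;

    irq_regs_slot = new_regs;
    return old;
}

static unsigned long last_seen_ip;

/* A handler without a regs parameter: it asks for the pointer when needed. */
static void timer_handler_model(void)
{
    struct pt_regs_model *regs = get_irq_regs_model();

    assert(regs != NULL);
    last_seen_ip = regs->ip;
}

/* Entry point: store regs at the beginning, put the old value back at the end. */
static void do_irq_model(struct pt_regs_model *regs, void (*handler)(void))
{
    struct pt_regs_model *old_regs = set_irq_regs_model(regs);

    handler();
    set_irq_regs_model(old_regs);
}
```

Returning the old pointer from the setter is what lets nested interrupt entries restore the outer context correctly on the way out.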
    • IRQ: Typedef the IRQ handler function type · da482792
      Committed by David Howells
      Typedef the IRQ handler function type.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      (cherry picked from 1356d1e5fd256997e3d3dce0777ab787d0515c7a commit)
    • IRQ: Typedef the IRQ flow handler function type · 57a58a94
      Committed by David Howells
      Typedef the IRQ flow handler function type.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      (cherry picked from 8e973fbdf5716b93a0a8c0365be33a31ca0fa351 commit)
  10. 04 Oct, 2006 4 commits
    • [PATCH] msi: simplify msi sanity checks by adding with generic irq code · 1f80025e
      Committed by Eric W. Biederman
      Currently msi.c is doing sanity checks that make certain that before an irq
      is destroyed it has no more users.
      
      By adding irq_has_action I can perform the test in a generic way, instead of
      relying on an msi specific data structure.
      
      By performing the core check in dynamic_irq_cleanup I ensure every user of
      dynamic irqs has a test present and we don't free resources that are in use.
      
      In msi.c this allows me to kill the attrib.state member of msi_desc and all of
      the associated code to maintain it.
      
      To keep from freeing data structures when irq cleanup code is called too soon,
      changing dynamic_irq_cleanup is insufficient because there are msi specific
      data structures that are also not safe to free.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Greg KH <greg@kroah.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] genirq: irq: add a dynamic irq creation API · 3a16d713
      Committed by Eric W. Biederman
      With the msi support comes a new concept in irq handling, irqs that are
      created dynamically at run time.
      
      Currently the msi code allocates irqs backwards.  First it allocates a
      platform dependent routing value for an interrupt, the ``vector'', and then it
      figures out from the vector which irq you are on.
      
      This msi backwards allocator suffers from two basic problems.  The allocator
      suffers because it is trying to do something that is architecture specific in
      a generic way, making it brittle, inflexible, and tied too tightly to the
      architecture implementation.  The allocator also suffers from its very
      backwards nature as it has tied things together that should have no
      dependencies.
      
      To solve the basic dynamic irq allocation problem two new architecture
      specific functions are added: create_irq and destroy_irq.
      
      create_irq takes no input and returns an unused irq number, that won't be
      reused until it is returned to the free pool with destroy_irq.  The irq then
      can be used for any purpose although the only initial consumer is the msi
      code.
      
      destroy_irq takes an irq number allocated with create_irq and returns it to
      the free pool.
      
      Making this functionality per architecture increases the simplicity of the irq
      allocation code and increases its flexibility.
      
      dynamic_irq_init() and dynamic_irq_cleanup() are added to automate the
      irq_desc initialization that should happen for dynamic irqs.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rajesh Shah <rajesh.shah@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
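The create_irq/destroy_irq contract from commit 3a16d713 (hand out an unused number, never reuse it until it is returned to the free pool) can be sketched with a small in-use table. This is a hypothetical user-space model, not any architecture's implementation: `MODEL_NR_IRQS`, `create_irq_model()` and `destroy_irq_model()` are names invented for the example, and the -1 return on exhaustion is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_IRQS 8

/* true while an irq number is handed out and not yet returned. */
static bool irq_in_use[MODEL_NR_IRQS];

/* Returns an unused irq number, or -1 when the pool is exhausted. */
static int create_irq_model(void)
{
    int irq;

    for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
        if (!irq_in_use[irq]) {
            irq_in_use[irq] = true;
            return irq;
        }
    }
    return -1;
}

/* Returns an irq obtained from create_irq_model() to the free pool. */
static void destroy_irq_model(int irq)
{
    assert(irq >= 0 && irq < MODEL_NR_IRQS);
    assert(irq_in_use[irq]);        /* catches a double free */
    irq_in_use[irq] = false;
}
```

The allocator knows nothing about vectors or routing, which is the decoupling the message argues for.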
    • [PATCH] genirq: irq: add moved_masked_irq · e7b946e9
      Committed by Eric W. Biederman
      Currently move_native_irq disables and re-enables the irq we are migrating to
      ensure we don't take that irq when we are actually doing the migration
      operation.  Disabling the irq needs to happen but sometimes doing the work in
      move_native_irq is too late.
      
      On x86 with ioapics the irq move sequence needs to be:
      edge_triggered:
        mask irq.
        move irq.
        unmask irq.
        ack irq.
      level_triggered:
        mask irq.
        ack irq.
        move irq.
        unmask irq.
      
      We can easily perform the edge triggered sequence with the current definition
      of move_native_irq.  However the level triggered case does not map well.  For
      that I have added move_masked_irq, to allow me to disable the irqs around both
      the ack and the move.
      
      Q: Why have we not seen this problem earlier?
      
      A: The only symptom I have been able to reproduce is that if we change
         the vector before acknowledging an irq the wrong irq is acknowledged.
         Since we currently are not reprogramming the irq vector during
         migration no problems show up.
      
         We have to mask the irq before we acknowledge the irq or else we could
         hit a window where an irq is asserted just before we acknowledge it.
      
         Edge triggered irqs do not have this problem because acknowledgements
         do not propagate in the same way.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rajesh Shah <rajesh.shah@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] genirq: irq: convert the move_irq flag from a 32bit word to a single bit · a24ceab4
      Committed by Eric W. Biederman
      The primary aim of this patchset is to remove maintenance problems caused by
      the irq infrastructure.  The two big issues I address are an artificially
      small cap on the number of irqs, and that MSI assumes vector == irq.  My
      primary focus is on x86_64 but I have touched other architectures where
      necessary to keep them from breaking.
      
      - To increase the number of irqs I modify the code to look at the (cpu,
        vector) pair instead of just looking at the vector.
      
        With a large number of irqs available, systems with a large irq count no
        longer need to compress their irq numbers to fit, removing a lot of brittle
        special cases.
      
        For acpi guys the result is that irq == gsi.
      
      - Addressing the fact that MSI assumes irq == vector takes a few more
        patches.  But suffice it to say when I am done none of the generic irq code
        even knows what a vector is.
      
      In quick testing on a large Unisys x86_64 machine we stumbled over at least
      one driver that assumed that NR_IRQS could always fit into an 8 bit number.
      This driver is clearly buggy today.  But this has become a class of bugs that
      it is now much easier to hit.
      
      This patch:
      
      This is a minor space optimization.  In practice I don't think this has any
      effect because of our alignment constraints and the other fields, but there is
      no point in chewing up an unnecessary word, and since we already read the flag
      field this should improve the cache hit ratio of the irq handler.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Rajesh Shah <rajesh.shah@intel.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: "Protasevich, Natalie" <Natalie.Protasevich@UNISYS.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 30 Sep, 2006 2 commits
  12. 26 Sep, 2006 1 commit
  13. 19 Sep, 2006 1 commit
    • [PATCH] genirq core: fix handle_level_irq() · 86998aa6
      Committed by Ingo Molnar
      while porting the -rt tree to 2.6.18-rc7 i noticed the following
      screaming-IRQ scenario on an SMP system:
      
       2274  0Dn.:1 0.001ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.010ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.020ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.029ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.039ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.048ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.058ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.068ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.077ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.087ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.097ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
      
      as it turns out, the bug is caused by handle_level_irq(), which, if it
      races with another CPU already handling this IRQ, _unmasks_ the IRQ
      line on the way out. This is not how 2.6.17 works, and we introduced
      this bug in one of the early genirq cleanups right before it went into
      -mm. (the bug was not in the genirq patchset for a long time, and we
      didn't notice the bug due to the lack of -rt rebase to the new genirq
      code. -rt, and hardirq-preemption in particular, opens up such races much
      wider than anything else.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  14. 17 Sep, 2006 1 commit
  15. 02 Sep, 2006 1 commit
  16. 01 Aug, 2006 1 commit
    • [PATCH] genirq: {en,dis}able_irq_wake() need refcounting too · 15a647eb
      Committed by David Brownell
      IRQs need refcounting and a state flag to track whether the IRQ should
      be enabled or disabled as a "normal IRQ" source after a series of calls to
      {en,dis}able_irq().  For shared IRQs, the IRQ must be enabled so long as at
      least one driver needs it active.
      
      Likewise, IRQs need the same support to track whether the IRQ should be
      enabled or disabled as a "wakeup event" source after a series of calls to
      {en,dis}able_irq_wake().  For shared IRQs, the IRQ must be enabled as a
      wakeup source during sleep so long as at least one driver needs it.  But
      right now they _don't have_ that refcounting ...  which means sharing a
      wakeup-capable IRQ can't work correctly in some configurations.
      
      This patch adds the refcount and flag mechanisms to set_irq_wake() -- which
      is what {en,dis}able_irq_wake() call -- and minimal documentation of what
      the irq wake mechanism does.
      
      Drivers relying on the older (broken) "toggle" semantics will trigger a
      warning; that'll be a handful of drivers on ARM systems.
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
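The refcount-plus-flag scheme that commit 15a647eb describes for set_irq_wake() can be sketched like this. It is an illustrative model only: `struct irq_wake_model`, its fields and `set_irq_wake_model()` are hypothetical, and returning -1 for an unbalanced disable is a choice of this sketch (the message itself only says that old "toggle" users trigger a warning).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-irq wakeup state (hypothetical model of the relevant irq_desc fields). */
struct irq_wake_model {
    unsigned int wake_depth;   /* number of drivers that enabled wakeup */
    bool wake_armed;           /* state flag: irq is a wakeup source */
    unsigned int hw_toggles;   /* how often the "hardware" was reprogrammed */
};

/* Enable (on = true) or disable wakeup; only the 0 <-> 1 edges touch hardware. */
static int set_irq_wake_model(struct irq_wake_model *d, bool on)
{
    assert(d != NULL);

    if (on) {
        if (d->wake_depth++ == 0) {
            d->wake_armed = true;
            d->hw_toggles++;
        }
        return 0;
    }

    if (d->wake_depth == 0)
        return -1;             /* unbalanced disable: nothing to undo */

    if (--d->wake_depth == 0) {
        d->wake_armed = false;
        d->hw_toggles++;
    }
    return 0;
}
```

With two drivers sharing the line, the irq stays armed as a wakeup source until the second one disables it, which is the behaviour the message says shared wakeup-capable IRQs need.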
  17. 04 Jul, 2006 3 commits
    • [PATCH] lockdep: annotate enable_in_hardirq() · 366c7f55
      Committed by Ingo Molnar
      Make use of local_irq_enable_in_hardirq() API to annotate places that enable
      hardirqs in hardirq context.
      
      Has no effect on non-lockdep kernels.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: annotate genirq · 243c7621
      Committed by Ingo Molnar
      Teach special (recursive) locking code to the lock validator.  Has no effect
      on non-lockdep kernels.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] lockdep: core · fbb9ce95
      Committed by Ingo Molnar
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      
      Typically if the lock validator finds some problem it will print out
      voluminous debug output that begins with "BUG: ..."; this syslog output
      can be used by kernel developers to figure out the precise locking scenario.
      
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      
      When determining validity of locking, all possible "deadlock scenarios" are
      considered: assuming arbitrary number of CPUs, arbitrary irq context and task
      context constellations, running arbitrary combinations of all the existing
      locking scenarios.  In a typical system this means millions of separate
      scenarios.  This is why we call it a "locking correctness" validator - for all
      rules that are observed the lock validator proves it with mathematical
      certainty that a deadlock could not occur (assuming that the lock validator
      implementation itself is correct and its internal data structures are not
      corrupted by some other kernel subsystem).  [see more details and conditionals
      of this statement in include/linux/lockdep.h and
      Documentation/lockdep-design.txt]
      
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single-context rules, increasing the likelihood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found piecemeal, triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persists during the lifetime of the kernel.
      
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
      
       lock-classes:                          694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also be found at:
      
         http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
      
      [bunk@stusta.de: cleanups]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
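The per-lock-class rule checking described in commit fbb9ce95 can be illustrated with a toy ordering table. This is far simpler than the real validator (no irq contexts, no chains, no stack traces) and every name in it is hypothetical: lock classes are small integers, and `record_dependency()` rejects any new "held, then taken" rule that would contradict the rules observed so far.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CLASSES 8

/*
 * before[a][b] is true once class a is known to be taken before class b,
 * either directly observed or implied transitively by earlier rules.
 */
static bool before[MAX_CLASSES][MAX_CLASSES];

/*
 * Record "a lock of class `held` was held while taking class `taken`".
 * Returns true if the rule is consistent with all earlier rules and was
 * added, false if it would create a possible deadlock (an ordering cycle).
 */
static bool record_dependency(int held, int taken)
{
    int i, j;

    assert(held >= 0 && held < MAX_CLASSES);
    assert(taken >= 0 && taken < MAX_CLASSES);

    if (held == taken || before[taken][held])
        return false;               /* inverse order already observed */

    /*
     * Keep the table transitively closed: held and everything ordered
     * before it now precede taken and everything ordered after it.
     */
    for (i = 0; i < MAX_CLASSES; i++) {
        if (i != held && !before[i][held])
            continue;
        for (j = 0; j < MAX_CLASSES; j++) {
            if (j == taken || before[taken][j])
                before[i][j] = true;
        }
    }
    return true;
}
```

Because rules are kept per class rather than per lock instance, an A-then-B rule seen on one object and a B-then-A attempt on another are still caught, even though no actual deadlock occurred at run time.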
  18. 03 Jul, 2006 4 commits
  19. 02 Jul, 2006 4 commits
  20. 01 Jul, 2006 1 commit
  21. 30 Jun, 2006 6 commits