1. 21 12月, 2006 1 次提交
  2. 13 12月, 2006 1 次提交
  3. 12 12月, 2006 2 次提交
    • L
      Make sure we populate the initroot filesystem late enough · 8d610dd5
      Linus Torvalds 提交于
      We should not initialize rootfs before all the core initializers have
      run.  So do it as a separate stage just before starting the regular
      driver initializers.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8d610dd5
    • L
      Make SLES9 "get_kernel_version" work on the kernel binary again · 8993780a
      Linus Torvalds 提交于
      As reported by Andy Whitcroft, at least the SLES9 initrd build process
      depends on getting the kernel version from the kernel binary.  It does
      that by simply trawling the binary and looking for the signature of the
      "linux_banner" string (the string "Linux version " to be exact. Which
      is really broken in itself, but whatever..)
      
      That got broken when the string was changed to allow /proc/version to
      change the UTS release information dynamically, and "get_kernel_version"
      thus returned "%s" (see commit a2ee8649:
      "[PATCH] Fix linux banner utsname information").
      
      This just restores "linux_banner" as a static string, which should fix
      the version finding.  And /proc/version simply uses a different string.
      
      To avoid wasting even that miniscule amount of memory, the early boot
      string should really be marked __initdata, but that just causes the same
      bug in SLES9 to re-appear, since it will then find other occurrences of
      "Linux version " first.
      
      Cc: Andy Whitcroft <apw@shadowen.org>
      Acked-by: NHerbert Poetzl <herbert@13thfloor.at>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Steve Fox <drfickle@us.ibm.com>
      Acked-by: NOlaf Hering <olaf@aepfle.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8993780a
  4. 11 12月, 2006 1 次提交
    • A
      [PATCH] io-accounting: core statistics · 7c3ab738
      Andrew Morton 提交于
      The present per-task IO accounting isn't very useful.  It simply counts the
      number of bytes passed into read() and write().  So if a process reads 1MB
      from an already-cached file, it is accused of having performed 1MB of I/O,
      which is wrong.
      
      (David Wright had some comments on the applicability of the present logical IO accounting:
      
        For billing purposes it is useless but for workload analysis it is very
        useful
      
        read_bytes/read_calls  average read request size
        write_bytes/write_calls average write request size
      
        read_bytes/read_blocks ie logical/physical can indicate hit rate or thrashing
        write_bytes/write_blocks  ie logical/physical  guess since pdflush writes can
                                                      be missed
      
        I often look for logical larger than physical to see filesystem cache
        problems.  And the bytes/cpusec can help find applications that are
        dominating the cache and causing slow interactive response from page cache
        contention.
      
        I want to find the IO intensive applications and make sure they are doing
        efficient IO.  Thus the acctcms(sysV) or csacms command would give the high
        IO commands).
      
      This patchset adds new accounting which tries to be more accurate.  We account
      for three things:
      
      reads:
      
        attempt to count the number of bytes which this process really did cause
        to be fetched from the storage layer.  Done at the submit_bio() level, so it
        is accurate for block-backed filesystems.  I also attempt to wire up NFS and
        CIFS.
      
      writes:
      
        attempt to count the number of bytes which this process caused to be sent
        to the storage layer.  This is done at page-dirtying time.
      
        The big inaccuracy here is truncate.  If a process writes 1MB to a file
        and then deletes the file, it will in fact perform no writeout.  But it will
        have been accounted as having caused 1MB of write.
      
        So...
      
      cancelled_writes:
      
        account the number of bytes which this process caused to not happen, by
        truncating pagecache.
      
        We _could_ just subtract this from the process's `write' accounting.  But
        that means that some processes would be reported to have done negative
        amounts of write IO, which is silly.
      
        So we just report the raw number and punt this decision up to userspace.
      
      Now, we _could_ account for writes at the physical I/O level.  But
      
      - This would require that we track memory-dirtying tasks at the per-page
        level (would require a new pointer in struct page).
      
      - It would mean that IO statistics for a process are usually only available
        long after that process has exitted.  Which means that we probably cannot
        communicate this info via taskstats.
      
      This patch:
      
      Wire up the kernel-private data structures and the accessor functions to
      manipulate them.
      
      Cc: Jay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: David Wright <daw@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7c3ab738
  5. 09 12月, 2006 2 次提交
  6. 08 12月, 2006 3 次提交
  7. 07 12月, 2006 1 次提交
  8. 02 12月, 2006 1 次提交
  9. 09 11月, 2006 1 次提交
    • E
      [PATCH] sysctl: Undeprecate sys_sysctl · 13bb7e37
      Eric W. Biederman 提交于
      The basic issue is that despite have been deprecated and warned about as a
      very bad thing in the man pages since its inception there are a few real
      users of sys_sysctl.  It was my assumption that because sysctl had been
      deprecated for all of 2.6 there would be no user space users by this point,
      so I initially gave sys_sysctl a very short deprecation period.
      
      Now that I know there are a few real users the only sane way to proceed
      with deprecation is to push the time limit out to a year or two work and
      work with distributions that have big testing pools like fedora core to
      find these last remaining users.
      
      Which means that the sys_sysctl interface needs to be maintained in the
      meantime.
      
      Since I have provided a technical measure that allows us to add new sysctl
      entries without reserving more binary numbers I believe that is enough to
      fix the sys_sysctl binary interface maintenance problems, because there is
      no longer a need to change the binary interface at all.
      
      Since the sys_sysctl implementation needs to stay around for a while and
      the worst of the maintenance issues that caused us to occasionally break
      the ABI have been addressed I don't see any advantage in continuing with
      the removal of sys_sysctl.
      
      So instead of merely increasing the deprecation period this patch removes
      the deprecation of sys_sysctl and modifies the kernel to compile the code
      in by default.
      
      With committing to maintain sys_sysctl we get all of the advantages of a
      fast interface for anything that needs it.  Currently sys_sysctl is about
      5x faster than /proc/sys, for the same string data.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NAlan Cox <alan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      13bb7e37
  10. 22 10月, 2006 1 次提交
    • J
      [PATCH] x86-64: Speed up dwarf2 unwinder · 690a973f
      Jan Beulich 提交于
      This changes the dwarf2 unwinder to do a binary search for CIEs
      instead of a linear work. The linker is unfortunately not
      able to build a proper lookup table at link time, instead it creates
      one at runtime as soon as the bootmem allocator is usable (so you'll continue
      using the linear lookup for the first [hopefully] few calls).
      The code should be ready to utilize a build-time created table once
      a fixed linker becomes available.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      690a973f
  11. 21 10月, 2006 1 次提交
    • P
      [PATCH] uml: use DEFCONFIG_LIST to avoid reading host's config · b2670eac
      Paolo 'Blaisorblade' Giarrusso 提交于
      This should make sure that, for UML, host's configuration files are not
      considered, which avoids various pains to the user.  Our dependency are such
      that the obtained Kconfig will be valid and will lead to successful
      compilation - however they cannot prevent an user from disabling any boot
      device, and if an option is not set in the read .config (say
      /boot/config-XXX), with make menuconfig ARCH=um, it is not set.  This always
      disables UBD and all console I/O channels, which leads to non-working UML
      kernels, so this bothers users - especially now, since it will happen on
      almost every machine (/boot/config-`uname -r` exists almost on every machine).
       It can be workarounded with make defconfig ARCH=um, but it is non-obvious and
      can be avoided, so please _do_ merge this patch.
      
      Given the existence of options, it could be interesting to implement
      (additionally) "option required" - with it, Kconfig will refuse reading a
      .config file (from wherever it comes) if the given option is not set.  With
      this, one could mark with it the option characteristic of the given
      architecture (it was an old proposal of Roman Zippel, when I pointed out our
      problem):
      
      config UML
      	option required
      	default y
      
      However this should be further discussed:
      *) for x86, it must support constructs like:
      
      ==arch/i386/Kconfig==
      config 64BIT
      	option required
      	default n
      where Kconfig must require that CONFIG_64BIT is disabled or not present in the
      read .config.
      
      *) do we want to do such checks only for the starting defconfig or also for
         .config? Which leads to:
      *) I may want to port a x86_64 .config to x86 and viceversa, or even among more
         different archs. Should that be allowed, and in which measure (the user may
         force skipping the check for a .config or it is only given a warning by
         default)?
      
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: <kbuild-devel@lists.sourceforge.net>
      Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b2670eac
  12. 04 10月, 2006 1 次提交
  13. 03 10月, 2006 1 次提交
  14. 02 10月, 2006 4 次提交
    • C
      [PATCH] replace cad_pid by a struct pid · 9ec52099
      Cedric Le Goater 提交于
      There are a few places in the kernel where the init task is signaled.  The
      ctrl+alt+del sequence is one them.  It kills a task, usually init, using a
      cached pid (cad_pid).
      
      This patch replaces the pid_t by a struct pid to avoid pid wrap around
      problem.  The struct pid is initialized at boot time in init() and can be
      modified through systctl with
      
      	/proc/sys/kernel/cad_pid
      
      [ I haven't found any distro using it ? ]
      
      It also introduces a small helper routine kill_cad_pid() which is used
      where it seemed ok to use cad_pid instead of pid 1.
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9ec52099
    • A
      [PATCH] introduce kernel_execve · 67608567
      Arnd Bergmann 提交于
      The use of execve() in the kernel is dubious, since it relies on the
      __KERNEL_SYSCALLS__ mechanism that stores the result in a global errno
      variable.  As a first step of getting rid of this, change all users to a
      global kernel_execve function that returns a proper error code.
      
      This function is a terrible hack, and a later patch removes it again after the
      kernel syscalls are gone.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      67608567
    • K
      [PATCH] IPC namespace core · 25b21cb2
      Kirill Korotaev 提交于
      This patch set allows to unshare IPCs and have a private set of IPC objects
      (sem, shm, msg) inside namespace.  Basically, it is another building block of
      containers functionality.
      
      This patch implements core IPC namespace changes:
      - ipc_namespace structure
      - new config option CONFIG_IPC_NS
      - adds CLONE_NEWIPC flag
      - unshare support
      
      [clg@fr.ibm.com: small fix for unshare of ipc namespace]
      [akpm@osdl.org: build fix]
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Signed-off-by: NKirill Korotaev <dev@openvz.org>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      25b21cb2
    • S
      [PATCH] namespaces: utsname: implement utsname namespaces · 4865ecf1
      Serge E. Hallyn 提交于
      This patch defines the uts namespace and some manipulators.
      Adds the uts namespace to task_struct, and initializes a
      system-wide init namespace.
      
      It leaves a #define for system_utsname so sysctl will compile.
      This define will be removed in a separate patch.
      
      [akpm@osdl.org: build fix, cleanup]
      Signed-off-by: NSerge Hallyn <serue@us.ibm.com>
      Cc: Kirill Korotaev <dev@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: Andrey Savochkin <saw@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4865ecf1
  15. 01 10月, 2006 4 次提交
    • J
      [PATCH] csa: Extended system accounting over taskstats · 9acc1853
      Jay Lan 提交于
      Add extended system accounting handling over taskstats interface.  A
      CONFIG_TASK_XACCT flag is created to enable the extended accounting code.
      Signed-off-by: NJay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9acc1853
    • R
      [PATCH] fix EMBEDDED + SYSCTL menu · 0847062a
      Randy Dunlap 提交于
      SYSCTL should still depend on EMBEDDED.  This unbreaks the EMBEDDED menu
      (from the recent SYSCTL_SYCALL menu option patch).
      
      Fix typos in new SYSCTL_SYSCALL menu.
      Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0847062a
    • R
      [PATCH] allow /proc/config.gz to be built as a module · f2443ab6
      Ross Biro 提交于
      The driver for /proc/config.gz consumes rather a lot of memory and it is in
      fact possible to build it as a module.
      
      In some ways this is a bit risky, because the .config which is used for
      compiling kernel/configs.c isn't necessarily the same as the .config which was
      used to build vmlinux.
      
      But OTOH the potential memory savings are decent, and it'd be fairly dumb to
      build your configs.o with a different .config.
      Signed-off-by: NAndrew Morton <akpm@google.com>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f2443ab6
    • D
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells 提交于
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9361401e
  16. 27 9月, 2006 2 次提交
  17. 26 9月, 2006 3 次提交
    • A
      [PATCH] Move unwind_init earlier · c9538ed4
      Andi Kleen 提交于
      Needed for use of the unwinder in lockdep, because lockdep runs really
      early too.
      
      Cc: jbeulich@novell.com
      Signed-off-by: NAndi Kleen <ak@suse.de>
      c9538ed4
    • R
      [PATCH] Allow early_param and identical __setup to exist · 33df0d19
      Rusty Russell 提交于
      We currently assume that boot parameters which are handled by
      early_param() will not overlap boot parameters handled by __setup: if
      they do, behaviour is dependent on link order, usually meaning __setup
      will not get called.
      
      ACPI wants to use early_param("pci"), and pci uses __setup("pci="), so
      we modify the core to let them coexist: "pci=noacpi" will now get
      passed to both.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      33df0d19
    • G
      Driver Core: add ability for drivers to do a threaded probe · d779249e
      Greg Kroah-Hartman 提交于
      This adds the infrastructure for drivers to do a threaded probe, and
      waits at init time for all currently outstanding probes to complete.
      
      A new kernel thread will be created when the probe() function for the
      driver is called, if the multithread_probe bit is set in the driver
      saying it can support this kind of operation.
      
      I have tested this with USB and PCI, and it works, and shaves off a lot
      of time in the boot process, but there are issues with finding root boot
      disks, and some USB drivers assume that this can never happen, so it is
      currently not enabled for any bus type.  Individual drivers can enable
      this right now if they wish, and bus authors can selectivly turn it on
      as well, once they determine that their subsystem will work properly
      with it.
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      d779249e
  18. 17 9月, 2006 1 次提交
  19. 15 7月, 2006 3 次提交
  20. 04 7月, 2006 6 次提交
    • I
      [PATCH] lockdep: annotate genirq · 243c7621
      Ingo Molnar 提交于
      Teach special (recursive) locking code to the lock validator.  Has no effect
      on non-lockdep kernels.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      243c7621
    • I
      [PATCH] lockdep: core · fbb9ce95
      Ingo Molnar 提交于
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      
      Typically if the lock validator finds some problem it will print out
      voluminous debug output that begins with "BUG: ..." and which syslog output
      can be used by kernel developers to figure out the precise locking scenario.
      
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      
      When determining validity of locking, all possible "deadlock scenarios" are
      considered: assuming arbitrary number of CPUs, arbitrary irq context and task
      context constellations, running arbitrary combinations of all the existing
      locking scenarios.  In a typical system this means millions of separate
      scenarios.  This is why we call it a "locking correctness" validator - for all
      rules that are observed the lock validator proves it with mathematical
      certainty that a deadlock could not occur (assuming that the lock validator
      implementation itself is correct and its internal data structures are not
      corrupted by some other kernel subsystem).  [see more details and conditionals
      of this statement in include/linux/lockdep.h and
      Documentation/lockdep-design.txt]
      
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single single-context rules, increasing the likelyhood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found "piecemail-wise", triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persist during the lifetime of the kernel.
      
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
      
       lock-classes:                            694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also found at:
      
         http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
      
      [bunk@stusta.de: cleanups]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fbb9ce95
    • I
      [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar 提交于
      Generic lock debugging:
      
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
      
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
      
       - ability to do silent tests
      
       - check lock freeing in vfree too.
      
       - more finegrained debugging options, to allow distributions to
         turn off more expensive debugging features.
      
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      
      Here are the current debugging options:
      
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      
      which do:
      
       config DEBUG_MUTEXES
                bool "Mutex debugging, basic checks"
      
       config DEBUG_LOCK_ALLOC
               bool "Detect incorrect freeing of live mutexes"
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9a11b49a
    • H
      [PATCH] lockdep: console_init after local_irq_enable() · 93e02814
      Heiko Carstens 提交于
      s390's console_init must enable interrupts, but early_boot_irqs_on() gets
      called later.  To avoid problems move console_init() after local_irq_enable().
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      93e02814
    • J
      [PATCH] time initialisation fix · 88fecaa2
      john stultz 提交于
      We're not reay to take a timer interrupt until timekeeping_init() has run.
      But time_init() will start the time interrupt and if it is called with
      local interrupts enabled we'll immediately take an interrupt and die.
      
      Fix that by running timekeeping_init() prior to time_init().
      
      We don't know _why_ local interrupts got enabled on Jesse Brandeburg's
      machine.  That's a separate as-yet-unsolved problem.  THe patch adds a little
      bit of debugging to detect that.
      
      This whole requirement that local interrupts be held off during early boot
      keeps on biting us.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Jesse Brandeburg <jesse.brandeburg@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      88fecaa2
    • S
      kbuild: introduce utsrelease.h · 63104eec
      Sam Ravnborg 提交于
      include/linux/version.h contained both actual KERNEL version
      and UTS_RELEASE that contains a subset from git SHA1 for when
      kernel was compiled as part of a git repository.
      This had the unfortunate side-effect that all files including version.h
      would be recompiled when some git changes was made due to changes SHA1.
      Split it out so we keep independent parts in separate files.
      
      Also update checkversion.pl script to no longer check for UTS_RELEASE.
      Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
      63104eec