1. 15 5月, 2007 1 次提交
    • L
      Revert "ipmi: add new IPMI nmi watchdog handling" · faa8b6c3
      Linus Torvalds 提交于
      This reverts commit f64da958.
      
      Andi Kleen is unhappy with the changes, and they really do not seem
      worth it.  IPMI could use DIE_NMI_IPI instead of the new callback, even
      though that ends up having its own set of problems too, mainly because
      the IPMI code cannot really know the NMI was from IPMI or not.
      
      Manually fix up conflicts in arch/x86_64/kernel/traps.c and
      drivers/char/ipmi/ipmi_watchdog.c.
      
      Cc: Andi Kleen <ak@suse.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      faa8b6c3
  2. 13 5月, 2007 1 次提交
  3. 12 5月, 2007 1 次提交
  4. 11 5月, 2007 4 次提交
  5. 10 5月, 2007 7 次提交
  6. 09 5月, 2007 18 次提交
    • U
      fix file specification in comments · 58862699
      Uwe Kleine-König 提交于
      Many files include the filename at the beginning, serveral used a wrong one.
      Signed-off-by: NUwe Kleine-König <ukleinek@informatik.uni-freiburg.de>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      58862699
    • R
      Kconfig: A couple of grammatical fixes in arch/i386/Kconfig · fd2dbc92
      Robert P. J. Day 提交于
      Fix a couple grammatical errors in arch/i386/Kconfig.
      Signed-off-by: NRobert P. J. Day <rpjday@mindspring.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      fd2dbc92
    • D
      Fix trivial typos in Kconfig* files · 3dde6ad8
      David Sterba 提交于
      Fix several typos in help text in Kconfig* files.
      Signed-off-by: NDavid Sterba <dave@jikos.cz>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      3dde6ad8
    • L
      Revert "fbdev: ignore VESA modes if framebuffer is disabled" · 01e73be3
      Linus Torvalds 提交于
      This reverts commit 464bdd33.
      
      Peter Anvin correctly points out that VESA modes have nothing to do with
      frame buffers per se - they are often just regular extended text modes.
      Disabling them just because we don't have frame buffer support is very
      wrong.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Antonino A. Daplas <adaplas@gmail.com>,
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01e73be3
    • A
      fbdev: ignore VESA modes if framebuffer is disabled · 464bdd33
      Antonino A. Daplas 提交于
      If the option vga=<VESA graphics mode> is added to the boot parameter, it will
      activate graphics mode, but without any framebuffer support, the user is left
      with an unusable display.
      
      Change the behavior such that the user is instead prompted for another mode
      (ala vga=ask).
      
      NOTE: People can always use vbetool to set a graphics mode if this is really
      desired, but the number of people doing this approaches zero.
      Signed-off-by: NAntonino Daplas <adaplas@pol.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      464bdd33
    • B
      x86, serial: convert legacy COM ports to platform devices · 7e92b4fc
      Bjorn Helgaas 提交于
      Make x86 COM ports into platform devices and don't probe for them
      if we have PNP.
      
      This prevents double discovery, where a device was found both by
      the legacy probe and by 8250_pnp, e.g.,
      
          serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
          00:02: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
      
      This also means IRDA devices without a UART PNP ID will no longer be
      claimed by the serial driver, which might require changes in IRDA
      drivers and administration.
      
      In addition to this patch, you may need to configure a setserial init
      script, e.g., /etc/init.d/setserial, so it doesn't poke legacy UART
      stuff back in.  On Debian, "dpkg-reconfigure setserial" with the "kernel"
      option does this.
      
      To force the old legacy probe behavior even when we have PNPBIOS or
      ACPI, load the new legacy_serial module (or build 8250 static) with
      the "legacy_serial.force" option.
      
      [akpm@linux-foundation.org: fix makefiles]
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Keith Owens <kaos@ocs.com.au>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Adam Belay <ambx1@neo.rr.com>
      Cc: Matthieu CASTET <castet.matthieu@free.fr>
      Cc: Jean Tourrilhes <jt@hpl.hp.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Ville Syrjala <syrjala@sci.fi>
      Cc: Russell King <rmk+serial@arm.linux.org.uk>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e92b4fc
    • B
      Add IRQF_IRQPOLL flag on i386 · b34942fe
      Bernhard Walle 提交于
      Add IRQF_IRQPOLL to timer interrupts on i386.
      Signed-off-by: NBernhard Walle <bwalle@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b34942fe
    • A
      Kprobes: The ON/OFF knob thru debugfs · bf8f6e5b
      Ananth N Mavinakayanahalli 提交于
      This patch provides a debugfs knob to turn kprobes on/off
      
      o A new file /debug/kprobes/enabled indicates if kprobes is enabled or
        not (default enabled)
      o Echoing 0 to this file will disarm all installed probes
      o Any new probe registration when disabled will register the probe but
        not arm it. A message will be printed out in such a case.
      o When a value 1 is echoed to the file, all probes (including ones
        registered in the intervening period) will be enabled
      o Unregistration will happen irrespective of whether probes are globally
        enabled or not.
      o Update Documentation/kprobes.txt to reflect these changes. While there
        also update the doc to make it current.
      
      We are also looking at providing sysrq key support to tie to the disabling
      feature provided by this patch.
      
      [akpm@linux-foundation.org: Use bool like a bool!]
      [akpm@linux-foundation.org: add printk facility levels]
      [cornelia.huck@de.ibm.com: Add the missing arch_trampoline_kprobe() for s390]
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bf8f6e5b
    • C
      kprobes: kretprobes simplifications · 4c4308cb
      Christoph Hellwig 提交于
       - consolidate duplicate code in all arch_prepare_kretprobe instances
         into common code
       - replace various odd helpers that use hlist_for_each_entry to get
         the first elemenet of a list with either a hlist_for_each_entry_save
         or an opencoded access to the first element in the caller
       - inline add_rp_inst into it's only remaining caller
       - use kretprobe_inst_table_head instead of opencoding it
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4c4308cb
    • U
      utimensat implementation · 1c710c89
      Ulrich Drepper 提交于
      Implement utimensat(2) which is an extension to futimesat(2) in that it
      
      a) supports nano-second resolution for the timestamps
      b) allows to selectively ignore the atime/mtime value
      c) allows to selectively use the current time for either atime or mtime
      d) supports changing the atime/mtime of a symlink itself along the lines
         of the BSD lutimes(3) functions
      
      For this change the internally used do_utimes() functions was changed to
      accept a timespec time value and an additional flags parameter.
      
      Additionally the sys_utime function was changed to match compat_sys_utime
      which already use do_utimes instead of duplicating the work.
      
      Also, the completely missing futimensat() functionality is added.  We have
      such a function in glibc but we have to resort to using /proc/self/fd/* which
      not everybody likes (chroot etc).
      
      Test application (the syscall number will need per-arch editing):
      
      #include <errno.h>
      #include <fcntl.h>
      #include <time.h>
      #include <sys/time.h>
      #include <stddef.h>
      #include <syscall.h>
      
      #define __NR_utimensat 280
      
      #define UTIME_NOW       ((1l << 30) - 1l)
      #define UTIME_OMIT      ((1l << 30) - 2l)
      
      int
      main(void)
      {
        int status = 0;
      
        int fd = open("ttt", O_RDWR|O_CREAT|O_EXCL, 0666);
        if (fd == -1)
          error (1, errno, "failed to create test file \"ttt\"");
      
        struct stat64 st1;
        if (fstat64 (fd, &st1) != 0)
          error (1, errno, "fstat failed");
      
        struct timespec t[2];
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        struct stat64 st2;
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0] = st1.st_atim;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_OMIT;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("atim not set");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim changed from zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_OMIT;
        t[1] = st1.st_mtim;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != st1.st_atim.tv_sec
            || st2.st_atim.tv_nsec != st1.st_atim.tv_nsec)
          {
            puts ("mtim changed from original time");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != st1.st_mtim.tv_sec
            || st2.st_mtim.tv_nsec != st1.st_mtim.tv_nsec)
          {
            puts ("mtim not set");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        sleep (2);
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = UTIME_NOW;
        t[1].tv_sec = 0;
        t[1].tv_nsec = UTIME_NOW;
        if (syscall(__NR_utimensat, AT_FDCWD, "ttt", t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        struct timeval tv;
        gettimeofday(&tv,NULL);
      
        if (st2.st_atim.tv_sec <= st1.st_atim.tv_sec
            || st2.st_atim.tv_sec > tv.tv_sec)
          {
            puts ("atim not set to NOW");
            status = 1;
          }
        if (st2.st_mtim.tv_sec <= st1.st_mtim.tv_sec
            || st2.st_mtim.tv_sec > tv.tv_sec)
          {
            puts ("mtim not set to NOW");
            status = 1;
          }
      
        if (symlink ("ttt", "tttsym") != 0)
          error (1, errno, "cannot create symlink");
      
        t[0].tv_sec = 0;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 0;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, AT_FDCWD, "tttsym", t, AT_SYMLINK_NOFOLLOW) != 0)
          error (1, errno, "utimensat failed");
      
        if (lstat64 ("tttsym", &st2) != 0)
          error (1, errno, "lstat failed");
      
        if (st2.st_atim.tv_sec != 0 || st2.st_atim.tv_nsec != 0)
          {
            puts ("symlink atim not reset to zero");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 0 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("symlink mtim not reset to zero");
            status = 1;
          }
        if (status != 0)
          goto out;
      
        t[0].tv_sec = 1;
        t[0].tv_nsec = 0;
        t[1].tv_sec = 1;
        t[1].tv_nsec = 0;
        if (syscall(__NR_utimensat, fd, NULL, t, 0) != 0)
          error (1, errno, "utimensat failed");
      
        if (fstat64 (fd, &st2) != 0)
          error (1, errno, "fstat failed");
      
        if (st2.st_atim.tv_sec != 1 || st2.st_atim.tv_nsec != 0)
          {
            puts ("atim not reset to one");
            status = 1;
          }
        if (st2.st_mtim.tv_sec != 1 || st2.st_mtim.tv_nsec != 0)
          {
            puts ("mtim not reset to one");
            status = 1;
          }
      
        if (status == 0)
           puts ("all OK");
      
       out:
        close (fd);
        unlink ("ttt");
        unlink ("tttsym");
      
        return status;
      }
      
      [akpm@linux-foundation.org: add missing i386 syscall table entry]
      Signed-off-by: NUlrich Drepper <drepper@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Michael Kerrisk <mtk-manpages@gmx.net>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c710c89
    • G
      dma_declare_coherent_memory wrong allocation · b247e8aa
      Guennadi Liakhovetski 提交于
      dma_declare_coherent_memory() allocates a bitmap 1 bit per page, it
      calculates the bitmap size based on size of long, but allocates bytes...
      Thanks to James Bottomley for clarifications and corrections.
      Signed-off-by: NG. Liakhovetski <g.liakhovetski@gmx.de>
      Acked-by: NJames Bottomley <James.Bottomley@SteelEye.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b247e8aa
    • A
      apm: fix incorrect comment · c9919380
      Alan Cox 提交于
      HZ has not always been 100Hz for some time.
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c9919380
    • B
      EFI: warn only for pre-1.00 system tables · 873ec746
      Bjorn Helgaas 提交于
      We used to warn unless the EFI system table major revision was exactly 1.
      But EFI 2.00 firmware is starting to appear, and the 2.00 changes don't
      affect anything in Linux.
      Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      873ec746
    • A
      Kprobes: print details of kretprobe on assertion failure · 0f95b7fc
      Ananth N Mavinakayanahalli 提交于
      In certain cases like when the real return address can't be found or when
      the number of tracked calls to a kretprobed function is less than the
      number of returns, we may not be able to find the correct return address
      after processing a kretprobe.  Currently we just do a BUG_ON, but no
      information is provided about the actual failing kretprobe.
      
      Print out details of the kretprobe before calling BUG().
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
      Cc: Jim Keniston <jkenisto@us.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Maneesh Soni <maneesh@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f95b7fc
    • R
      header cleaning: don't include smp_lock.h when not used · e63340ae
      Randy Dunlap 提交于
      Remove includes of <linux/smp_lock.h> where it is not used/needed.
      Suggested by Al Viro.
      
      Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
      sparc64, and arm (all 59 defconfigs).
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e63340ae
    • C
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig 提交于
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NBryan Wu <bryan.wu@analog.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
    • C
      ipmi: add new IPMI nmi watchdog handling · f64da958
      Corey Minyard 提交于
      Convert over to the new NMI handling for getting IPMI watchdog timeouts via an
      NMI.  This add config options to know if there is the ability to receive NMIs
      and if it has an NMI post processing call.  Then it modifies the IPMI watchdog
      to take advantage of this so that it can know if an NMI comes in.
      
      It also adds testing that the IPMI NMI watchdog works.
      Signed-off-by: NCorey Minyard <minyard@acm.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f64da958
    • A
      use SLAB_PANIC flag cleanup · 0e6b9c98
      Akinobu Mita 提交于
      Use SLAB_PANIC and delete duplicated panic().
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e6b9c98
  7. 08 5月, 2007 4 次提交
    • N
      i386: Use functions from library in msr driver · 78a62d2c
      Nicolas Boichat 提交于
      Use safe MSR functions provided by arch/*/lib/msr-on-cpu.c in
      arch/i386/kernel/msr.c.
      Signed-off-by: NNicolas Boichat <nicolas@boichat.ch>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      78a62d2c
    • R
      i386: Add safe variants of rdmsr_on_cpu and wrmsr_on_cpu · 4e9baad8
      Rudolf Marek 提交于
      Add safe (exception handled) variants of rdmsr_on_cpu and wrmsr_on_cpu.
      You should use these when the target MSR may not actually exist, as
      doing so could trigger an exception which the regular functions do not
      handle. The safe variants are slower, though.
      
      The upcoming coretemp hardware monitoring driver will need this.
      Signed-off-by: NRudolf Marek <r.marek@assembler.cz>
      Cc: Alexey Dobriyan <adobriyan@openvz.org>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NJean Delvare <khali@linux-fr.org>
      4e9baad8
    • B
      get_unmapped_area handles MAP_FIXED on i386 · 5a8130f2
      Benjamin Herrenschmidt 提交于
      Handle MAP_FIXED in i386 hugetlb_get_unmapped_area(), just call
      prepare_hugepage_range.
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NWilliam Irwin <bill.irwin@oracle.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a8130f2
    • C
      SLUB core · 81819f0f
      Christoph Lameter 提交于
      This is a new slab allocator which was motivated by the complexity of the
      existing code in mm/slab.c. It attempts to address a variety of concerns
      with the existing implementation.
      
      A. Management of object queues
      
         A particular concern was the complex management of the numerous object
         queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
         each allocating CPU and use objects from a slab directly instead of
         queueing them up.
      
      B. Storage overhead of object queues
      
         SLAB Object queues exist per node, per CPU. The alien cache queue even
         has a queue array that contain a queue for each processor on each
         node. For very large systems the number of queues and the number of
         objects that may be caught in those queues grows exponentially. On our
         systems with 1k nodes / processors we have several gigabytes just tied up
         for storing references to objects for those queues  This does not include
         the objects that could be on those queues. One fears that the whole
         memory of the machine could one day be consumed by those queues.
      
      C. SLAB meta data overhead
      
         SLAB has overhead at the beginning of each slab. This means that data
         cannot be naturally aligned at the beginning of a slab block. SLUB keeps
         all meta data in the corresponding page_struct. Objects can be naturally
         aligned in the slab. F.e. a 128 byte object will be aligned at 128 byte
         boundaries and can fit tightly into a 4k page with no bytes left over.
         SLAB cannot do this.
      
      D. SLAB has a complex cache reaper
      
         SLUB does not need a cache reaper for UP systems. On SMP systems
         the per CPU slab may be pushed back into partial list but that
         operation is simple and does not require an iteration over a list
         of objects. SLAB expires per CPU, shared and alien object queues
         during cache reaping which may cause strange hold offs.
      
      E. SLAB has complex NUMA policy layer support
      
         SLUB pushes NUMA policy handling into the page allocator. This means that
         allocation is coarser (SLUB does interleave on a page level) but that
         situation was also present before 2.6.13. SLABs application of
         policies to individual slab objects allocated in SLAB is
         certainly a performance concern due to the frequent references to
         memory policies which may lead a sequence of objects to come from
         one node after another. SLUB will get a slab full of objects
         from one node and then will switch to the next.
      
      F. Reduction of the size of partial slab lists
      
         SLAB has per node partial lists. This means that over time a large
         number of partial slabs may accumulate on those lists. These can
         only be reused if allocator occur on specific nodes. SLUB has a global
         pool of partial slabs and will consume slabs from that pool to
         decrease fragmentation.
      
      G. Tunables
      
         SLAB has sophisticated tuning abilities for each slab cache. One can
         manipulate the queue sizes in detail. However, filling the queues still
         requires the uses of the spin lock to check out slabs. SLUB has a global
         parameter (min_slab_order) for tuning. Increasing the minimum slab
         order can decrease the locking overhead. The bigger the slab order the
         less motions of pages between per CPU and partial lists occur and the
         better SLUB will be scaling.
      
      G. Slab merging
      
         We often have slab caches with similar parameters. SLUB detects those
         on boot up and merges them into the corresponding general caches. This
         leads to more effective memory use. About 50% of all caches can
         be eliminated through slab merging. This will also decrease
         slab fragmentation because partial allocated slabs can be filled
         up again. Slab merging can be switched off by specifying
         slub_nomerge on boot up.
      
         Note that merging can expose heretofore unknown bugs in the kernel
         because corrupted objects may now be placed differently and corrupt
         differing neighboring objects. Enable sanity checks to find those.
      
      H. Diagnostics
      
         The current slab diagnostics are difficult to use and require a
         recompilation of the kernel. SLUB contains debugging code that
         is always available (but is kept out of the hot code paths).
         SLUB diagnostics can be enabled via the "slab_debug" option.
         Parameters can be specified to select a single or a group of
         slab caches for diagnostics. This means that the system is running
         with the usual performance and it is much more likely that
         race conditions can be reproduced.
      
      I. Resiliency
      
         If basic sanity checks are on then SLUB is capable of detecting
         common error conditions and recover as best as possible to allow the
         system to continue.
      
      J. Tracing
      
         Tracing can be enabled via the slab_debug=T,<slabcache> option
         during boot. SLUB will then protocol all actions on that slabcache
         and dump the object contents on free.
      
      K. On demand DMA cache creation.
      
         Generally DMA caches are not needed. If a kmalloc is used with
         __GFP_DMA then just create this single slabcache that is needed.
         For systems that have no ZONE_DMA requirement the support is
         completely eliminated.
      
      L. Performance increase
      
         Some benchmarks have shown speed improvements on kernbench in the
         range of 5-10%. The locking overhead of slub is based on the
         underlying base allocation size. If we can reliably allocate
         larger order pages then it is possible to increase slub
         performance much further. The anti-fragmentation patches may
         enable further performance increases.
      
      Tested on:
      i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator
      
      SLUB Boot options
      
      slub_nomerge		Disable merging of slabs
      slub_min_order=x	Require a minimum order for slab caches. This
      			increases the managed chunk size and therefore
      			reduces meta data and locking overhead.
      slub_min_objects=x	Mininum objects per slab. Default is 8.
      slub_max_order=x	Avoid generating slabs larger than order specified.
      slub_debug		Enable all diagnostics for all caches
      slub_debug=<options>	Enable selective options for all caches
      slub_debug=<o>,<cache>	Enable selective options for a certain set of
      			caches
      
      Available Debug options
      F		Double Free checking, sanity and resiliency
      R		Red zoning
      P		Object / padding poisoning
      U		Track last free / alloc
      T		Trace all allocs / frees (only use for individual slabs).
      
      To use SLUB: Apply this patch and then select SLUB as the default slab
      allocator.
      
      [hugh@veritas.com: fix an oops-causing locking error]
      [akpm@linux-foundation.org: various stupid cleanups and small fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81819f0f
  8. 07 5月, 2007 1 次提交
    • L
      Revert "[PATCH] x86: __pa and __pa_symbol address space separation" · e3ebadd9
      Linus Torvalds 提交于
      This was broken.  It adds complexity, for no good reason.  Rather than
      separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
      and preferably __pa() too - and just use "virt_to_phys()" instead, which
      is more readable and has nicer semantics.
      
      However, right now, just undo the separation, and make __pa_symbol() be
      the exact same as __pa().  That fixes the bugs this patch introduced,
      and we can do the fairly obvious cleanups later.
      
      Do the new __phys_addr() function (which is now the actual workhorse for
      the unified __pa()/__pa_symbol()) as a real external function, that way
      all the potential issues with compile/link-time optimizations of
      constant symbol addresses go away, and we can also, if we choose to, add
      more sanity-checking of the argument.
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e3ebadd9
  9. 03 5月, 2007 3 次提交