1. 20 Oct, 2007 21 commits
  2. 19 Oct, 2007 19 commits
    • I
      x86: fix global_flush_tlb() bug · 9a24d04a
      Committed by Ingo Molnar
      While we were reviewing pageattr_32/64.c for unification,
      Thomas Gleixner noticed the following serious SMP bug in
      global_flush_tlb():
      
      	down_read(&init_mm.mmap_sem);
      	list_replace_init(&deferred_pages, &l);
      	up_read(&init_mm.mmap_sem);
      
      this is SMP-unsafe because list_replace_init() done on two CPUs in
      parallel can corrupt the list.
      
      This bug has been introduced about a year ago in the 64-bit tree:
      
             commit ea7322de
             Author: Andi Kleen <ak@suse.de>
             Date:   Thu Dec 7 02:14:05 2006 +0100
      
             [PATCH] x86-64: Speed and clean up cache flushing in change_page_attr
      
                      down_read(&init_mm.mmap_sem);
              -       dpage = xchg(&deferred_pages, NULL);
              +       list_replace_init(&deferred_pages, &l);
                      up_read(&init_mm.mmap_sem);
      
      the xchg() based version was SMP-safe, but list_replace_init() is not.
      So this "cleanup" introduced a nasty bug.
      
      why this bug never became prominent is a mystery - it can probably be
      explained with the (still) relative obscurity of the x86_64 architecture.
      
      the safe fix for now is to write-lock init_mm.mmap_sem.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9a24d04a
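The difference between the two variants in the commit above can be illustrated in user space. The following is a minimal sketch, assuming C11 atomics as a stand-in for the kernel's xchg(); `struct deferred_page`, `defer_page()`, `take_deferred_pages()` and `count_pages()` are hypothetical names, not kernel code. Detaching the whole list with a single atomic exchange means each caller gets either the complete list or NULL, whereas a multi-step list splice needs exclusive locking.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a deferred page; not the kernel's struct page. */
struct deferred_page {
    int id;
    struct deferred_page *next;
};

/* Head of a singly linked list of deferred pages. */
static _Atomic(struct deferred_page *) deferred_pages = NULL;

/* Push with a compare-and-swap loop so concurrent pushers do not lose nodes. */
static void defer_page(struct deferred_page *p)
{
    struct deferred_page *old = atomic_load(&deferred_pages);

    do {
        p->next = old;
    } while (!atomic_compare_exchange_weak(&deferred_pages, &old, p));
}

/*
 * Detach the whole list in one atomic step. Two CPUs racing here cannot
 * corrupt anything: one gets the list, the other gets NULL.
 */
static struct deferred_page *take_deferred_pages(void)
{
    return atomic_exchange(&deferred_pages, NULL);
}

/* Count the nodes of a detached list. */
static int count_pages(const struct deferred_page *p)
{
    int n = 0;

    for (; p != NULL; p = p->next)
        n++;
    return n;
}
```

A list splice such as list_replace_init() touches several pointers, so it only stays consistent when a single CPU performs it at a time, which is what taking the semaphore for writing guarantees.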
    • R
      Replace __attribute_pure__ with __pure · e8c44319
      Committed by Ralf Baechle
      To be consistent with the use of attributes in the rest of the kernel
      replace all use of __attribute_pure__ with __pure and delete the definition
      of __attribute_pure__.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8c44319
    • S
      sparse pointer use of zero as null · c80544dc
      Committed by Stephen Hemminger
      Get rid of sparse related warnings from places that use integer as NULL
      pointer.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c80544dc
    • S
      x86 msr driver: Misc cpuinit annotations · 38048983
      Committed by Satyam Sharma
      msr_class_cpu_callback() can be marked __cpuinit, being the notifier callback
      for a __cpuinitdata notifier_block.  msr_device_create() can be marked
      __cpuinit too, since it is called only from the newly-__cpuinit
      msr_class_cpu_callback() or from the __init-marked msr_init().
      Signed-off-by: Satyam Sharma <satyam@infradead.org>
      Cc: Andi Kleen <ak@suse.de>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      38048983
    • M
      powerpc: add scaled time accounting · 4603ac18
      Committed by Michael Neuling
      This adds POWERPC specific hooks for scaled time accounting.
      
      POWER6 includes a SPURR register.  The SPURR is based off the PURR register
      but is scaled based on CPU frequency and issue rates.  This gives a more
      accurate account of the instructions used per task.  The PURR and timebase
      will be constant relative to the wall clock, irrespective of the CPU
      frequency.
      
      This implementation reads the SPURR register in account_system_vtime, which
      is only called on context switch and on hard and soft irq entry and exit.
      The percentage of user and system time is then estimated using the ratio of
      these accounted by the PURR.  If the SPURR is not present, the PURR is read.
      
      An earlier implementation of this patch read the SPURR whenever the PURR
      was read, which included the system call entry and exit path.
      Unfortunately this showed a performance regression on lmbench runs, so was
      re-implemented.
      
      I've included the lmbench results here when run bare metal on POWER6.  1st
      column is the unpatched results.  2nd column is the results using the below
      patch and the 3rd is the % diff of these results from the base.  4th and
      5th columns are the results and % difference from the base using the older
      patch (SPURR read in syscall entry/exit path).
      
                                    Base        Scaled-Acct     SPURR-in-syscall
                                   Result      Result  % diff    Result % diff
      Simple syscall:              0.3086      0.3086  0.0000    0.3452 11.8600
      Simple read:                 0.4591      0.4671  1.7425    0.5044 9.86713
      Simple write:                0.4364      0.4366  0.0458    0.4731 8.40971
      Simple stat:                 2.0055      2.0295  1.1967    2.0669 3.06158
      Simple fstat:                0.5962      0.5876  -1.442    0.6368 6.80979
      Simple open/close:           3.1283      3.1009  -0.875    3.2088 2.57328
      Select on 10 fd's:           0.8554      0.8457  -1.133    0.8667 1.32101
      Select on 100 fd's:          3.5292      3.6329  2.9383    3.6664 3.88756
      Select on 250 fd's:          7.9097      8.1881  3.5197    8.2242 3.97613
      Select on 500 fd's:          15.2659     15.836  3.7357    15.873 3.97814
      Select on 10 tcp fd's:       0.9576      0.9416  -1.670    0.9752 1.83792
      Select on 100 tcp fd's:      7.248       7.2254  -0.311    7.2685 0.28283
      Select on 250 tcp fd's:      17.7742     17.707  -0.375    17.749 -0.1406
      Select on 500 tcp fd's:      35.4258     35.25   -0.496    35.286 -0.3929
      Signal handler installation: 0.6131      0.6075  -0.913    0.647  5.52927
      Signal handler overhead:     2.0919      2.1078  0.7600    2.1831 4.35967
      Protection fault:            0.7345      0.7478  1.8107    0.8031 9.33968
      Pipe latency:                33.006      16.398  -50.31    33.475 1.42368
      AF_UNIX sock stream latency: 14.5093     30.910  113.03    30.715 111.692
      Process fork+exit:           219.8       222.8   1.3648    229.37 4.35623
      Process fork+execve:         876.14      873.28  -0.32     868.66 -0.8533
      Process fork+/bin/sh -c:     2830        2876.5  1.6431    2958   4.52296
      File /var/tmp/XXX write bw:  1193497     1195536 0.1708    118657 -0.5799
      Pagefaults on /var/tmp/XXX:  3.1272      3.2117  2.7020    3.2521 3.99398
      
      Also, kernel compile times show no difference with this patch applied.
      
      [pbadari@us.ibm.com: Avoid unnecessary PURR reading]
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4603ac18
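The estimation step described in the commit above (splitting the scaled time by the ratio of user and system time accounted by the PURR) might look roughly like the sketch below. This is only an illustration under stated assumptions: `struct scaled_split`, `split_scaled_time()` and its parameters are hypothetical, the real accounting code is not shown in this log, and overflow of the intermediate product is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: scaled time split into user and system shares. */
struct scaled_split {
    uint64_t user;
    uint64_t system;
};

/*
 * Split a SPURR delta in proportion to the PURR-accounted user and system
 * time for the same interval. All names here are illustrative only.
 */
static struct scaled_split split_scaled_time(uint64_t spurr_delta,
                                             uint64_t purr_user,
                                             uint64_t purr_system)
{
    struct scaled_split out = {0, 0};
    uint64_t purr_total = purr_user + purr_system;

    if (purr_total == 0)
        return out;     /* nothing was accounted in this interval */

    out.user = spurr_delta * purr_user / purr_total;
    out.system = spurr_delta - out.user;    /* remainder, so the parts sum exactly */
    return out;
}
```

For example, a SPURR delta of 1000 with PURR-accounted user and system times of 30 and 10 yields 750 scaled user units and 250 scaled system units.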
    • J
      Add missing newlines to some uses of dev_<level> messages · 898eb71c
      Committed by Joe Perches
      Found these while looking at printk uses.
      
      Add missing newlines to dev_<level> uses
      Add missing KERN_<level> prefixes to multiline dev_<level>s
      Fixed a wierd->weird spelling typo
      Added a newline to a printk
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Mark M. Hoffman <mhoffman@lightlink.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Cc: Tilman Schmidt <tilman@imap.cc>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: Stephen Hemminger <shemminger@linux-foundation.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: James Smart <James.Smart@Emulex.Com>
      Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      898eb71c
    • E
      sysctl: x86_64 remove unnecessary binary paths · 282a821f
      Committed by Eric W. Biederman
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      282a821f
    • A
      cpu hotplug: intel_cacheinfo: fix cpu hotplug error handling · ef1d7151
      Committed by Akinobu Mita
      - Fix resource leakage in error case within detect_cache_attributes()
      
      - Don't register hotcpu notifier when cache_add_dev() returns error
      
      - Introduce cache_dev_map cpumask to track whether cache interface for
        CPU is successfully added by cache_add_dev() or not.
      
        cache_add_dev() may fail with out of memory error. In order to
        avoid cache_remove_dev() with that uninitialized cache interface when
        CPU_DEAD event is delivered we need to have the cache_dev_map cpumask.
      
        (We cannot move cache_add_dev() from the CPU_ONLINE event handler
        to the CPU_UP_PREPARE event handler, because cache_add_dev() needs
        to run cpuid and store the results while its CPU is online.)
      
      [nix.or.die@googlemail.com: fix a section mismatch warning]
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Gabriel Craciunescu <nix.or.die@googlemail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ef1d7151
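The cache_dev_map idea from the commit above can be sketched in user space as a bitmask that records which CPUs had their interface added successfully, so that removal becomes a no-op for the others. The function names come from the commit message, but the signatures, the `simulate_oom` knob, the `removals` counter and the 64-CPU limit are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_CPUS 64

static uint64_t cache_dev_map;  /* bit n set: interface for CPU n was added */
static int removals;            /* counts real teardowns, for illustration */

/* simulate_oom is a hypothetical knob standing in for an allocation failure. */
static int cache_add_dev(unsigned int cpu, bool simulate_oom)
{
    if (cpu >= MAX_CPUS)
        return -1;
    if (simulate_oom)
        return -1;      /* fail without marking the CPU in the map */
    cache_dev_map |= UINT64_C(1) << cpu;
    return 0;
}

/* Tear down only what was actually set up for this CPU. */
static void cache_remove_dev(unsigned int cpu)
{
    if (cpu >= MAX_CPUS || !(cache_dev_map & (UINT64_C(1) << cpu)))
        return;         /* never added: nothing to tear down */
    cache_dev_map &= ~(UINT64_C(1) << cpu);
    removals++;
}
```

With this guard, a CPU_DEAD-style removal for a CPU whose add step failed never touches an uninitialized interface.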
    • A
      cpu hotplug: mce: fix cpu hotplug error handling · d435d862
      Committed by Akinobu Mita
      - Clear kobject in percpu device_mce before calling sysdev_register() with
      
        mce_create_device() may fail and leave the kobject filled with junk,
        which becomes a problem the next time mce_create_device() is
        called.
      
      - Fix error handling in mce_create_device()
      
        Error handling should not do sysdev_remove_file() with not yet added
        attributes.
      
      - Don't register hotcpu notifier when mce_create_device() returns error
      
      - Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE
      
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d435d862
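The error-handling rule in the commit above (do not remove attributes that were never added) follows a common unwind pattern, sketched here in user space. `create_attr()`, `remove_attr()`, `create_device()` and the `fail_at` parameter are hypothetical; they only model the control flow, not the sysdev API.

```c
#include <assert.h>

#define NUM_ATTRS 4

static int attr_added[NUM_ATTRS];   /* 1 while attribute i is registered */

/* Hypothetical helper; fail_at selects which creation fails (-1: none). */
static int create_attr(int i, int fail_at)
{
    if (i == fail_at)
        return -1;
    attr_added[i] = 1;
    return 0;
}

static void remove_attr(int i)
{
    assert(attr_added[i]);  /* removing a never-added attribute is the bug */
    attr_added[i] = 0;
}

/* Add attributes in order; on failure, undo only the ones already added. */
static int create_device(int fail_at)
{
    int i, err = 0;

    for (i = 0; i < NUM_ATTRS; i++) {
        err = create_attr(i, fail_at);
        if (err)
            break;
    }
    if (err) {
        /* i is the index that failed; unwind i-1 down to 0. */
        while (--i >= 0)
            remove_attr(i);
    }
    return err;
}
```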
    • A
      cpu hotplug: msr: fix cpu hotplug error handling · 881a841f
      Committed by Akinobu Mita
      Do msr_device_create() in CPU_UP_PREPARE instead of CPU_ONLINE.
      
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      881a841f
    • A
      cpu hotplug: thermal_throttle: fix cpu hotplug error handling · c7e38a9c
      Committed by Akinobu Mita
      Do thermal_throttle_add_dev() in CPU_UP_PREPARE instead of CPU_ONLINE.
      
      Cc: Dmitriy Zavin <dmitriyz@google.com>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7e38a9c
    • T
      Fix discrepancy between VDSO based gettimeofday() and sys_gettimeofday(). · 2c622148
      Committed by Tony Breeds
      On platforms that copy sys_tz into the vdso (currently only x86_64, soon to
      include powerpc), it is possible for the vdso to get out of sync if a user
      calls (admittedly unusual) settimeofday(NULL, ptr).
      
      This patch adds a hook for architectures that set
      CONFIG_GENERIC_TIME_VSYSCALL to ensure when sys_tz is updated they can also
      update their copy in the vdso.
      Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: John Stultz <johnstul@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2c622148
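The hook described in the commit above keeps a second copy of the timezone in step with the authoritative one. A minimal user-space sketch of that pattern follows; `struct tz_info`, `kernel_tz`, `vdso_tz`, `update_vdso_tz()`, `set_timezone()` and `vdso_minuteswest()` are all hypothetical names chosen for illustration, not the kernel's.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's timezone record. */
struct tz_info {
    int minuteswest;
    int dsttime;
};

static struct tz_info kernel_tz;    /* authoritative copy (the role of sys_tz) */
static struct tz_info vdso_tz;      /* copy read by the fast user-space path */

/* Hook an architecture would provide to refresh its vdso copy. */
static void update_vdso_tz(void)
{
    vdso_tz = kernel_tz;
}

/* Every writer of the authoritative copy calls the hook afterwards. */
static void set_timezone(struct tz_info tz)
{
    kernel_tz = tz;
    update_vdso_tz();
}

/* The fast path only ever looks at the vdso copy. */
static int vdso_minuteswest(void)
{
    return vdso_tz.minuteswest;
}
```

Without the call to the hook in `set_timezone()`, the fast path would keep returning the stale value, which is the discrepancy the commit addresses.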
    • R
      Hibernation: Use temporary page tables for kernel text mapping on x86_64 · efa4d2fb
      Committed by Rafael J. Wysocki
      Use temporary page tables for the kernel text mapping during hibernation
      restore on x86_64.
      
      Without the patch, the original boot kernel's page tables that represent the
      kernel text mapping are used while the core of the image kernel is being
      restored.  However, in principle, if the boot kernel is not identical to the
      image kernel, the location of these page tables in the image kernel need not
      be the same, so we should create a safe copy of the kernel text mapping prior
      to restoring the core of the image kernel.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      efa4d2fb
    • R
      Hibernation: Pass CR3 in the image header on x86_64 · c30bb68c
      Committed by Rafael J. Wysocki
      Since we already pass the address of restore_registers() in the image header,
      we can also pass the value of the CR3 register from before the hibernation in
      the same way.  This will allow us to avoid using init_level4_pgt page tables
      during the restore.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c30bb68c
    • R
      Hibernation: Arbitrary boot kernel support on x86_64 · d158cbdf
      Committed by Rafael J. Wysocki
      Make it possible to restore a hibernation image on x86_64 with the help of a
      kernel different from the one in the image.
      
      The idea is to split the core restoration code into two separate parts and to
      place each of them in a different page.   The first part belongs to the boot
      kernel and is executed as the last step of the image kernel's memory
      restoration procedure.   Before being executed, it is relocated to a safe page
      that won't be overwritten while copying the image kernel pages.
      
      The final operation performed by it is a jump to the second part of the core
      restoration code that belongs to the image kernel and has just been restored.
      This code makes the CPU switch to the image kernel's page tables and restores
      the state of general purpose registers (including the stack pointer) from
      before the hibernation.
      
      The main issue with this idea is that in order to jump to the second part of
      the core restoration code the boot kernel needs to know its address.
      However, this address may be passed to it in the image header.   Namely, the
      part of the image header previously used for checking if the version of the
      image kernel is correct can be replaced with some architecture specific data
      that will allow the boot kernel to jump to the right address within the image
      kernel.   These data should also be used for checking if the image kernel is
      compatible with the boot kernel (as far as the memory restoration procedure
      is concerned).  It can be done, for example, with the help of a "magic" value
      that has to be equal in both kernels, so that they can be regarded as
      compatible.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d158cbdf
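The image-header scheme from the hibernation commits above (a jump address, the CR3 value and a "magic" compatibility check carried as architecture-specific data) can be sketched as follows. This is an assumption-laden illustration: `struct restore_data`, `RESTORE_MAGIC` and its value, `header_save()` and `header_restore()` are hypothetical names for this example, and the real layout is not given in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value; both kernels must agree on it. */
#define RESTORE_MAGIC UINT64_C(0x0123456789ABCDEF)

/* Hypothetical architecture-specific part of the image header. */
struct restore_data {
    uint64_t jump_address;  /* entry point of the image kernel's restore code */
    uint64_t cr3;           /* page-table root from before hibernation */
    uint64_t magic;         /* compatibility check */
};

/* Image kernel side: record where the boot kernel has to jump. */
static int header_save(void *buf, size_t max_size,
                       uint64_t jump_address, uint64_t cr3)
{
    struct restore_data rd = { jump_address, cr3, RESTORE_MAGIC };

    if (max_size < sizeof(rd))
        return -1;
    memcpy(buf, &rd, sizeof(rd));
    return 0;
}

/* Boot kernel side: accept the header only if the magic matches. */
static int header_restore(const void *buf, struct restore_data *out)
{
    struct restore_data rd;

    memcpy(&rd, buf, sizeof(rd));
    if (rd.magic != RESTORE_MAGIC)
        return -1;      /* kernels are not compatible */
    *out = rd;
    return 0;
}
```

The boot kernel never needs to know the image kernel's version, only that the magic matches and where to jump.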
    • P
      s2ram: kill old debugging junk · 50a1efe1
      Committed by Pavel Machek
      This removes old debugging stuff that should no longer be necessary.  It
      accessed VGA hardware (which may not be ready at this point), and used LEDs
      at port 80 for debugging.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      50a1efe1
    • A
      serial: turn serial console suspend into a boot rather than compile-time option · 8f4ce8c3
      Committed by Andres Salomon
      Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop
      the serial console from being suspended when the rest of the machine goes
      to sleep.  This is incredibly useful for debugging power management-related
      things; however, having it as a compile-time option has proved to be
      incredibly inconvenient for us (OLPC).  There are plenty of times that we
      want serial console to not suspend, but for the most part we'd like serial
      console to be suspended.
      
      This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel
      boot parameter (no_console_suspend).  By default, the serial console will
      be suspended along with the rest of the system; by passing
      'no_console_suspend' to the kernel during boot, serial console will remain
      alive during suspend.
      
      For now, this is pretty serial console specific; further fixes could be
      applied to make this work for things like netconsole.
      Signed-off-by: Andres Salomon <dilinger@debian.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f4ce8c3
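The behavioral change in the commit above (suspend the console by default, keep it alive when 'no_console_suspend' is passed at boot) can be modeled in user space by scanning a command-line string for the flag. This sketch does not show how the kernel registers boot parameters; `cmdline_has_flag()` and `console_suspend_enabled()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if flag appears in cmdline as a whole space-separated word. */
static bool cmdline_has_flag(const char *cmdline, const char *flag)
{
    size_t len = strlen(flag);
    const char *p = cmdline;

    while ((p = strstr(p, flag)) != NULL) {
        bool starts = (p == cmdline) || (p[-1] == ' ');
        bool ends = (p[len] == '\0') || (p[len] == ' ');

        if (starts && ends)
            return true;
        p += len;   /* skip this partial match and keep looking */
    }
    return false;
}

/* Default is to suspend the console; the boot flag turns that off. */
static bool console_suspend_enabled(const char *cmdline)
{
    return !cmdline_has_flag(cmdline, "no_console_suspend");
}
```

Making this a run-time decision means a single kernel build serves both the debugging case and the normal case.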
    • R
      PM: Rework struct platform_suspend_ops · e6c5eb95
      Committed by Rafael J. Wysocki
      There is no reason why the .prepare() and .finish() methods in 'struct
      platform_suspend_ops' should take any arguments, since architectures don't use
      these methods' argument in any practically meaningful way (i.e. either the
      target system sleep state is conveyed to the platform by .set_target(), or
      there is only one suspend state supported and it is indicated to the PM core
      by .valid(), or .prepare() and .finish() aren't defined at all).   There also
      is no reason why .finish() should return any result.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6c5eb95
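The reworked callback shape described in the commit above (argument-less .prepare() and .finish(), with .finish() returning nothing and the target state conveyed through .set_target()) can be sketched like this. The struct name, the `suspend_state_t` typedef, the `enter` member, the return type of `prepare` and the `run_suspend()` driver are assumptions for illustration; only the method names .valid(), .set_target(), .prepare() and .finish() come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

typedef int suspend_state_t;    /* hypothetical stand-in for the state type */

/* Sketch of the callback table; not the kernel's actual definition. */
struct platform_suspend_ops_sketch {
    int (*valid)(suspend_state_t state);
    int (*set_target)(suspend_state_t state);
    int (*prepare)(void);                   /* no argument any more */
    int (*enter)(suspend_state_t state);    /* assumed member */
    void (*finish)(void);                   /* no argument, no result */
};

static suspend_state_t target_state;
static int finished;

/* A demo platform that supports exactly one state, numbered 3. */
static int demo_valid(suspend_state_t s) { return s == 3; }
static int demo_set_target(suspend_state_t s) { target_state = s; return 0; }
static int demo_prepare(void) { return target_state == 3 ? 0 : -1; }
static int demo_enter(suspend_state_t s) { return s == target_state ? 0 : -1; }
static void demo_finish(void) { finished = 1; }

static const struct platform_suspend_ops_sketch demo_ops = {
    .valid = demo_valid,
    .set_target = demo_set_target,
    .prepare = demo_prepare,
    .enter = demo_enter,
    .finish = demo_finish,
};

/* Hypothetical core-side driver showing the order of the calls. */
static int run_suspend(const struct platform_suspend_ops_sketch *ops,
                       suspend_state_t state)
{
    int err;

    if (ops->valid == NULL || !ops->valid(state))
        return -1;
    if (ops->set_target != NULL) {
        err = ops->set_target(state);
        if (err)
            return err;
    }
    if (ops->prepare != NULL) {
        err = ops->prepare();
        if (err)
            return err;
    }
    err = ops->enter(state);
    if (ops->finish != NULL)
        ops->finish();
    return err;
}
```

Because the platform already learned the target state in `set_target`, `prepare` and `finish` have nothing left to be told.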
    • R
      PM: Rename struct pm_ops and related things · 26398a70
      Committed by Rafael J. Wysocki
      The name of 'struct pm_ops' suggests that it is related to the power
      management in general, but in fact it is only related to suspend.   Moreover,
      its name should indicate what this structure is used for, so it seems
      reasonable to change it to 'struct platform_suspend_ops'.   In that case, the
      name of the global variable of this type used by the PM core and the names of
      related functions should be changed accordingly.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      26398a70