1. 31 10月, 2008 1 次提交
  2. 22 10月, 2008 1 次提交
  3. 21 10月, 2008 1 次提交
  4. 20 10月, 2008 1 次提交
    • M
      container freezer: implement freezer cgroup subsystem · dc52ddc0
      Matt Helsley 提交于
      This patch implements a new freezer subsystem in the control groups
      framework.  It provides a way to stop and resume execution of all tasks in
      a cgroup by writing in the cgroup filesystem.
      
      The freezer subsystem in the container filesystem defines a file named
      freezer.state.  Writing "FROZEN" to the state file will freeze all tasks
      in the cgroup.  Subsequently writing "RUNNING" will unfreeze the tasks in
      the cgroup.  Reading will return the current state.
      
      * Examples of usage :
      
         # mkdir /containers/freezer
         # mount -t cgroup -ofreezer freezer  /containers
         # mkdir /containers/0
         # echo $some_pid > /containers/0/tasks
      
      to get status of the freezer subsystem :
      
         # cat /containers/0/freezer.state
         RUNNING
      
      to freeze all tasks in the container :
      
         # echo FROZEN > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         FREEZING
         # cat /containers/0/freezer.state
         FROZEN
      
      to unfreeze all tasks in the container :
      
         # echo RUNNING > /containers/0/freezer.state
         # cat /containers/0/freezer.state
         RUNNING
      
      This is the basic mechanism which should do the right thing for user space
      task in a simple scenario.
      
      It's important to note that freezing can be incomplete.  In that case we
      return EBUSY.  This means that some tasks in the cgroup are busy doing
      something that prevents us from completely freezing the cgroup at this
      time.  After EBUSY, the cgroup will remain partially frozen -- reflected
      by freezer.state reporting "FREEZING" when read.  The state will remain
      "FREEZING" until one of these things happens:
      
      	1) Userspace cancels the freezing operation by writing "RUNNING" to
      		the freezer.state file
      	2) Userspace retries the freezing operation by writing "FROZEN" to
      		the freezer.state file (writing "FREEZING" is not legal
      		and returns EIO)
      	3) The tasks that blocked the cgroup from entering the "FROZEN"
      		state disappear from the cgroup's set of tasks.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: export thaw_process]
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
      Tested-by: NMatt Helsley <matthltc@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dc52ddc0
  5. 17 10月, 2008 2 次提交
  6. 16 10月, 2008 7 次提交
  7. 14 10月, 2008 1 次提交
  8. 13 10月, 2008 2 次提交
    • I
      x86: remove additional_cpus configurability · b8073050
      Ingo Molnar 提交于
      additional_cpus=<x> parameter is dangerous and broken: for example
      if we boot additional_cpus=-2 on a stock dual-core system it will
      crash the box on bootup.
      
      So reduce the maze of code a bit by removingthe user-configurability
      angle.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b8073050
    • C
      x86: allow number of additional hotplug CPUs to be set at compile time, V2 · 7f2f49a5
      Chuck Ebbert 提交于
      x86: allow number of additional hotplug CPUs to be set at compile time, V2
      
      The default number of additional CPU IDs for hotplugging is determined
      by asking ACPI or mptables how many "disabled" CPUs there are in the
      system, but many systems get this wrong so that e.g. a uniprocessor
      machine gets an extra CPU allocated and never switches to single CPU
      mode.
      
      And sometimes CPU hotplugging is enabled only for suspend/hibernate
      anyway, so the additional CPU IDs are not wanted. Allow the number
      to be set to zero at compile time.
      
      Also, force the number of extra CPUs to zero if hotplugging is disabled
      which allows removing some conditional code.
      
      Tested on uniprocessor x86_64 that ACPI claims has a disabled processor,
      with CPU hotplugging configured.
      
      ("After" has the number of additional CPUs set to 0)
      Before: NR_CPUS: 512, nr_cpu_ids: 2, nr_node_ids 1
      After: NR_CPUS: 512, nr_cpu_ids: 1, nr_node_ids 1
      
      [Changed the name of the option and the prompt according to Ingo's
       suggestion.]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7f2f49a5
  9. 01 10月, 2008 1 次提交
    • Y
      x86: change MTRR_SANITIZER to def_bool y · 2ffb3501
      Yinghai Lu 提交于
      This option has been added in v2.6.26 as a default-disabled
      feature and went through several revisions since then.
      
      The feature fixes a wide range of MTRR setup problems that BIOSes
      leave us with: slow system, slow Xorg, slow system when adding lots
      of RAM, etc., so we want to enable it by default for v2.6.28.
      
      See:
      
        [Bug 10508] Upgrade to 4GB of RAM messes up MTRRs
        http://bugzilla.kernel.org/show_bug.cgi?id=10508
      
      and the test results in:
      
         http://lkml.org/lkml/2008/9/29/273
      
      1. hpa
      reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
      reg01: base=0x13c000000 (5056MB), size=  64MB: uncachable, count=1
      reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
      reg03: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
      reg04: base=0xbf700000 (3063MB), size=   1MB: uncachable, count=1
      reg05: base=0xbf800000 (3064MB), size=   8MB: uncachable, count=1
      
      will get
      Found optimal setting for mtrr clean up
      gran_size: 1M   chunk_size: 128M        num_reg: 6      lose RAM: 0M
      range0: 0000000000000000 - 00000000c0000000
      Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
      Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
      hole: 00000000bf700000 - 00000000c0000000
      Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
      Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
      range0: 0000000100000000 - 0000000140000000
      Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB
      hole: 000000013c000000 - 0000000140000000
      Setting variable MTRR 5, base: 5056MB, range: 64MB, type UC
      
      2. Dylan Taft
      reg00: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
      reg01: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
      reg02: base=0x120000000 (4608MB), size= 256MB: write-back, count=1
      reg03: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
      reg04: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
      reg05: base=0xc7e00000 (3198MB), size=   2MB: uncachable, count=1
      reg06: base=0xc8000000 (3200MB), size= 128MB: uncachable, count=1
      
      will get
      Found optimal setting for mtrr clean up
      gran_size: 1M   chunk_size: 4M  num_reg: 6      lose RAM: 0M
      range0: 0000000000000000 - 00000000c8000000
      Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
      Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
      Setting variable MTRR 2, base: 3072MB, range: 128MB, type WB
      hole: 00000000c7e00000 - 00000000c8000000
      Setting variable MTRR 3, base: 3198MB, range: 2MB, type UC
      rangeX: 0000000100000000 - 0000000130000000
      Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
      Setting variable MTRR 5, base: 4608MB, range: 256MB, type WB
      
      3. Gabriel
      reg00: base=0xd0000000 (3328MB), size= 256MB: uncachable, count=1
      reg01: base=0xe0000000 (3584MB), size= 512MB: uncachable, count=1
      reg02: base=0x00000000 (   0MB), size=4096MB: write-back, count=1
      reg03: base=0x100000000 (4096MB), size= 512MB: write-back, count=1
      reg04: base=0x120000000 (4608MB), size= 128MB: write-back, count=1
      reg05: base=0x128000000 (4736MB), size=  64MB: write-back, count=1
      reg06: base=0xcf600000 (3318MB), size=   2MB: uncachable, count=1
      
      will get
      Found optimal setting for mtrr clean up
      gran_size: 1M   chunk_size: 16M         num_reg: 7      lose RAM: 0M
      range0: 0000000000000000 - 00000000d0000000
      Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
      Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
      Setting variable MTRR 2, base: 3072MB, range: 256MB, type WB
      hole: 00000000cf600000 - 00000000cf800000
      Setting variable MTRR 3, base: 3318MB, range: 2MB, type UC
      rangeX: 0000000100000000 - 000000012c000000
      Setting variable MTRR 4, base: 4096MB, range: 512MB, type WB
      Setting variable MTRR 5, base: 4608MB, range: 128MB, type WB
      Setting variable MTRR 6, base: 4736MB, range: 64MB, type WB
      
      4. Mika Fischer
      reg00: base=0xc0000000 (3072MB), size=1024MB: uncachable, count=1
      reg01: base=0x00000000 ( 0MB), size=4096MB: write-back, count=1
      reg02: base=0x100000000 (4096MB), size=1024MB: write-back, count=1
      reg03: base=0xbf700000 (3063MB), size= 1MB: uncachable, count=1
      reg04: base=0xbf800000 (3064MB), size= 8MB: uncachable, count=1
      
      will get
      Found optimal setting for mtrr clean up
      gran_size: 1M   chunk_size: 16M         num_reg: 5      lose RAM: 0M
      range0: 0000000000000000 - 00000000c0000000
      Setting variable MTRR 0, base: 0MB, range: 2048MB, type WB
      Setting variable MTRR 1, base: 2048MB, range: 1024MB, type WB
      hole: 00000000bf700000 - 00000000c0000000
      Setting variable MTRR 2, base: 3063MB, range: 1MB, type UC
      Setting variable MTRR 3, base: 3064MB, range: 8MB, type UC
      rangeX: 0000000100000000 - 0000000140000000
      Setting variable MTRR 4, base: 4096MB, range: 1024MB, type WB
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ffb3501
  10. 23 9月, 2008 1 次提交
  11. 19 9月, 2008 1 次提交
  12. 16 9月, 2008 1 次提交
    • I
      x86: add X86_RESERVE_LOW_64K · fc381519
      Ingo Molnar 提交于
      This bugzilla:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=11237
      
      Documents a wide range of systems where the BIOS utilizes the first
      64K of physical memory during suspend/resume and other hardware events.
      
      Currently we reserve this memory on all AMI and Phoenix BIOS systems.
      Life is too short to hunt subtle memory corruption problems like this,
      so we try to be robust by default.
      
      Still, allow this to be overriden: allow users who want that first 64K
      of memory to be available to the kernel disable the quirk, via
      CONFIG_X86_RESERVE_LOW_64K=n.
      
      Also, allow the early reservation to overlap with other
      early reservations.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fc381519
  13. 14 9月, 2008 3 次提交
  14. 09 9月, 2008 1 次提交
  15. 07 9月, 2008 5 次提交
  16. 26 8月, 2008 1 次提交
    • L
      [x86] Clean up MAXSMP Kconfig, and limit NR_CPUS to 512 · d25e26b6
      Linus Torvalds 提交于
      This fixes a regression that was indirectly caused by commit
      1184dc2f ("x86: modify Kconfig to allow
      up to 4096 cpus").
      
      Allowing 4k CPU's is not practical at this time, because we still have a
      number of places that have several 'cpumask_t's on the stack, and a
      4k-bit cpumask is 512 bytes of stack-space for each such variable.  This
      literally caused functions like 'smp_call_function_mask' to have a 2.5kB
      stack frame, and several functions to have 2kB stackframes.
      
      With an 8kB stack total, smashing the stack was simply much too likely.
      At least bugzilla entry
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=11342
      
      was due to this.
      
      The earlier commit to not inline load_module() into sys_init_module()
      fixed the particular symptoms of this that Alan Brunelle saw in that
      bugzilla entry, but the huge stack waste by cpumask_t's was the more
      direct cause.
      
      Some day we'll have allocation helpers that allocate large CPU masks
      dynamically, but in the meantime we simply cannot allow cpumasks this
      large.
      
      Cc: Alan D. Brunelle <Alan.Brunelle@hp.com>
      Cc: Mike Travis <travis@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d25e26b6
  17. 15 8月, 2008 2 次提交
    • T
      x86, bootup: add built-in kernel command line for x86 (v2) · 516cbf37
      Tim Bird 提交于
      Allow x86 to support a built-in kernel command line.  The built-in
      command line can override the one provided by the boot loader, for
      those cases where the boot loader is broken or it is difficult
      to change the command line in the the boot loader.
      
      H. Peter Anvin wrote:
      > Ingo Molnar wrote:
      >> Best would be to make it really apparent in the code that nothing
      >> changes if this config option is not set. Preferably there should be
      >> no extra code at all in that case.
      >>
      >
      > I would like to see this:
      [...Nested ifdefs...]
      
      OK. This version changes absolutely nothing if CONFIG_CMDLINE_BOOL is not
      set (the default).  Also, no space is appended even when CONFIG_CMDLINE_BOOL
      is set, but the builtin string is empty.  This is less sloppy all the way
      around, IMHO.
      
      Note that I use the same option names as on other arches for
      this feature.
      
      [ mingo@elte.hu: build fix ]
      Signed-off-by: NTim Bird <tim.bird@am.sony.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      516cbf37
    • P
      arch/x86/Kconfig: clean up, experimental adjustement · 04b69447
      Pavel Machek 提交于
      Adjust experimental tags in Kconfig, update config to notice that
      i386/x86_64 is now single architecture.
      Signed-off-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      04b69447
  18. 12 8月, 2008 3 次提交
  19. 29 7月, 2008 2 次提交
    • P
      x86: AMD microcode patch loading support · 80cc9f10
      Peter Oruba 提交于
      This patch introduces microcode patch loading for AMD
      processors. It is based on previous corresponding work
      for Intel processors.
      
      It hooks into the general patch loading module. Main
      difference is that a container file format is used to hold
      all patch data for multiple processors as well as an
      equivalent CPU table, which comes seperately, as opposed
      to Intel's microcode patching solution.
      
      Kconfig and Makefile have been changed provice config
      and build option for new source file.
      Signed-off-by: NPeter Oruba <peter.oruba@amd.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      80cc9f10
    • P
      x86: major refactoring · 8d86f390
      Peter Oruba 提交于
      Refactored code by introducing a two-module solution.
      
      There is one general module in which vendor specific modules can hook into.
      However, that is exclusive, there is only one vendor specific module
      allowed at a time. A CPU vendor check makes sure only the correct
      module for the underlying system gets called.
      
      Functinally in terms of patch loading itself there are no changes. This
      refactoring provides a basis for future implementations of other vendors'
      patch loaders.
      Signed-off-by: NPeter Oruba <peter.oruba@amd.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8d86f390
  20. 28 7月, 2008 1 次提交
  21. 27 7月, 2008 2 次提交
    • R
      x86: tracehook: CONFIG_HAVE_ARCH_TRACEHOOK · 99bbc4b1
      Roland McGrath 提交于
      The x86 arch code has all the prerequisites, so set HAVE_ARCH_TRACEHOOK.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      99bbc4b1
    • N
      x86: lockless get_user_pages_fast() · 8174c430
      Nick Piggin 提交于
      Implement get_user_pages_fast without locking in the fastpath on x86.
      
      Do an optimistic lockless pagetable walk, without taking mmap_sem or any
      page table locks or even mmap_sem.  Page table existence is guaranteed by
      turning interrupts off (combined with the fact that we're always looking
      up the current mm, means we can do the lockless page table walk within the
      constraints of the TLB shootdown design).  Basically we can do this
      lockless pagetable walk in a similar manner to the way the CPU's pagetable
      walker does not have to take any locks to find present ptes.
      
      This patch (combined with the subsequent ones to convert direct IO to use
      it) was found to give about 10% performance improvement on a 2 socket 8
      core Intel Xeon system running an OLTP workload on DB2 v9.5
      
       "To test the effects of the patch, an OLTP workload was run on an IBM
        x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
        2.93 GHz) using IBM DB2 v9.5 running Linux 2.6.24rc7 kernel.  Comparing
        runs with and without the patch resulted in an overall performance
        benefit of ~9.8%.  Correspondingly, oprofiles showed that samples from
        __up_read and __down_read routines that is seen during thread contention
        for system resources was reduced from 2.8% down to .05%.  Monitoring the
        /proc/vmstat output from the patched run showed that the counter for
        fast_gup contained a very high number while the fast_gup_slow value was
        zero."
      
      (fast_gup is the old name for get_user_pages_fast, fast_gup_slow is a
      counter we had for the number of times the slowpath was invoked).
      
      The main reason for the improvement is that DB2 has multiple threads each
      issuing direct-IO.  Direct-IO uses get_user_pages, and thus the threads
      contend the mmap_sem cacheline, and can also contend on page table locks.
      
      I would anticipate larger performance gains on larger systems, however I
      think DB2 uses an adaptive mix of threads and processes, so it could be
      that thread contention remains pretty constant as machine size increases.
      In which case, we stuck with "only" a 10% gain.
      
      The downside of using get_user_pages_fast is that if there is not a pte
      with the correct permissions for the access, we end up falling back to
      get_user_pages and so the get_user_pages_fast is a bit of extra work.
      However this should not be the common case in most performance critical
      code.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: Kconfig fix]
      [akpm@linux-foundation.org: Makefile fix/cleanup]
      [akpm@linux-foundation.org: warning fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8174c430