1. 13 Mar, 2010 9 commits
    • cgroups: blkio subsystem as module · 67523c48
      Committed by Ben Blum
      Modify the Block I/O cgroup subsystem to be able to be built as a module.
      As the CFQ disk scheduler optionally depends on blk-cgroup, config options
      in block/Kconfig, block/Kconfig.iosched, and block/blk-cgroup.h are
      enhanced to support the new module dependency.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: subsystem module unloading · cf5d5941
      Committed by Ben Blum
      Provides support for unloading modular subsystems.
      
      This patch adds a new function cgroup_unload_subsys which is to be used
      for removing a loaded subsystem during module deletion.  Reference
      counting of the subsystems' modules is moved from once (at load time) to
      once per attached hierarchy (in parse_cgroupfs_options and
      rebind_subsystems) (i.e., 0 or 1).
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: subsystem module loading interface · e6a1105b
      Committed by Ben Blum
      Add interface between cgroups subsystem management and module loading
      
      This patch implements rudimentary module-loading support for cgroups -
      namely, a cgroup_load_subsys (similar to cgroup_init_subsys) for use as a
      module initcall, and a struct module pointer in struct cgroup_subsys.
      
      Several functions that might be wanted by modules have had EXPORT_SYMBOL
      added to them, but it's unclear exactly which functions want it and which
      won't.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: revamp subsys array · aae8aab4
      Committed by Ben Blum
      This patch series provides the ability for cgroup subsystems to be
      compiled as modules both within and outside the kernel tree.  This is
      mainly useful for classifiers and subsystems that hook into components
      that are already modules.  cls_cgroup and blkio-cgroup serve as the
      example use cases for this feature.
      
      It provides an interface cgroup_load_subsys() and cgroup_unload_subsys()
      which modular subsystems can use to register and depart during runtime.
      The net_cls classifier subsystem serves as the example for a subsystem
      which can be converted into a module using these changes.
      
      Patch #1 sets up the subsys[] array so its contents can be dynamic as
      modules appear and (eventually) disappear.  Iterations over the array are
      modified to handle when subsystems are absent, and the dynamic section of
      the array is protected by cgroup_mutex.
      
      Patch #2 implements an interface for modules to load subsystems, called
      cgroup_load_subsys, similar to cgroup_init_subsys, and adds a module
      pointer in struct cgroup_subsys.
      
      Patch #3 adds a mechanism for unloading modular subsystems, which includes
      a more advanced rework of the rudimentary reference counting introduced in
      patch 2.
      
      Patch #4 modifies the net_cls subsystem, which already had some module
      declarations, to be configurable as a module, which also serves as a
      simple proof-of-concept.
      
      Part of implementing patches 2 and 4 involved updating css pointers in
      each css_set when the module appears or leaves.  In doing this, it was
      discovered that css_sets always remain linked to the dummy cgroup,
      regardless of whether or not any subsystems are actually bound to it
      (i.e., not mounted on an actual hierarchy).  The subsystem loading and
      unloading code therefore should keep in mind the special cases where the
      added subsystem is the only one in the dummy cgroup (and therefore all
      css_sets need to be linked back into it) and where the removed subsys was
      the only one in the dummy cgroup (and therefore all css_sets should be
      unlinked from it) - however, as all css_sets always stay attached to the
      dummy cgroup anyway, these cases are ignored.  Any fix that addresses this
      issue should also make sure these cases are addressed in the subsystem
      loading and unloading code.
      
      This patch:
      
      Make subsys[] able to be dynamically populated to support modular
      subsystems
      
      This patch reworks the way the subsys[] array is used so that subsystems
      can register themselves after boot time, and enables the internals of
      cgroups to be able to handle when subsystems are not present or may
      appear/disappear.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
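The subsys[] rework described above can be pictured with a small userspace sketch. This is not the kernel code: the names subsys_table, load_subsys(), unload_subsys() and count_present(), and the slot counts, are hypothetical stand-ins. It only illustrates the idea of builtin slots followed by dynamic slots that may be empty, with iteration skipping absent entries. The commit protects the dynamic section with cgroup_mutex; the sketch is single-threaded and omits locking.

```c
#include <assert.h>
#include <stddef.h>

#define BUILTIN_COUNT 2   /* slots filled at build time (assumed value) */
#define SUBSYS_MAX    4   /* builtin plus modular slots (assumed value) */

struct subsys {
    const char *name;
    int id;               /* slot index once registered, -1 otherwise */
};

struct subsys cpu_ss = { "cpu", 0 };
struct subsys mem_ss = { "memory", 1 };

/* Builtin entries come first; the remaining slots start out NULL. */
struct subsys *subsys_table[SUBSYS_MAX] = { &cpu_ss, &mem_ss };

/* Claim the first free dynamic slot. Returns the slot index or -1. */
int load_subsys(struct subsys *ss)
{
    assert(ss != NULL && ss->name != NULL);
    for (int i = BUILTIN_COUNT; i < SUBSYS_MAX; i++) {
        if (subsys_table[i] == NULL) {
            subsys_table[i] = ss;
            ss->id = i;
            return i;
        }
    }
    return -1;
}

/* Release a dynamic slot; builtin slots are never released. */
void unload_subsys(struct subsys *ss)
{
    assert(ss->id >= BUILTIN_COUNT && subsys_table[ss->id] == ss);
    subsys_table[ss->id] = NULL;
    ss->id = -1;
}

/* Every walk over the table has to tolerate empty slots. */
int count_present(void)
{
    int n = 0;
    for (int i = 0; i < SUBSYS_MAX; i++) {
        if (subsys_table[i] == NULL)
            continue;
        n++;
    }
    return n;
}
```

A modular subsystem would call load_subsys() from its init path and unload_subsys() from its exit path; anything iterating the table in between sees either the entry or an empty slot.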
    • cgroup: introduce coalesce css_get() and css_put() · d7b9fff7
      Committed by Daisuke Nishimura
      Currently, css_get() and css_put() increment and decrement css->refcnt
      one at a time.
      
      This patch adds a new function, __css_get(), which takes "count" as an
      argument and increments css->refcnt by "count".  It also adds a new
      "count" argument to __css_put() and changes that function to decrement
      css->refcnt by "count".
      
      These coalesced versions of __css_get()/__css_put() will be used later to
      improve the performance of memcg's move-charge feature, which will call
      them once instead of calling css_get()/css_put() repeatedly.
      
      No change is needed for current users of css_get()/css_put().
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
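The coalescing idea can be sketched in userspace with C11 atomics. The struct and function names below (css_sketch, css_get_many, css_put_many) are hypothetical and do not reproduce the kernel's __css_get()/__css_put(); the point is only that taking or dropping "count" references becomes one atomic operation instead of "count" separate ones.

```c
#include <assert.h>
#include <stdatomic.h>

/* Stand-in for the object that carries the reference count. */
struct css_sketch {
    atomic_int refcnt;
};

/* Take "count" references with a single atomic add. */
void css_get_many(struct css_sketch *css, int count)
{
    assert(count > 0);
    atomic_fetch_add(&css->refcnt, count);
}

/* Drop "count" references at once; returns the remaining count. */
int css_put_many(struct css_sketch *css, int count)
{
    assert(count > 0);
    int prev = atomic_fetch_sub(&css->refcnt, count);
    assert(prev >= count);   /* never drop more than is held */
    return prev - count;
}
```

A caller that would otherwise loop over N get or put calls issues one call with count set to N, which touches the shared counter once.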
    • cgroup: introduce cancel_attach() · 2468c723
      Committed by Daisuke Nishimura
      Add cancel_attach() operation to struct cgroup_subsys.  cancel_attach()
      can be used when the can_attach() operation prepares something for the
      subsys and that preparation must be rolled back because attaching the
      task fails after can_attach() has succeeded.
      Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
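The prepare-then-roll-back pattern behind cancel_attach() can be sketched as follows. This is a userspace illustration under assumptions, not the kernel's attach path: struct subsys_ops, attach_task(), reserve() and release() are hypothetical names, and the "later step fails" flag simply stands in for any failure that happens after can_attach() has succeeded.

```c
#include <assert.h>
#include <stddef.h>

struct subsys_ops {
    int  (*can_attach)(void *state);     /* 0 on success; may prepare state */
    void (*cancel_attach)(void *state);  /* undo what can_attach prepared */
    void *state;
};

/* Example callbacks: reserve and release one unit in an int counter. */
int reserve(void *state) { (*(int *)state)++; return 0; }
void release(void *state) { (*(int *)state)--; }

/*
 * Ask every subsystem whether the attach may proceed. If a later step
 * fails, undo the preparation of each subsystem whose can_attach()
 * already succeeded. Returns 0 on success, -1 on failure.
 */
int attach_task(struct subsys_ops *ops, size_t n, int later_step_fails)
{
    size_t done = 0;

    assert(ops != NULL || n == 0);
    for (; done < n; done++) {
        if (ops[done].can_attach &&
            ops[done].can_attach(ops[done].state) != 0)
            goto rollback;
    }
    if (later_step_fails)
        goto rollback;
    return 0;

rollback:
    for (size_t i = 0; i < done; i++) {
        if (ops[i].cancel_attach)
            ops[i].cancel_attach(ops[i].state);
    }
    return -1;
}
```

Only subsystems that were successfully asked get their cancel hook called, so a subsystem whose can_attach() itself refused is not asked to undo anything.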
    • Add generic sys_olduname() · 5cacdb4a
      Committed by Christoph Hellwig
      Add generic implementations of the old and really old uname system calls.
      Note that sh only implements sys_olduname but not sys_oldolduname, but I'm
      not going to bother with another ifdef for that special case.
      
      m32r implemented an old uname but never wired it up, so kill it, too.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • improve sys_newuname() for compat architectures · e28cbf22
      Committed by Christoph Hellwig
      On an architecture that supports 32-bit compat we need to override the
      reported machine in uname with the 32-bit value.  Instead of doing this
      separately in every architecture introduce a COMPAT_UTS_MACHINE define in
      <asm/compat.h> and apply it directly in sys_newuname().
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
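The effect of a single compat override in the uname path can be illustrated in userspace. Everything here is an assumption for illustration: the macro COMPAT_MACHINE_SKETCH stands in for the COMPAT_UTS_MACHINE define named in the commit, the value "i686" and the 65-byte field are example choices, and fill_machine() is a hypothetical helper rather than the kernel's sys_newuname().

```c
#include <assert.h>
#include <string.h>

#define MACHINE_LEN 65                 /* assumed field size */
#define COMPAT_MACHINE_SKETCH "i686"   /* stand-in for COMPAT_UTS_MACHINE */

struct uts_sketch {
    char machine[MACHINE_LEN];
};

/*
 * Fill in the machine field once, in generic code: a 32-bit compat
 * task sees the compat name, everything else sees the native one.
 */
void fill_machine(struct uts_sketch *out, const char *native,
                  int is_compat_task)
{
    const char *src = is_compat_task ? COMPAT_MACHINE_SKETCH : native;

    assert(strlen(src) < MACHINE_LEN);
    memset(out->machine, 0, sizeof(out->machine));
    strcpy(out->machine, src);
}
```

With the override expressed as one define, each architecture only supplies the string, and the decision logic lives in one place.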
    • Add generic sys_ipc wrapper · baed7fc9
      Committed by Christoph Hellwig
      Add a generic implementation of the ipc demultiplexer syscall.  Except for
      s390 and sparc64 all implementations of the sys_ipc are nearly identical.
      
      There are slight differences in the types of the parameters: mips and
      powerpc, the only 64-bit architectures with sys_ipc, use unsigned long
      for the "third" argument because it gets cast to a pointer later, while
      it traditionally is an "int" like most other parameters.  frv goes even
      further and uses unsigned long for all parameters except "ptr", which
      is a pointer type everywhere.  The change from int to unsigned long for
      "third" and back to "int" for the others on frv should be fine due to the
      in-register calling conventions for syscalls (we already had a similar
      issue with the generic sys_ptrace), but I'd prefer to have the arch
      maintainers look over this in detail.
      
      Apart from that, h8300, m68k and m68knommu lack an implementation of the
      semtimedop sub call, which this patch adds, and various architectures have
      gets used - at least on i386 it seems superfluous as the compat code on
      x86-64 and ia64 doesn't even bother to implement it.
      
      [akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: David Howells <dhowells@redhat.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 08 Mar, 2010 5 commits
  3. 07 Mar, 2010 16 commits
  4. 04 Mar, 2010 5 commits
  5. 03 Mar, 2010 1 commit
  6. 02 Mar, 2010 2 commits
    • early_res: Need to save the allocation name in drop_range_partial() · dce46a04
      Committed by Yinghai Lu
      During free_early_partial(), reserve_early_without_check() could end up
      extending the early_res area from __check_and_double_early_res(); as a
      result, the location of the name for the current reservation could
      change.
      
      Therefore, we need to save a local copy of the name.
      
      [ hpa: rewrote comment and checkin description ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4B8C7C94.7070000@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
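The hazard described above (a name that lives inside a table which may move while the table grows) can be reproduced in a userspace sketch. The types and functions below (struct res, struct res_table, table_add, table_drop_partial) are hypothetical and use realloc() as a stand-in for the doubling of the early_res area; they are not the kernel's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct res {
    unsigned long start, end;
    char name[16];
};

struct res_table {
    struct res *v;
    size_t n, cap;
};

/* Append a range; growing the table may move every existing entry. */
int table_add(struct res_table *t, unsigned long s, unsigned long e,
              const char *name)
{
    if (t->n == t->cap) {
        size_t cap = t->cap ? t->cap * 2 : 1;
        struct res *nv = realloc(t->v, cap * sizeof(*nv));
        if (nv == NULL)
            return -1;
        t->v = nv;
        t->cap = cap;
    }
    struct res *r = &t->v[t->n++];
    r->start = s;
    r->end = e;
    strncpy(r->name, name, sizeof(r->name) - 1);
    r->name[sizeof(r->name) - 1] = '\0';
    return 0;
}

/* Cut [s, e) out of entry i: keep [start, s) and add [e, end). */
int table_drop_partial(struct res_table *t, size_t i,
                       unsigned long s, unsigned long e)
{
    assert(i < t->n && t->v[i].start < s && e < t->v[i].end);

    char name[16];
    unsigned long old_end = t->v[i].end;

    /* Copy the name first: table_add() may relocate t->v. */
    memcpy(name, t->v[i].name, sizeof(name));
    t->v[i].end = s;
    return table_add(t, e, old_end, name);
}
```

Passing t->v[i].name straight into table_add() would hand it a pointer into memory that realloc() may already have released, which is the same class of problem the local copy in this commit avoids.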
    • resource: Fix generic page_is_ram() for partial RAM pages · 37b99dd5
      Committed by Wu Fengguang
      The System RAM walk shall skip partial RAM pages and avoid calling
      func() on them, so that page_is_ram() returns 0 for a partial RAM page.
      
      In particular, it shall not call func() with len=0.
      This fixes a boot time bug reported by Sachin and root caused by Thomas:
      
      > >>> WARNING: at arch/x86/mm/ioremap.c:111 __ioremap_caller+0x169/0x2f1()
      > >>> Hardware name: BladeCenter LS21 -[79716AA]-
      > >>> Modules linked in:
      > >>> Pid: 0, comm: swapper Not tainted 2.6.33-git6-autotest #1
      > >>> Call Trace:
      > >>> [<ffffffff81047cff>] ? __ioremap_caller+0x169/0x2f1
      > >>> [<ffffffff81063b7d>] warn_slowpath_common+0x77/0xa4
      > >>> [<ffffffff81063bb9>] warn_slowpath_null+0xf/0x11
      > >>> [<ffffffff81047cff>] __ioremap_caller+0x169/0x2f1
      > >>> [<ffffffff813747a3>] ? acpi_os_map_memory+0x12/0x1b
      > >>> [<ffffffff81047f10>] ioremap_nocache+0x12/0x14
      > >>> [<ffffffff813747a3>] acpi_os_map_memory+0x12/0x1b
      > >>> [<ffffffff81282fa0>] acpi_tb_verify_table+0x29/0x5b
      > >>> [<ffffffff812827f0>] acpi_load_tables+0x39/0x15a
      > >>> [<ffffffff8191c8f8>] acpi_early_init+0x60/0xf5
      > >>> [<ffffffff818f2cad>] start_kernel+0x397/0x3a7
      > >>> [<ffffffff818f2295>] x86_64_start_reservations+0xa5/0xa9
      > >>> [<ffffffff818f237a>] x86_64_start_kernel+0xe1/0xe8
      > >>> ---[ end trace 4eaa2a86a8e2da22 ]---
      > >>> ioremap reserve_memtype failed -22
      
      The return code is -EINVAL, so it failed in the is_ram check, which is
      not too surprising
      
      > BIOS-provided physical RAM map:
      >  BIOS-e820: 0000000000000000 - 000000000009c000 (usable)
      >  BIOS-e820: 000000000009c000 - 00000000000a0000 (reserved)
      >  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
      >  BIOS-e820: 0000000000100000 - 00000000cffa3900 (usable)
      >  BIOS-e820: 00000000cffa3900 - 00000000cffa7400 (ACPI data)
      
      The ACPI data is not starting on a page boundary and neither does the
      usable RAM area end on a page boundary. Very useful !
      
      > ACPI: DSDT 00000000cffa3900 036CE (v01 IBM    SERLEWIS 00001000 INTL 20060912)
      
      ACPI is trying to map DSDT at cffa3900, which results in a check
      vs. cffa3000 which is the relevant page boundary. The generic is_ram
      check correctly identifies that as RAM because it's in the usable
      resource area. The old e820 based is_ram check does not take
      overlapping resource areas into account. That's why it works.
      
      CC: Sachin Sant <sachinp@in.ibm.com>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      LKML-Reference: <20100301135551.GA9998@localhost>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
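The rounding that excludes partial pages can be checked with the addresses from the e820 map above. The helper below is a hypothetical userspace sketch, not the kernel's walk: it assumes 4 KiB pages and a byte range [start, end), rounds the start up and the end down to page boundaries, and treats a range that contains no whole page as empty.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* assumes 4 KiB pages */
#define PAGE_SIZE_SKETCH  (UINT64_C(1) << PAGE_SHIFT_SKETCH)

/*
 * Return 1 if page frame "pfn" lies entirely inside the byte range
 * [start, end), 0 otherwise. Partial pages at either edge do not count.
 */
int pfn_fully_in_ram(uint64_t pfn, uint64_t start, uint64_t end)
{
    assert(start <= end);

    /* First whole page: round the start address up. */
    uint64_t first = (start + PAGE_SIZE_SKETCH - 1) >> PAGE_SHIFT_SKETCH;
    /* One past the last whole page: round the end address down. */
    uint64_t last = end >> PAGE_SHIFT_SKETCH;

    if (last <= first)
        return 0;   /* no whole page in the range, nothing to report */
    return pfn >= first && pfn < last;
}
```

For the usable range 0x100000 to 0xcffa3900, frame 0xcffa2 is whole RAM, while frame 0xcffa3 (the page at cffa3000 that also holds the ACPI data) is only partly RAM and is therefore not reported as RAM.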
  7. 01 Mar, 2010 2 commits