1. 04 3月, 2011 1 次提交
  2. 16 2月, 2011 1 次提交
    • S
      perf: Add cgroup support · e5d1367f
      Stephane Eranian 提交于
      This kernel patch adds the ability to filter monitoring based on
      container groups (cgroups). This is for use in per-cpu mode only.
      
      The cgroup to monitor is passed as a file descriptor in the pid
      argument to the syscall. The file descriptor must be opened to
      the cgroup name in the cgroup filesystem. For instance, if the
      cgroup name is foo and cgroupfs is mounted in /cgroup, then the
      file descriptor is opened to /cgroup/foo. Cgroup mode is
      activated by passing PERF_FLAG_PID_CGROUP in the flags argument
      to the syscall.
      
      For instance to measure in cgroup foo on CPU1 assuming
      cgroupfs is mounted under /cgroup:
      
      struct perf_event_attr attr;
      int cgroup_fd, fd;
      
      cgroup_fd = open("/cgroup/foo", O_RDONLY);
      fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
      close(cgroup_fd);
      Signed-off-by: NStephane Eranian <eranian@google.com>
      [ added perf_cgroup_{exit,attach} ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4d590250.114ddf0a.689e.4482@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5d1367f
  3. 11 2月, 2011 1 次提交
  4. 21 1月, 2011 1 次提交
    • D
      kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT · 6a108a14
      David Rientjes 提交于
      The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
      is used to configure any non-standard kernel with a much larger scope than
      only small devices.
      
      This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
      references to the option throughout the kernel.  A new CONFIG_EMBEDDED
      option is added that automatically selects CONFIG_EXPERT when enabled and
      can be used in the future to isolate options that should only be
      considered for embedded systems (RISC architectures, SLOB, etc).
      
      Calling the option "EXPERT" more accurately represents its intention: only
      expert users who understand the impact of the configuration changes they
      are making should enable it.
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NDavid Woodhouse <david.woodhouse@intel.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a108a14
  5. 20 1月, 2011 1 次提交
    • T
      lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo 提交于
      During early boot, local IRQ is disabled until IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQ and some operations which usually are not allowed with
      IRQ disabled, e.g. operations which might sleep or require
      communications with other processors, are allowed.
      
      lockdep tracked this with early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      toggle the boolean to early_boot_irqs_disabled instead of
      enabled so that it can be initialized with %false and %true
      indicates the exceptional condition.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2ce802f6
  6. 14 1月, 2011 2 次提交
    • P
      rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status · c072a388
      Paul E. McKenney 提交于
      Because the adaptive synchronize_srcu_expedited() approach has
      worked very well in testing, remove the kernel parameter and
      replace it by a C-preprocessor macro.  If someone finds problems
      with this approach, a more complex and aggressively adaptive
      approach might be required.
      
      Longer term, SRCU will be merged with the other RCU implementations,
      at which point synchronize_srcu_expedited() will be event driven,
      just as synchronize_sched_expedited() currently is.  At that point,
      there will be no need for this adaptive approach.
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      c072a388
    • L
      decompressors: add boot-time XZ support · 3ebe1243
      Lasse Collin 提交于
      This implements the API defined in <linux/decompress/generic.h> which is
      used for kernel, initramfs, and initrd decompression.  This patch together
      with the first patch is enough for XZ-compressed initramfs and initrd;
      XZ-compressed kernel will need arch-specific changes.
      
      The buffering requirements described in decompress_unxz.c are stricter
      than with gzip, so the relevant changes should be done to the
      arch-specific code when adding support for XZ-compressed kernel.
      Similarly, the heap size in arch-specific pre-boot code may need to be
      increased (30 KiB is enough).
      
      The XZ decompressor needs memmove(), memeq() (memcmp() == 0), and
      memzero() (memset(ptr, 0, size)), which aren't available in all
      arch-specific pre-boot environments.  I'm including simple versions in
      decompress_unxz.c, but a cleaner solution would naturally be nicer.
      Signed-off-by: NLasse Collin <lasse.collin@tukaani.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alain Knaff <alain@knaff.lu>
      Cc: Albin Tonnerre <albin.tonnerre@free-electrons.com>
      Cc: Phillip Lougher <phillip@lougher.demon.co.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ebe1243
  7. 04 1月, 2011 1 次提交
  8. 24 12月, 2010 1 次提交
    • T
      init: don't call flush_scheduled_work() from do_initcalls() · ee4569a3
      Tejun Heo 提交于
      The call to flush_scheduled_work() in do_initcalls() is there to make
      sure all works queued to system_wq by initcalls finish before the init
      sections are dropped.
      
      However, the call doesn't make much sense at this point - there
      already are multiple different workqueues and different subsystems are
      free to create and use their own.  Ordering requirements are and
      should be expressed explicitly.
      
      Drop the call to prepare for the deprecation and removal of
      flush_scheduled_work().
      
      Andrew suggested adding sanity check where the workqueue code checks
      whether any pending or running work has the work function in the init
      text section.  However, checking this for running works requires the
      worker to keep track of the current function being executed, and
      checking only the pending works will miss most cases.  As a violation
      will almost always be caught by the usual page fault mechanism, I
      don't think it would be worthwhile to make the workqueue code track
      extra state just for this.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      ee4569a3
  9. 23 12月, 2010 1 次提交
  10. 16 12月, 2010 2 次提交
  11. 30 11月, 2010 4 次提交
    • M
      sched: Add 'autogroup' scheduling feature: automated per session task groups · 5091faa4
      Mike Galbraith 提交于
      A recurring complaint from CFS users is that parallel kbuild has
      a negative impact on desktop interactivity.  This patch
      implements an idea from Linus, to automatically create task
      groups.  Currently, only per session autogroups are implemented,
      but the patch leaves the way open for enhancement.
      
      Implementation: each task's signal struct contains an inherited
      pointer to a refcounted autogroup struct containing a task group
      pointer, the default for all tasks pointing to the
      init_task_group.  When a task calls setsid(), a new task group
      is created, the process is moved into the new task group, and a
      reference to the preveious task group is dropped.  Child
      processes inherit this task group thereafter, and increase it's
      refcount.  When the last thread of a process exits, the
      process's reference is dropped, such that when the last process
      referencing an autogroup exits, the autogroup is destroyed.
      
      At runqueue selection time, IFF a task has no cgroup assignment,
      its current autogroup is used.
      
      Autogroup bandwidth is controllable via setting it's nice level
      through the proc filesystem:
      
        cat /proc/<pid>/autogroup
      
      Displays the task's group and the group's nice level.
      
        echo <nice level> > /proc/<pid>/autogroup
      
      Sets the task group's shares to the weight of nice <level> task.
      Setting nice level is rate limited for !admin users due to the
      abuse risk of task group locking.
      
      The feature is enabled from boot by default if
      CONFIG_SCHED_AUTOGROUP=y is selected, but can be disabled via
      the boot option noautogroup, and can also be turned on/off on
      the fly via:
      
        echo [01] > /proc/sys/kernel/sched_autogroup_enabled
      
      ... which will automatically move tasks to/from the root task group.
      Signed-off-by: NMike Galbraith <efault@gmx.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      [ Removed the task_group_path() debug code, and fixed !EVENTFD build failure. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290281700.28711.9.camel@maggy.simson.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5091faa4
    • P
      rcu: Make synchronize_srcu_expedited() fast if running readers · 46fdb093
      Paul E. McKenney 提交于
      The synchronize_srcu_expedited() function is currently quick if there
      are no active readers, but will delay a full jiffy if there are any.
      If these readers leave their SRCU read-side critical sections quickly,
      this is way too long to wait.  So this commit first waits ten microseconds,
      and only then falls back to jiffy-at-a-time waiting.
      Reported-by: NAvi Kivity <avi@redhat.com>
      Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Tested-by: NTakuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      46fdb093
    • P
      rcu: add tracing for TINY_RCU and TINY_PREEMPT_RCU · 9e571a82
      Paul E. McKenney 提交于
      Add tracing for the tiny RCU implementations, including statistics on
      boosting in the case of TINY_PREEMPT_RCU and RCU_BOOST.
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      9e571a82
    • P
      rcu: priority boosting for TINY_PREEMPT_RCU · 24278d14
      Paul E. McKenney 提交于
      Add priority boosting, but only for TINY_PREEMPT_RCU.  This is enabled
      by the default-off RCU_BOOST kernel parameter.  The priority to which to
      boost preempted RCU readers is controlled by the RCU_BOOST_PRIO kernel
      parameter (defaulting to real-time priority 1) and the time to wait
      before boosting the readers blocking a given grace period is controlled
      by the RCU_BOOST_DELAY kernel parameter (defaulting to 500 milliseconds).
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      24278d14
  12. 26 11月, 2010 1 次提交
  13. 25 11月, 2010 1 次提交
    • M
      cgroups: make swap accounting default behavior configurable · a42c390c
      Michal Hocko 提交于
      Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP
      configuration option and then it is turned on by default.  There is a boot
      option (noswapaccount) which can disable this feature.
      
      This makes it hard for distributors to enable the configuration option as
      this feature leads to a bigger memory consumption and this is a no-go for
      general purpose distribution kernel.  On the other hand swap accounting
      may be very usuful for some workloads.
      
      This patch adds a new configuration option which controls the default
      behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED).  If the option is selected
      then the feature is turned on by default.
      
      It also adds a new boot parameter swapaccount[=1|0] which enhances the
      original noswapaccount parameter semantic by means of enable/disable logic
      (defaults to 1 if no value is provided to be still consistent with
      noswapaccount).
      
      The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is
      enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well)
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a42c390c
  14. 18 11月, 2010 1 次提交
  15. 28 10月, 2010 6 次提交
  16. 27 10月, 2010 1 次提交
  17. 23 10月, 2010 2 次提交
    • A
      SYSFS: Allow boot time switching between deprecated and modern sysfs layout · e52eec13
      Andi Kleen 提交于
      I have some systems which need legacy sysfs due to old tools that are
      making assumptions that a directory can never be a symlink to another
      directory, and it's a big hazzle to compile separate kernels for them.
      
      This patch turns CONFIG_SYSFS_DEPRECATED into a run time option
      that can be switched on/off the kernel command line. This way
      the same binary can be used in both cases with just a option
      on the command line.
      
      The old CONFIG_SYSFS_DEPRECATED_V2 option is still there to set
      the default. I kept the weird name to not break existing
      config files.
      
      Also the compat code can be still completely disabled by undefining
      CONFIG_SYSFS_DEPRECATED_SWITCH -- just the optimizer takes
      care of this now instead of lots of ifdefs. This makes the code
      look nicer.
      
      v2: This is an updated version on top of Kay's patch to only
      handle the block devices. I tested it on my old systems
      and that seems to work.
      
      Cc: axboe@kernel.dk
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      e52eec13
    • K
      driver core: remove CONFIG_SYSFS_DEPRECATED_V2 but keep it for block devices · 39aba963
      Kay Sievers 提交于
      This patch removes the old CONFIG_SYSFS_DEPRECATED_V2 config option,
      but it keeps the logic around to handle block devices in the old manner
      as some people like to run new kernel versions on old (pre 2007/2008)
      distros.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Stephen Hemminger <shemminger@vyatta.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: "James E.J. Bottomley" <James.Bottomley@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      39aba963
  18. 21 10月, 2010 1 次提交
    • A
      BKL: introduce CONFIG_BKL. · 6de5bd12
      Arnd Bergmann 提交于
      With all the patches we have queued in the BKL removal tree, only a
      few dozen modules are left that actually rely on the BKL, and even
      there are lots of low-hanging fruit. We need to decide what to do
      about them, this patch illustrates one of the options:
      
      Every user of the BKL is marked as 'depends on BKL' in Kconfig,
      and the CONFIG_BKL becomes a user-visible option. If it gets
      disabled, no BKL using module can be built any more and the BKL
      code itself is compiled out.
      
      The one exception is file locking, which is practically always
      enabled and does a 'select BKL' instead. This effectively forces
      CONFIG_BKL to be enabled until we have solved the fs/lockd
      mess and can apply the patch that removes the BKL from fs/locks.c.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      6de5bd12
  19. 19 10月, 2010 2 次提交
  20. 12 10月, 2010 1 次提交
  21. 04 10月, 2010 1 次提交
  22. 29 9月, 2010 1 次提交
    • H
      initramfs: fix initramfs size calculation · ffe8018c
      Hendrik Brueckner 提交于
      The size of a built-in initramfs is calculated in init/initramfs.c by
      "__initramfs_end - __initramfs_start".  Those symbols are defined in the
      linker script include/asm-generic/vmlinux.lds.h:
      
      #define INIT_RAM_FS                                                     \
              . = ALIGN(PAGE_SIZE);                                           \
              VMLINUX_SYMBOL(__initramfs_start) = .;                          \
              *(.init.ramfs)                                                  \
              VMLINUX_SYMBOL(__initramfs_end) = .;
      
      If the initramfs file has an odd number of bytes, the "__initramfs_end"
      symbol points to an odd address, for example, the symbols in the
      System.map might look like:
      
          0000000000572000 T __initramfs_start
          00000000005bcd05 T __initramfs_end	  <-- odd address
      
      At least on s390 this causes a problem:
      
      Certain s390 instructions, especially instructions for loading addresses
      (larl) or branch addresses must be on even addresses.  The compiler loads
      the symbol addresses with the "larl" instruction.  This instruction sets
      the last bit to 0 and, therefore, for odd size files, the calculated size
      is one byte less than it should be:
      
          0000000000540a9c <populate_rootfs>:
            540a9c:     eb cf f0 78 00 24       stmg    %r12,%r15,120(%r15),
            540aa2:     c0 10 00 01 8a af       larl    %r1,572000 <__initramfs_start>
            540aa8:     c0 c0 00 03 e1 2e       larl    %r12,5bcd04 <initramfs_end>
                                                        (Instead of  5bcd05)
            ...
            540abe:     1b c1                   sr      %r12,%r1
      
      To fix the problem, this patch introduces the global variable
      __initramfs_size, which is calculated in the "usr/initramfs_data.S" file.
      The populate_rootfs() function can then use the start marker of the
      .init.ramfs section and the value of __initramfs_size for loading the
      initramfs.  Because the start marker and size is sufficient, the
      __initramfs_end symbol is no longer needed and is removed.
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NHendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
      Acked-by: NMichal Marek <mmarek@suse.cz>
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NMichal Marek <mmarek@suse.cz>
      ffe8018c
  23. 17 9月, 2010 2 次提交
    • C
      NFS: Use super.c for NFSROOT mount option parsing · 56463e50
      Chuck Lever 提交于
      Replace duplicate code in NFSROOT for mounting an NFS server on '/'
      with logic that uses the existing mainline text-based logic in the NFS
      client.
      
      Add documenting comments where appropriate.
      
      Note that this means NFSROOT mounts now use the same default settings
      as v2/v3 mounts done via mount(2) from user space.
      
        vers=3,tcp,rsize=<negotiated default>,wsize=<negotiated default>
      
      As before, however, no version/protocol negotiation with the server is
      done.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      56463e50
    • J
      do_mounts: only enable PARTUUID for CONFIG_BLOCK · 6d0aed7a
      Jens Axboe 提交于
      When CONFIG_BLOCK is not enabled:
      
      init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part'
      init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast
      init/do_mounts.c:73: error: dereferencing pointer to incomplete type
      init/do_mounts.c:76: error: dereferencing pointer to incomplete type
      init/do_mounts.c:76: error: dereferencing pointer to incomplete type
      init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid'
      init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function)
      Reported-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      6d0aed7a
  24. 16 9月, 2010 2 次提交
  25. 15 9月, 2010 1 次提交
    • W
      init: add support for root devices specified by partition UUID · b5af921e
      Will Drewry 提交于
      This is the third patch in a series which adds support for
      storing partition metadata, optionally, off of the hd_struct.
      
      One major use for that data is being able to resolve partition
      by other identities than just the index on a block device.  Device
      enumeration varies by platform and there's a benefit to being able
      to use something like EFI GPT's GUIDs to determine the correct
      block device and partition to mount as the root.
      
      This change adds that support to root= by adding support for
      the following syntax:
      
        root=PARTUUID=hex-uuid
      Signed-off-by: NWill Drewry <wad@chromium.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      b5af921e
  26. 23 8月, 2010 1 次提交