1. 03 Apr, 2009 7 commits
    • cgroups: show correct file mode · 099fca32
      Li Zefan committed
      We have some read-only files and write-only files, but currently they are
      all set to 0644, which is counter-intuitive and causes trouble for some
      cgroup tools like libcgroup.
      
      This patch adds 'mode' to struct cftype to allow a cgroup subsystem to set
      its own files' mode. In most cases cft->mode can be left at the default
      of 0, and cgroup will figure out the proper mode.
      Acked-by: Paul Menage <menage@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
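      A minimal userspace sketch of how a default mode could be derived when
      the mode field is left at 0. The names demo_cftype, demo_read,
      demo_write and demo_file_mode are hypothetical; the sketch only mirrors
      the idea in the commit message and is not the kernel's implementation.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-in for the kernel's struct cftype. */
struct demo_cftype {
	mode_t mode;              /* 0 means "let the core pick a mode" */
	int (*read)(void);        /* non-NULL if the file can be read */
	int (*write)(int value);  /* non-NULL if the file can be written */
};

/* Trivial handlers so read-only and write-only files can be described. */
static int demo_read(void) { return 0; }
static int demo_write(int value) { (void)value; return 0; }

/* Honour an explicit mode; otherwise derive one from the handlers present. */
static mode_t demo_file_mode(const struct demo_cftype *cft)
{
	mode_t mode = 0;

	if (cft->mode)
		return cft->mode;
	if (cft->read)
		mode |= S_IRUSR | S_IRGRP | S_IROTH;  /* 0444 */
	if (cft->write)
		mode |= S_IWUSR;                      /* 0200 */
	return mode;
}
```

      In this sketch a file with only a read handler comes out as 0444, a
      write-only file as 0200, and a file with both handlers as 0644.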
    • kernel/cgroup.c: kfree(NULL) is legal · 66bdc9cf
      Jesper Juhl committed
      Reduces object file size a bit:
      
      Before:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21593    7804    4924   34321    8611 kernel/cgroup.o
      After:
      $ size kernel/cgroup.o
         text    data     bss     dec     hex filename
        21537    7744    4924   34205    859d kernel/cgroup.o
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
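      The kind of cleanup this commit describes can be illustrated in
      userspace, where free(NULL) is likewise defined to do nothing. The
      struct and function names below are hypothetical; this is an analogy,
      not the code that was changed in kernel/cgroup.c.

```c
#include <stdlib.h>

/* Hypothetical owner of a heap buffer. */
struct demo_buf {
	char *data;
};

/* Before: the NULL check is redundant because free(NULL) is a no-op. */
static void demo_release_checked(struct demo_buf *b)
{
	if (b->data)
		free(b->data);
	b->data = NULL;
}

/* After: same behaviour, one branch fewer. */
static void demo_release(struct demo_buf *b)
{
	free(b->data);
	b->data = NULL;
}
```

      Both variants behave identically; dropping the check only removes a
      compare-and-branch, which is where the small size reduction comes from.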
    • cgroup: fix frequent -EBUSY at rmdir · ec64f515
      KAMEZAWA Hiroyuki committed
      In the following situation, with the memory subsystem,
      
      	/groupA use_hierarchy==1
      		/01 some tasks
      		/02 some tasks
      		/03 some tasks
      		/04 empty
      
      When tasks under 01/02/03 hit the limit on /groupA, hierarchical reclaim
      is triggered and the kernel walks the tree under groupA. In this case,
      rmdir /groupA/04 frequently fails with -EBUSY because of a temporary
      refcnt held by the kernel.
      
      In general, a cgroup can be rmdir'd if it has no child groups and
      no tasks. Frequent rmdir() failures are not useful to users.
      (And in most cases the reason for -EBUSY is unknown to users.)
      
      This patch tries to modify the above behavior by
      	- retrying if the css refcnt is held by someone.
      	- adding a return value to pre_destroy(), which allows a subsystem to
      	  say "we're really busy!"
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
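      The two changes listed in this commit can be sketched roughly as
      follows. Everything here is hypothetical (demo_group, demo_pre_destroy,
      demo_wait_for_release, demo_rmdir), and the bounded retry count is an
      assumption made so the sketch terminates; the kernel's waiting
      mechanism is not shown in the commit message.

```c
#include <errno.h>

/* Hypothetical model of a cgroup being removed. */
struct demo_group {
	int refcnt;       /* temporary references, e.g. from a reclaim walk */
	int subsys_busy;  /* nonzero: the subsystem says it is really busy */
};

/* pre_destroy() now returns a value, so a subsystem can veto the rmdir. */
static int demo_pre_destroy(const struct demo_group *g)
{
	return g->subsys_busy ? -EBUSY : 0;
}

/* Stand-in for "sleep until a reference holder drops its reference". */
static void demo_wait_for_release(struct demo_group *g)
{
	if (g->refcnt > 0)
		g->refcnt--;
}

/* Retry while someone holds a temporary reference instead of failing at once. */
static int demo_rmdir(struct demo_group *g, int max_retries)
{
	int ret = demo_pre_destroy(g);

	if (ret)
		return ret;
	while (g->refcnt > 0) {
		if (max_retries-- <= 0)
			return -EBUSY;
		demo_wait_for_release(g);
	}
	return 0;
}
```

      The point of the sketch is the ordering: a genuine "busy" answer from
      the subsystem is reported immediately, while a transient reference
      only delays the removal.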
    • cgroup: CSS ID support · 38460b48
      KAMEZAWA Hiroyuki committed
      Patch for per-CSS (Cgroup Subsys State) ID and private hierarchy code.
      
      This patch attaches a unique ID to each css and provides the following.
      
       - css_lookup(subsys, id)
         returns a pointer to the struct cgroup_subsys_state of id.
       - css_get_next(subsys, id, rootid, depth, foundid)
         returns the next css under "root" by scanning
      
      When cgroup_subsys->use_id is set, an id for css is maintained.
      
      The cgroup framework only prepares
      	- css_id of root css for subsys
      	- id is automatically attached at creation of css.
      	- id is *not* freed automatically, because the cgroup framework
      	  doesn't know the lifetime of cgroup_subsys_state.
      	  The free_css_id() function is provided. It must be called by the subsys.
      
      There are several reasons to develop this.
      	- Saving space. For example, memcg's swap_cgroup is an array of
      	  pointers to cgroup, but it does not need to be very fast.
      	  By replacing pointers (8 bytes per entry) with IDs (2 bytes per entry),
      	  we can greatly reduce memory usage.
      
      	- Scanning without lock.
      	  CSS_ID provides a "scan id under this ROOT" function. With it, scanning
      	  css under root can be written without locks.
      	  ex)
      	  do {
      		rcu_read_lock();
      		next = cgroup_get_next(subsys, id, root, &found);
      		/* check sanity of next here */
      		css_tryget();
      		rcu_read_unlock();
      		id = found + 1;
      	  } while(...)
      
      Characteristics:
      	- Each css has a unique ID under subsys.
      	- Lifetime of the ID is controlled by subsys.
      	- css ID contains "ID", "Depth in hierarchy" and a stack of hierarchy
      	- Allowed IDs are 1-65535; ID 0 is the UNUSED ID.
      
      Design Choices:
      	- scan-by-ID vs. scan-by-tree-walk.
      	  As /proc's pid scan does, scan-by-ID is robust when scanning is done
      	  by the following kind of routine:
      	  scan -> rest a while (release a lock) -> continue from where interrupted
      	  memcg's hierarchical reclaim does this.
      
      	- When subsys->use_id is set, the number of css in the system is limited to
      	  65535.
      
      [bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
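      A small userspace model of scan-by-ID, assuming each ID record stores
      its depth and a stack of ancestor IDs as the "Characteristics" list
      describes. The table, the names (demo_css_id, demo_is_under,
      demo_get_next) and the tiny ID range are all hypothetical; this is not
      the kernel's css_get_next().

```c
#include <stddef.h>

#define DEMO_MAX_ID    8
#define DEMO_MAX_DEPTH 4

/* Hypothetical ID record: own id, depth, and ancestor ids by depth. */
struct demo_css_id {
	unsigned short id;                     /* 0 means unused */
	unsigned short depth;                  /* a hierarchy root has depth 0 */
	unsigned short stack[DEMO_MAX_DEPTH];  /* stack[depth] == id */
};

static const struct demo_css_id demo_table[DEMO_MAX_ID + 1] = {
	[1] = { 1, 0, { 1 } },        /* groupA */
	[2] = { 2, 1, { 1, 2 } },     /* groupA/01 */
	[3] = { 3, 0, { 3 } },        /* groupB, a separate subtree */
	[4] = { 4, 1, { 1, 4 } },     /* groupA/02 */
	[5] = { 5, 2, { 1, 4, 5 } },  /* groupA/02/x */
};

/* c is under root if root's id sits in c's ancestor stack at root's depth. */
static int demo_is_under(const struct demo_css_id *c,
			 const struct demo_css_id *root)
{
	return c->depth >= root->depth && c->stack[root->depth] == root->id;
}

/* Return the smallest used id >= start that lies under root_id, or 0. */
static unsigned short demo_get_next(unsigned short start,
				    unsigned short root_id)
{
	const struct demo_css_id *root;

	if (root_id == 0 || root_id > DEMO_MAX_ID)
		return 0;
	root = &demo_table[root_id];
	if (root->id == 0)
		return 0;
	for (unsigned int id = start; id <= DEMO_MAX_ID; id++) {
		const struct demo_css_id *c = &demo_table[id];

		if (c->id && demo_is_under(c, root))
			return c->id;
	}
	return 0;
}
```

      A scan can stop after any step and later resume with
      demo_get_next(found + 1, root), which is what makes the approach
      robust when the lock is dropped between steps.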
    • cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups · 313e924c
      Grzegorz Nosek committed
      The ns_proxy cgroup allows moving processes to child cgroups only one
      level deep at a time.  This commit relaxes this restriction and makes it
      possible to attach tasks directly to grandchild cgroups, e.g.:
      
      ($pid is in the root cgroup)
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Previously this operation would fail with -EPERM and would have to be
      performed as two steps:
      echo $pid > /cgroup/CG1/tasks
      echo $pid > /cgroup/CG1/CG2/tasks
      
      Also, the target cgroup no longer needs to be empty to move a task there.
      Signed-off-by: Grzegorz Nosek <root@localdomain.pl>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
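      The difference between the old and the relaxed check can be modelled
      with a parent-pointer walk. The names below (demo_cgroup,
      demo_is_descendant, demo_can_attach_strict, demo_can_attach_relaxed)
      are hypothetical, and treating a cgroup as its own descendant is an
      assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cgroup node that only knows its parent. */
struct demo_cgroup {
	const struct demo_cgroup *parent;
};

/* Walk up from target; succeed if ancestor is met on the way to the root. */
static int demo_is_descendant(const struct demo_cgroup *target,
			      const struct demo_cgroup *ancestor)
{
	for (; target; target = target->parent)
		if (target == ancestor)
			return 1;
	return 0;
}

/* Old behaviour: only a direct child of the current cgroup is accepted. */
static int demo_can_attach_strict(const struct demo_cgroup *current_cg,
				  const struct demo_cgroup *target)
{
	return target->parent == current_cg ? 0 : -EPERM;
}

/* Relaxed behaviour: any descendant of the current cgroup is accepted. */
static int demo_can_attach_relaxed(const struct demo_cgroup *current_cg,
				   const struct demo_cgroup *target)
{
	return demo_is_descendant(target, current_cg) ? 0 : -EPERM;
}
```

      With root -> CG1 -> CG2, the strict check rejects a move from root
      straight to CG2, while the relaxed check accepts it in one step.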
    • Simplify copy_thread() · 6f2c55b8
      Alexey Dobriyan committed
      First argument unused since 2.3.11.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nommu: fix a number of issues with the per-MM VMA patch · 33e5d769
      David Howells committed
      Fix a number of issues with the per-MM VMA patch:
      
       (1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
           a NOMMU system with more than 2G pages.  Makes no difference on a 32-bit
           system.
      
       (2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
           lest it overflow.
      
       (3) Move the allocation of the vm_area_struct slab back to fork.c.
      
       (4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
      
       (5) Use BUG_ON() rather than if () BUG().
      
       (6) Make the default validate_nommu_regions() a static inline rather than a
           #define.
      
       (7) Make free_page_series()'s objection to pages with a refcount != 1 more
           informative.
      
       (8) Adjust the __put_nommu_region() banner comment to indicate that the
           semaphore must be held for writing.
      
       (9) Limit the number of warnings about munmaps of non-mmapped regions.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
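      Item (2) is the classic widen-before-multiply fix. The sketch below
      assumes 4 KiB pages (a shift of 12) and uses hypothetical helper
      names; it shows why a page offset times the page size has to be
      computed as a 64-bit value.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* 32-bit arithmetic: the byte offset wraps once it reaches 4 GiB. */
static uint32_t demo_offset32(uint32_t pgoff)
{
	return (uint32_t)(pgoff << DEMO_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte offset is preserved. */
static uint64_t demo_offset64(uint32_t pgoff)
{
	return (uint64_t)pgoff << DEMO_PAGE_SHIFT;
}
```

      For a page offset of 0x100000 the 32-bit version yields 0, while the
      64-bit version yields 0x100000000.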
  2. 01 Apr, 2009 4 commits
    • epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a
      Davide Libenzi committed
      This patchset introduces wakeup hints for some of the most popular (from
      epoll POV) devices, so that epoll code can avoid spurious wakeups on its
      waiters.
      
      The problem with epoll is that the callback-based wakeups do not, ATM,
      carry any information about the events the wakeup is related to.  So the
      only choice epoll has (not being able to call f_op->poll() from inside the
      callback), is to add the file* to a ready-list and resolve the real events
      later on, at epoll_wait() (or its own f_op->poll()) time.  This can cause
      spurious wakeups, since the wake_up() itself might be for an event the
      caller is not interested in.
      
      The rate of these spurious wakeups can be pretty high when many
      network sockets are being monitored.
      
      By allowing devices to report the events the wakeups refer to (at least
      the two major classes - POLLIN/POLLOUT), we are able to spare useless
      wakeups by proper handling inside the epoll's poll callback.
      
      Epoll will have in any case to call f_op->poll() on the file* later on,
      since the change to be done in order to have the full event set sent via
      wakeup, is too invasive for the way our f_op->poll() system works (the
      full event set is calculated inside the poll function - there are too many
      of them to even start thinking the change - also poll/select would need
      change too).
      
      Epoll is changed in a way that both devices which send event hints, and
      the ones that don't, are correctly handled.  The former will gain some
      efficiency though.
      
      As a general rule, devices should add an event mask by using the
      key-aware wakeup macros when waking up poll wait queues.  I tested it
      (together with the epoll's poll fix patch Andrew has in -mm) and wakeups
      for the supported devices are correctly filtered.
      
      Test program available here:
      
      http://www.xmailserver.org/epoll_test.c
      
      This patch:
      
      Nothing revolutionary here.  Just using the available "key" that our
      wakeup core already supports.  The __wake_up_locked_key() was a no-brainer,
      since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers
      around __wake_up_common().
      
      The __wake_up_sync() function had a body, so the choice was between
      borrowing the body for __wake_up_sync_key() and calling it from
      __wake_up_sync(), or making an inline and calling it from both.  I chose the
      former since on most archs it all resolves to "mov $0, REG; jmp ADDR".
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
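      The filtering that an event hint makes possible can be sketched in
      userspace as below. The names demo_waiter and demo_poll_callback are
      hypothetical, and treating a key of 0 as "no hint, always wake" is an
      assumption that matches the stated goal of handling devices with and
      without hints.

```c
#include <poll.h>

/* Hypothetical waiter: the events it asked for and how often it was queued. */
struct demo_waiter {
	unsigned long events;
	int woken;
};

/*
 * Wakeup callback with an optional event hint in key.
 * key == 0 means the device gave no hint, so the waiter must be queued.
 * Returns 1 if the waiter was queued as ready, 0 if the wakeup was filtered.
 */
static int demo_poll_callback(struct demo_waiter *w, unsigned long key)
{
	if (key && !(key & w->events))
		return 0;  /* hint says: not an event this waiter cares about */
	w->woken++;
	return 1;
}
```

      A waiter interested only in POLLIN is skipped for a POLLOUT-only
      hint, but is still queued when the device sends no hint at all.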
    • pm: rework includes, remove arch ifdefs · a8af7898
      Magnus Damm committed
      Make the following header file changes:
      
       - remove arch ifdefs and asm/suspend.h from linux/suspend.h
       - add asm/suspend.h to disk.c (for arch_prepare_suspend())
       - add linux/io.h to swsusp.c (for ioremap())
       - x86 32/64 bit compile fixes
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix proc_dointvec_userhz_jiffies "breakage" · 704503d8
      Alexey Dobriyan committed
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838
      
      On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange
      way from the user's point of view:
      
      	# echo 500 >/proc/sys/vm/dirty_writeback_centisecs
      	# cat /proc/sys/vm/dirty_writeback_centisecs
      	499
      
      So, we have 5000 jiffies converted to only 499 clock ticks and reported
      back.
      
      TICK_NSEC = 999848
      ACTHZ = 256039
      
      Keeping the in-kernel variable in the units passed from userspace would
      of course fix the issue, but this probably won't be right for every sysctl.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
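      The 500 -> 499 round trip can be reproduced with plain integer
      arithmetic. The sketch assumes that the write path multiplies by
      HZ / USER_HZ and that the read path multiplies by TICK_NSEC and
      divides by NSEC_PER_SEC / USER_HZ; the helper names are hypothetical
      and USER_HZ = 100 is assumed, while HZ and TICK_NSEC are the values
      quoted above.

```c
#include <stdint.h>

#define DEMO_HZ            1000u        /* from the commit message */
#define DEMO_USER_HZ       100u         /* assumption: usual USER_HZ */
#define DEMO_TICK_NSEC     999848u      /* from the commit message */
#define DEMO_NSEC_PER_SEC  1000000000u

/* Userspace clock ticks -> jiffies: exact, since HZ is a multiple of USER_HZ. */
static uint64_t demo_clock_t_to_jiffies(uint64_t ticks)
{
	return ticks * (DEMO_HZ / DEMO_USER_HZ);
}

/* Jiffies -> clock ticks through TICK_NSEC: truncates towards zero. */
static uint64_t demo_jiffies_to_clock_t(uint64_t j)
{
	return j * DEMO_TICK_NSEC / (DEMO_NSEC_PER_SEC / DEMO_USER_HZ);
}
```

      Writing 500 stores 5000 jiffies; reading back computes
      5000 * 999848 / 10000000 = 499.924, which truncates to 499.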
    • mm: introduce for_each_populated_zone() macro · ee99c71c
      KOSAKI Motohiro committed
      Impact: cleanup
      
      In almost all cases, for_each_zone() is used with populated_zone(),
      because almost no function needs memoryless node information.
      Therefore, for_each_populated_zone() can help simplify the code.
      
      This patch has no functional change.
      
      [akpm@linux-foundation.org: small cleanup]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
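      One way to fold the populated check into an iterator macro is the
      for/if/else idiom sketched below over a plain array. The demo_ names
      and the array-based iteration are hypothetical; the kernel iterates
      over its own zone lists.

```c
#include <stddef.h>

/* Hypothetical zone: populated when it has at least one present page. */
struct demo_zone {
	unsigned long present_pages;
};

static int demo_populated_zone(const struct demo_zone *z)
{
	return z->present_pages != 0;
}

/* Visit every zone in an array of n zones. */
#define demo_for_each_zone(z, zones, n) \
	for ((z) = (zones); (z) < (zones) + (n); (z)++)

/* Visit only populated zones; the trailing else takes the caller's body. */
#define demo_for_each_populated_zone(z, zones, n) \
	demo_for_each_zone(z, zones, n)           \
		if (!demo_populated_zone(z))      \
			; /* skip empty zone */   \
		else

/* Example caller: no explicit populated_zone() check is needed any more. */
static unsigned long demo_total_pages(const struct demo_zone *zones, size_t n)
{
	const struct demo_zone *z;
	unsigned long total = 0;

	demo_for_each_populated_zone(z, zones, n)
		total += z->present_pages;
	return total;
}

static size_t demo_count_populated(const struct demo_zone *zones, size_t n)
{
	const struct demo_zone *z;
	size_t count = 0;

	demo_for_each_populated_zone(z, zones, n)
		count++;
	return count;
}
```

      The if/else form keeps the macro usable with either a single
      statement or a braced block as the loop body.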
  3. 31 Mar, 2009 6 commits
    • lockdep: fix deadlock in lockdep_trace_alloc · 2f850181
      Peter Zijlstra committed
      Heiko reported that we grab the graph lock with irqs enabled.
      
      Fix this by providing the same wrapper that all other lockdep entry
      functions have.
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <1237544000.24626.52.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • kexec: Change kexec jump code ordering · 749b0afc
      Rafael J. Wysocki committed
      Change the ordering of the kexec jump code so that the nonboot CPUs
      are disabled after calling device drivers' "late suspend" methods.
      
      This change reflects the recent modifications of the power management
      code that is also used by kexec jump.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • PM: Change hibernation code ordering · 4aecd671
      Rafael J. Wysocki committed
      Change the ordering of the hibernation core code so that the platform
      "prepare" callbacks are executed and the nonboot CPUs are disabled
      after calling device drivers' "late suspend" methods.
      
      This change (along with the previous analogous change of the suspend
      core code) will allow us to rework the PCI PM core so that the power
      state of devices is changed in the "late" phase of suspend (and
      analogously in the "early" phase of resume), which in turn will allow
      us to avoid the race condition where a device using shared interrupts
      is put into a low power state with interrupts enabled and then an
      interrupt (for another device) comes in and confuses its driver.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • PM: Change suspend code ordering · 900af0d9
      Rafael J. Wysocki committed
      Change the ordering of the suspend core code so that the platform
      "prepare" callback is executed and the nonboot CPUs are disabled
      after calling device drivers' "late suspend" methods.
      
      This change will allow us to rework the PCI PM core so that the power
      state of devices is changed in the "late" phase of suspend (and
      analogously in the "early" phase of resume), which in turn will allow
      us to avoid the race condition where a device using shared interrupts
      is put into a low power state with interrupts enabled and then an
      interrupt (for another device) comes in and confuses its driver.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • PM: Rework handling of interrupts during suspend-resume · 2ed8d2b3
      Rafael J. Wysocki committed
      Use the functions introduced by the previous patch,
      suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
      to rework the handling of interrupts during suspend (hibernation) and
      resume.  Namely, interrupts will only be disabled on the CPU right
      before suspending sysdevs, while device drivers will be prevented
      from receiving interrupts, with the help of the new helper function,
      before their "late" suspend callbacks run (and analogously during
      resume).
      
      In addition, since the device interrupts are now disabled before the
      CPU has turned all interrupts off and the CPU will ACK the interrupts
      setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
      any wake-up interrupts are pending and abort suspend if that's the
      case.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
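      A toy model of the behaviour described above: device interrupts are
      blocked per line rather than on the CPU, an interrupt arriving in that
      window is only marked pending, and a pending wake-up interrupt aborts
      the suspend. The struct and the demo_ functions are hypothetical
      stand-ins for suspend_device_irqs(), resume_device_irqs() and
      check_wakeup_irqs(); the -EBUSY return value is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-line interrupt state. */
struct demo_irq {
	int suspended;  /* the driver's handler is blocked */
	int pending;    /* an interrupt arrived while blocked */
	int wakeup;     /* this line is a wake-up source */
};

static void demo_suspend_device_irqs(struct demo_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		irqs[i].suspended = 1;
}

static void demo_resume_device_irqs(struct demo_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		irqs[i].suspended = 0;
		irqs[i].pending = 0;  /* a real kernel would replay it instead */
	}
}

/* Returns 1 if the driver's handler ran, 0 if the interrupt was only recorded. */
static int demo_deliver(struct demo_irq *irq)
{
	if (irq->suspended) {
		irq->pending = 1;
		return 0;
	}
	return 1;
}

/* Abort the suspend if any wake-up line has an interrupt pending. */
static int demo_check_wakeup_irqs(const struct demo_irq *irqs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (irqs[i].wakeup && irqs[i].pending)
			return -EBUSY;
	return 0;
}
```

      In this model a pending interrupt on an ordinary line does not stop
      the suspend; only a pending wake-up source does.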
    • PM: Introduce functions for suspending and resuming device interrupts · 0a0c5168
      Rafael J. Wysocki committed
      Introduce helper functions allowing us to prevent device drivers from
      getting any interrupts (without disabling interrupts on the CPU)
      during suspend (or hibernation) and to make them start to receive
      interrupts again during the subsequent resume.  These functions make it
      possible to keep timer interrupts enabled while the "late" suspend and
      "early" resume callbacks provided by device drivers are being
      executed.  In turn, this allows device drivers' "late" suspend and
      "early" resume callbacks to sleep, execute ACPI callbacks etc.
      
      The functions introduced here will be used to rework the handling of
      interrupts during suspend (hibernation) and resume.  Namely,
      interrupts will only be disabled on the CPU right before suspending
      sysdevs, while device drivers will be prevented from receiving
      interrupts, with the help of the new helper function, before their
      "late" suspend callbacks run (and analogously during resume).
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Ingo Molnar <mingo@elte.hu>
  4. 30 Mar, 2009 7 commits
  5. 29 Mar, 2009 2 commits
  6. 28 Mar, 2009 2 commits
  7. 25 Mar, 2009 12 commits
    • sched: Add comments to find_busiest_group() function · b7bb4c9b
      Gautham R Shenoy committed
      Impact: cleanup
      
      Add /** style comments around find_busiest_group(). Also add a few
      explanatory comments.
      
      This concludes the find_busiest_group() cleanup. The function is
      now down to 72 lines from the original 313 lines.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Refactor the power savings balance code · c071df18
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create separate helper functions to initialize the
      power-savings-balance related variables, to update them and
      to check if we have scope for performing power-savings balance.
      
      Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
      case.
      
      This will eliminate all the #ifdef jungle in find_busiest_group() and the
      other helper functions.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Optimize the !power_savings_balance during fbg() · a021dc03
      Gautham R Shenoy committed
      Impact: cleanup, micro-optimization
      
      We don't need to perform power_savings balance if either the
      cpu is NOT_IDLE or the sched_domain doesn't have the
      SD_POWERSAVINGS_BALANCE flag set.
      
      Currently, we check for these conditions multiple
      times, even though these variables don't change over the scope
      of find_busiest_group().
      
      Check once, and store the value in the already existing
      "power_savings_balance" variable.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate imbalance · dbc523a3
      Gautham R Shenoy committed
      Move all the imbalance calculation out of find_busiest_group()
      through this helper function.
      
      With this change, the structure of find_busiest_group() will be
      as follows:
      
      - update_sched_domain_statistics.
      
      - check if imbalance exists.
      
      - update imbalance and return busiest.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create helper to calculate small_imbalance in fbg() · 2e6f44ae
      Gautham R Shenoy committed
      Impact: cleanup
      
      We have two places in find_busiest_group() where we need to calculate
      the minor imbalance before returning the busiest group. Encapsulate
      this functionality into a separate helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_domain stats for fbg() · 37abe198
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create a helper function named update_sd_lb_stats() to update the
      various sched_domain related statistics in find_busiest_group().
      
      With this we would have moved all the statistics computation out of
      find_busiest_group().
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Define structure to store the sched_domain statistics for fbg() · 222d656d
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently we use a lot of local variables in find_busiest_group()
      to capture the various statistics related to the sched_domain.
      Group them together into a single data structure.
      
      This will help us to offload the job of updating the sched_domain
      statistics to a helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Create a helper function to calculate sched_group stats for fbg() · 1f8c553d
      Gautham R Shenoy committed
      Impact: cleanup
      
      Create a helper function named update_sg_lb_stats() which
      can be invoked to calculate the individual group's statistics
      in find_busiest_group().
      
      This reduces the length of find_busiest_group() considerably.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
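      The refactoring pattern in this part of the series, collecting loose
      local variables into one struct and filling it from a helper, can be
      sketched as below. The struct layout, the field names and the plain
      average are hypothetical simplifications; this is not the kernel's
      update_sg_lb_stats().

```c
#include <stddef.h>

/* Hypothetical per-group statistics, formerly separate local variables. */
struct demo_sg_stats {
	unsigned long group_load;      /* sum of the per-cpu loads */
	unsigned long sum_nr_running;  /* sum of the per-cpu runnable tasks */
	unsigned long avg_load;        /* group_load averaged over the cpus */
};

/* Fill *sgs for one group described by two parallel per-cpu arrays. */
static void demo_update_sg_stats(const unsigned long *cpu_load,
				 const unsigned long *cpu_nr_running,
				 size_t ncpus, struct demo_sg_stats *sgs)
{
	sgs->group_load = 0;
	sgs->sum_nr_running = 0;
	for (size_t i = 0; i < ncpus; i++) {
		sgs->group_load += cpu_load[i];
		sgs->sum_nr_running += cpu_nr_running[i];
	}
	sgs->avg_load = ncpus ? sgs->group_load / ncpus : 0;
}
```

      The caller then reads the fields it needs from the struct, so the
      accumulation loop no longer has to live inside the larger function.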
    • sched: Define structure to store the sched_group statistics for fbg() · 381be78f
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently a whole bunch of variables are used to store the
      various statistics pertaining to the groups we iterate over
      in find_busiest_group().
      
      Group them together in a single data structure and add
      appropriate comments.
      
      This will be useful later on when we create helper functions
      to calculate the sched_group statistics.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Fix indentations in find_busiest_group() using gotos · 6dfdb062
      Gautham R Shenoy committed
      Impact: cleanup
      
      Some indentations in find_busiest_group() can be minimized by using
      early exits with the help of gotos. This improves readability in
      a couple of cases.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
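      The early-exit style this commit refers to can be illustrated with
      two equivalent functions. The function names, parameters and the
      imbalance formula are hypothetical; only the control-flow shape is
      the point.

```c
/* Nested form: each extra condition pushes the real work one level deeper. */
static int demo_imbalance_nested(int has_busiest, int this_load, int max_load)
{
	int imbalance = 0;

	if (has_busiest) {
		if (this_load < max_load) {
			imbalance = max_load - this_load;
		}
	}
	return imbalance;
}

/* Early-exit form: bail out through one label, keep the work unindented. */
static int demo_imbalance_goto(int has_busiest, int this_load, int max_load)
{
	int imbalance = 0;

	if (!has_busiest)
		goto out_balanced;
	if (this_load >= max_load)
		goto out_balanced;

	imbalance = max_load - this_load;

out_balanced:
	return imbalance;
}
```

      Both functions return the same value for every input; the second
      form simply keeps the main path at a single indentation level.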
    • sched: Simple helper functions for find_busiest_group() · 67bb6c03
      Gautham R Shenoy committed
      Impact: cleanup
      
      Currently the load idx calculation code is in find_busiest_group().
      Move that to a static inline helper function.
      
      Similarly, to find the first cpu of a sched_group we use
      cpumask_first(sched_group_cpus(group))
      
      Use a helper to do that. It improves readability in some cases.
      Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • dynamic debug: combine dprintk and dynamic printk · e9d376f0
      Jason Baron committed
      This patch combines Greg Banks's dprintk() work with the existing dynamic
      printk patchset; we are now calling it 'dynamic debug'.
      
      The new feature of this patchset is a richer /debugfs control file interface
      (an example output from my system is at the bottom), which allows fine-grained
      control over the debug output. The output can be controlled by function,
      file, module, format string, and line number.
      
      For example, to enable all debug messages in module 'nf_conntrack':
      
      echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control
      
      To disable them:
      
      echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control
      
      A further explanation can be found in the documentation patch.
      Signed-off-by: Greg Banks <gnb@sgi.com>
      Signed-off-by: Jason Baron <jbaron@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>