1. 09 8月, 2013 5 次提交
    • T
      cgroup: add css_parent() · 63876986
      Tejun Heo 提交于
      Currently, controllers have to explicitly follow the cgroup hierarchy
      to find the parent of a given css.  cgroup is moving towards using
      cgroup_subsys_state as the main controller interface construct, so
      let's provide a way to climb the hierarchy using just csses.
      
      This patch implements css_parent() which, given a css, returns its
      parent.  The function is guarnateed to valid non-NULL parent css as
      long as the target css is not at the top of the hierarchy.
      
      freezer, cpuset, cpu, cpuacct, hugetlb, memory, net_cls and devices
      are converted to use css_parent() instead of accessing cgroup->parent
      directly.
      
      * __parent_ca() is dropped from cpuacct and its usage is replaced with
        parent_ca().  The only difference between the two was NULL test on
        cgroup->parent which is now embedded in css_parent() making the
        distinction moot.  Note that eventually a css->parent field will be
        added to css and the NULL check in css_parent() will go away.
      
      This patch shouldn't cause any behavior differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      63876986
    • T
      cgroup: add/update accessors which obtain subsys specific data from css · a7c6d554
      Tejun Heo 提交于
      css (cgroup_subsys_state) is usually embedded in a subsys specific
      data structure.  Subsystems either use container_of() directly to cast
      from css to such data structure or has an accessor function wrapping
      such cast.  As cgroup as whole is moving towards using css as the main
      interface handle, add and update such accessors to ease dealing with
      css's.
      
      All accessors explicitly handle NULL input and return NULL in those
      cases.  While this looks like an extra branch in the code, as all
      controllers specific data structures have css as the first field, the
      casting doesn't involve any offsetting and the compiler can trivially
      optimize out the branch.
      
      * blkio, freezer, cpuset, cpu, cpuacct and net_cls didn't have such
        accessor.  Added.
      
      * memory, hugetlb and devices already had one but didn't explicitly
        handle NULL input.  Updated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      a7c6d554
    • T
      cgroup: add subsystem pointer to cgroup_subsys_state · 72c97e54
      Tejun Heo 提交于
      Currently, given a cgroup_subsys_state, there's no way to find out
      which subsystem the css is for, which we'll need to convert the cgroup
      controller API to primarily use @css instead of @cgroup.  This patch
      adds cgroup_subsys_state->ss which points to the subsystem the @css
      belongs to.
      
      While at it, remove the comment about accessing @css->cgroup to
      determine the hierarchy.  cgroup core will provide API to traverse
      hierarchy of css'es and we don't want subsystems to directly walk
      cgroup hierarchies anymore.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      72c97e54
    • T
      cpuset: drop "const" qualifiers from struct cpuset instances · c9710d80
      Tejun Heo 提交于
      cpuset uses "const" qualifiers on struct cpuset in some functions;
      however, it doesn't work well when a value derived from returned const
      pointer has to be passed to an accessor.  It's C after all.
      
      Drop the "const" qualifiers except for the trivially leaf ones.  This
      patch doesn't make any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      c9710d80
    • T
      cgroup: s/cgroup_subsys_state/cgroup_css/ s/task_subsys_state/task_css/ · 8af01f56
      Tejun Heo 提交于
      The names of the two struct cgroup_subsys_state accessors -
      cgroup_subsys_state() and task_subsys_state() - are somewhat awkward.
      The former clashes with the type name and the latter doesn't even
      indicate it's somehow related to cgroup.
      
      We're about to revamp large portion of cgroup API, so, let's rename
      them so that they're less awkward.  Most per-controller usages of the
      accessors are localized in accessor wrappers and given the amount of
      scheduled changes, this isn't gonna add any noticeable headache.
      
      Rename cgroup_subsys_state() to cgroup_css() and task_subsys_state()
      to task_css().  This patch is pure rename.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8af01f56
  2. 01 8月, 2013 2 次提交
  3. 31 7月, 2013 5 次提交
  4. 30 7月, 2013 2 次提交
  5. 16 7月, 2013 2 次提交
    • T
      cgroup: remove gratuituous BUG_ON()s from rebind_subsystems() · a698b448
      Tejun Heo 提交于
      rebind_subsystems() performs santiy checks even on subsystems which
      aren't specified to be added or removed and the checks aren't all that
      useful given that these are in a very cold path while the violations
      they check would trip up in much hotter paths.
      
      Let's remove these from rebind_subsystems().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      a698b448
    • T
      cgroup: move module ref handling into rebind_subsystems() · 1d5be6b2
      Tejun Heo 提交于
      Module ref handling in cgroup is rather weird.
      parse_cgroupfs_options() grabs all the modules for the specified
      subsystems.  A module ref is kept if the specified subsystem is newly
      bound to the hierarchy.  If not, or the operation fails, the refs are
      dropped.  This scatters module ref handling across multiple functions
      making it difficult to track.  It also make the function nasty to use
      for dynamic subsystem binding which is necessary for the planned
      unified hierarchy.
      
      There's nothing which requires the subsystem modules to be pinned
      between parse_cgroupfs_options() and rebind_subsystems() in both mount
      and remount paths.  parse_cgroupfs_options() can just parse and
      rebind_subsystems() can handle pinning the subsystems that it wants to
      bind, which is a natural part of its task - binding - anyway.
      
      Move module ref handling into rebind_subsystems() which makes the code
      a lot simpler - modules are gotten iff it's gonna be bound and put iff
      unbound or binding fails.
      
      v2: Li pointed out that if a controller module is unloaded between
          parsing and binding, rebind_subsystems() won't notice the missing
          controller as it only iterates through existing controllers.  Fix
          it by updating rebind_subsystems() to compare @added_mask to
          @pinned and fail with -ENOENT if they don't match.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      1d5be6b2
  6. 13 7月, 2013 9 次提交
    • T
      cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path() · 913ffdb5
      Tejun Heo 提交于
      task_cgroup_path_from_hierarchy() was added for the planned new users
      and none of the currently planned users wants to know about multiple
      hierarchies.  This patch drops the multiple hierarchy part and makes
      it always return the path in the first non-dummy hierarchy.
      
      As unified hierarchy will always have id 1, this is guaranteed to
      return the path for the unified hierarchy if mounted; otherwise, it
      will return the path from the hierarchy which happens to occupy the
      lowest hierarchy id, which will usually be the first hierarchy mounted
      after boot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Jan Kaluža <jkaluza@redhat.com>
      913ffdb5
    • T
      cgroup: move number_of_cgroups test out of rebind_subsystems() into cgroup_remount() · f172e67c
      Tejun Heo 提交于
      rebind_subsystems() currently fails if the hierarchy has any !root
      cgroups; however, on the planned unified hierarchy,
      rebind_subsystems() will be used while populated.  Move the test to
      cgroup_remount(), which is the only place the test is necessary
      anyway.
      
      As it's impossible for the other two callers of rebind_subsystems() to
      have populated hierarchy, this doesn't make any behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      f172e67c
    • T
      cgroup: make rebind_subsystems() handle file additions and removals with proper error handling · 3126121f
      Tejun Heo 提交于
      Currently, creating and removing cgroup files in the root directory
      are handled separately from the actual subsystem binding and unbinding
      which happens in rebind_subsystems().  Also, rebind_subsystems() users
      aren't handling file creation errors properly.  Let's integrate
      top_cgroup file handling into rebind_subsystems() so that it's simpler
      to use and everyone handles file creation errors correctly.
      
      * On a successful return, rebind_subsystems() is guaranteed to have
        created all files of the new subsystems and deleted the ones
        belonging to the removed subsystems.  After a failure, no file is
        created or removed.
      
      * cgroup_remount() no longer needs to make explicit populate/clear
        calls as it's all handled by rebind_subsystems(), and it gets proper
        error handling automatically.
      
      * cgroup_mount() has been updated such that the root dentry and cgroup
        are linked before rebind_subsystems().  Also, the init_cred dancing
        and base file handling are moved right above rebind_subsystems()
        call and proper error handling for the base files is added.  While
        at it, add a comment explaining what's going on with the cred thing.
      
      * cgroup_kill_sb() calls rebind_subsystems() to unbind all subsystems
        which now implies removing all subsystem files which requires the
        directory's i_mutex.  Grab it.  This means that files on the root
        cgroup are removed earlier - they used to be deleted from generic
        super_block cleanup from vfs.  This doesn't lead to any functional
        difference and it's cleaner to do the clean up explicitly for all
        files.
      
      Combined with the previous changes, this makes all cgroup file
      creation errors handled correctly.
      
      v2: Added comment on init_cred.
      
      v3: Li spotted that cgroup_mount() wasn't freeing tmp_links after base
          file addition failure.  Fix it by adding free_tmp_links error
          handling label.
      
      v4: v3 introduced build bugs which got noticed by Fengguang's awesome
          kbuild test robot.  Fixed, and shame on me.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      3126121f
    • T
      cgroup: use for_each_subsys() instead of for_each_root_subsys() in cgroup_populate/clear_dir() · b420ba7d
      Tejun Heo 提交于
      rebind_subsystems() will be updated to handle file creations and
      removals with proper error handling and to do that will need to
      perform file operations before actually adding the subsystem to the
      hierarchy.
      
      To enable such usage, update cgroup_populate/clear_dir() to use
      for_each_subsys() instead of for_each_root_subsys() so that they
      operate on all subsystems specified by @subsys_mask whether that
      subsystem is currently bound to the hierarchy or not.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      b420ba7d
    • T
      cgroup: update error handling in cgroup_populate_dir() · bee55099
      Tejun Heo 提交于
      cgroup_populate_dir() didn't use to check whether the actual file
      creations were successful and could return success with only subset of
      the requested files created, which is nasty.
      
      This patch udpates cgroup_populate_dir() so that it either succeeds
      with all files or fails with no file.
      
      v2: The original patch also converted for_each_root_subsys() usages to
          for_each_subsys() without explaining why.  That part has been
          moved to a separate patch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      bee55099
    • T
      cgroup: separate out cgroup_base_files[] handling out of cgroup_populate/clear_dir() · 628f7cd4
      Tejun Heo 提交于
      cgroup_populate/clear_dir() currently take @base_files and adds and
      removes, respectively, cgroup_base_files[] to the directory.  File
      additions and removals are being reorganized for proper error handling
      and more dynamic handling for the unified hierarchy, and mixing base
      and subsys file handling into the same functions gets a bit confusing.
      
      This patch moves base file handling out of cgroup_populate/clear_dir()
      into their users - cgroup_mount(), cgroup_create() and
      cgroup_destroy_locked().
      
      Note that this changes the behavior of base file removal.  If
      @base_files is %true, cgroup_clear_dir() used to delete files
      regardless of cftype until there's no files left.  Now, only files
      with matching cfts are removed.  As files can only be created by the
      base or registered cftypes, this shouldn't result in any behavior
      difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      628f7cd4
    • T
      cgroup: fix cgroup_add_cftypes() error handling · 9ccece80
      Tejun Heo 提交于
      cgroup_add_cftypes() uses cgroup_cfts_commit() to actually create the
      files; however, both functions ignore actual file creation errors and
      just assume success.  This can lead to, for example, blkio hierarchy
      with some of the cgroups with only subset of interface files populated
      after cfq-iosched is loaded under heavy memory pressure, which is
      nasty.
      
      This patch updates cgroup_cfts_commit() and cgroup_add_cftypes() to
      guarantee that all files are created on success and no file is created
      on failure.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      9ccece80
    • T
      cgroup: fix error path of cgroup_addrm_files() · b1f28d31
      Tejun Heo 提交于
      cgroup_addrm_files() mishandled error return value from
      cgroup_add_file() and returns error iff the last file fails to create.
      As we're in the process of cleaning up file add/rm error handling and
      will reliably propagate file creation failures, there's no point in
      keeping adding files after a failure.
      
      Replace the broken error collection logic with immediate error return.
      While at it, add lockdep assertions and function comment.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      b1f28d31
    • T
      cgroup: minor updates around cgroup_clear_directory() · 8f89140a
      Tejun Heo 提交于
      * Rename it to cgroup_clear_dir() and make it take the pointer to the
        target cgroup instead of the the dentry.  This makes the function
        consistent with its counterpart - cgroup_populate_dir().
      
      * Move cgroup_clear_directory() invocation from cgroup_d_remove_dir()
        to cgroup_remount() so that the function doesn't have to determine
        the cgroup pointer back from the dentry.  cgroup_d_remove_dir() now
        only deals with vfs, which is slightly cleaner.
      
      This patch doesn't introduce any functional differences.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8f89140a
  7. 11 7月, 2013 1 次提交
  8. 10 7月, 2013 11 次提交
  9. 06 7月, 2013 1 次提交
  10. 05 7月, 2013 2 次提交
    • T
      hrtimers: Move SMP function call to thread context · 5ec2481b
      Thomas Gleixner 提交于
      smp_call_function_* must not be called from softirq context.
      
      But clock_was_set() which calls on_each_cpu() is called from softirq
      context to implement a delayed clock_was_set() for the timer interrupt
      handler. Though that almost never gets invoked. A recent change in the
      resume code uses the softirq based delayed clock_was_set to support
      Xens resume mechanism.
      
      linux-next contains a new warning which warns if smp_call_function_*
      is called from softirq context which gets triggered by that Xen
      change.
      
      Fix this by moving the delayed clock_was_set() call to a work context.
      Reported-and-tested-by: NArtem Savkov <artem.savkov@gmail.com>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: H. Peter Anvin <hpa@zytor.com>,
      Cc: Konrad Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: xen-devel@lists.xen.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ec2481b
    • T
      clocksource: Reselect clocksource when watchdog validated high-res capability · 332962f2
      Thomas Gleixner 提交于
      Up to commit 5d33b883 (clocksource: Always verify highres capability)
      we had no sanity check when selecting a clocksource, which prevented
      that a non highres capable clocksource is used when the system already
      switched to highres/nohz mode.
      
      The new sanity check works as Alex and Tim found out. It prevents the
      TSC from being used. This happens because on x86 the boot process
      looks like this:
      
       tsc_start_freqency_validation(TSC);
       clocksource_register(HPET);
       clocksource_done_booting();
      	clocksource_select()
      		Selects HPET which is valid for high-res
      
       switch_to_highres();
      
       clocksource_register(TSC);
       	TSC is not selected, because it is not yet
      	flagged as VALID_HIGH_RES
      
       clocksource_watchdog()
      	Validates TSC for highres, but that does not make TSC
      	the current clocksource.
      
      Before the sanity check was added, we installed TSC unvalidated which
      worked most of the time. If the TSC was really detected as unstable,
      then the unstable logic removed it and installed HPET again.
      
      The sanity check is correct and needed. So the watchdog needs to kick
      a reselection of the clocksource, when it qualifies TSC as a valid
      high res clocksource.
      
      To solve this, we mark the clocksource which got the flag
      CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with an new flag
      CLOCK_SOURCE_RESELECT and trigger the watchdog thread. The watchdog
      thread evaluates the flag and invokes clocksource_select() when set.
      
      To avoid that the clocksource_done_booting() code, which is about to
      install the first real clocksource anyway, needs to go through
      clocksource_select and tick_oneshot_notify() pointlessly, split out
      the clocksource_watchdog_kthread() list walk code and invoke the
      select/notify only when called from clocksource_watchdog_kthread().
      
      So clocksource_done_booting() can utilize the same splitout code
      without the select/notify invocation and the clocksource_mutex
      unlock/relock dance.
      Reported-and-tested-by: NAlex Shi <alex.shi@intel.com>
      Cc: Hans Peter Anvin <hpa@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
      Tested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      332962f2