1. 14 June 2013, 11 commits
    • cgroup: update sane_behavior documentation · f63674fd
      Committed by Tejun Heo
      f12dc020 ("cgroup: mark "tasks" cgroup file as insane") and
      cc5943a7 ("cgroup: mark "notify_on_release" and "release_agent"
      cgroup files insane") forgot to update the changed behavior
      documentation in cgroup.h.  Update it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      f63674fd
    • cgroup: use percpu refcnt for cgroup_subsys_states · d3daf28d
      Committed by Tejun Heo
      A css (cgroup_subsys_state) is how each cgroup is represented to a
      controller.  As such, it can be used in hot paths across the various
      subsystems different controllers are associated with.
      
      One of the common operations is reference counting, which up until now
      has been implemented using a global atomic counter and can have
      significant adverse impact on scalability.  For example, css refcnt
      can be gotten and put multiple times by blkcg for each IO request.
      For high-IOPS configurations which try to do as much per-cpu as
      possible, frequent refcounting on a global counter can be very expensive.
      
      In general, given the various and hugely diverse paths css's end up
      being used from, we need to make it cheap and highly scalable.  In its
      usage, css refcnting isn't very different from module refcnting.
      
      This patch converts css refcnting to use the recently added
      percpu_ref.  css_get/tryget/put() directly maps to the matching
      percpu_ref operations and the deactivation logic is no longer
      necessary as percpu_ref already has refcnt killing.
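
      A rough sketch of what the fast-path side of this mapping looks like
      (simplified; "css_like" and its helpers below are illustrative
      stand-ins rather than the exact cgroup code):

        #include <linux/percpu-refcount.h>

        /* illustrative stand-in for struct cgroup_subsys_state */
        struct css_like {
                struct percpu_ref refcnt;   /* replaces the global atomic counter */
        };

        static inline void css_like_get(struct css_like *css)
        {
                percpu_ref_get(&css->refcnt);   /* per-cpu increment, no shared cacheline */
        }

        static inline bool css_like_tryget(struct css_like *css)
        {
                return percpu_ref_tryget(&css->refcnt); /* fails once the ref is killed */
        }

        static inline void css_like_put(struct css_like *css)
        {
                percpu_ref_put(&css->refcnt);   /* release callback runs on the final put */
        }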
      
      The only complication is that as the refcnt is per-cpu,
      percpu_ref_kill() in itself doesn't ensure that further tryget
      operations will fail, which we need to guarantee before invoking
      ->css_offline()'s.  This is resolved by collecting kill confirmation
      using percpu_ref_kill_and_confirm() and initiating the offline phase
      of destruction only after the css refcnt is confirmed to be seen as
      killed on all CPUs.  The previous patches already split destruction
      into two phases, so percpu_ref_kill_and_confirm() can be hooked up
      easily.
      
      This patch removes css_refcnt() which is used for rcu dereference
      sanity check in css_id().  While we can add a percpu refcnt API to ask
      the same question, css_id() itself is scheduled to be removed fairly
      soon, so let's not bother with it.  Just drop the sanity check and use
      rcu_dereference_raw() instead.
      
      v2: - init_cgroup_css() was calling percpu_ref_init() without checking
            the return value.  This causes two problems - the obvious lack
            of error handling and percpu_ref_init() being called from
            cgroup_init_subsys() before the allocators are up, which
            triggers warnings but doesn't cause actual problems as the
            refcnt isn't used for roots anyway.  Fix both by moving
            percpu_ref_init() to cgroup_create().
      
          - The base references were put too early by
            percpu_ref_kill_and_confirm() and cgroup_offline_fn() put the
            refs one extra time.  This wasn't noticeable because css's go
            through another RCU grace period before being freed.  Update
            cgroup_destroy_locked() to grab an extra reference before
            killing the refcnts.  This problem was noticed by Kent.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Kent Overstreet <koverstreet@google.com>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Alasdair G. Kergon" <agk@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      d3daf28d
    • cgroup: split cgroup destruction into two steps · ea15f8cc
      Committed by Tejun Heo
      Split cgroup_destroy_locked() into two steps and put the latter half
      into cgroup_offline_fn() which is executed from a work item.  The
      latter half is responsible for offlining the css's, removing the
      cgroup from internal lists, and propagating release notification to
      the parent.  The separation is to allow using percpu refcnt for css.
      
      Note that this allows for other cgroup operations to happen between
      the first and second halves of destruction, including creating a new
      cgroup with the same name.  As the target cgroup is marked DEAD in the
      first half and cgroup internals don't care about the names of cgroups,
      this should be fine.  A comment explaining this will be added by the
      next patch which implements the actual percpu refcnting.
      
      As RCU freeing is guaranteed to happen after the second step of
      destruction, we can use the same work item for both.  This patch
      renames cgroup->free_work to ->destroy_work and uses it for both
      purposes.  INIT_WORK() is now performed right before queueing the work
      item.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      ea15f8cc
    • percpu-refcount: implement percpu_tryget() along with percpu_ref_kill_and_confirm() · dbece3a0
      Committed by Tejun Heo
      Implement percpu_tryget() which stops giving out references once the
      percpu_ref is visible as killed.  Because the refcnt is per-cpu,
      different CPUs will start to see a refcnt as killed at different
      points in time and tryget() may continue to succeed on a subset of
      CPUs for a while after percpu_ref_kill() returns.
      
      For use cases where it's necessary to know when all CPUs start to see
      the refcnt as dead, percpu_ref_kill_and_confirm() is added.  The new
      function takes an extra argument @confirm_kill which is invoked when
      the refcnt is guaranteed to be viewed as killed on all CPUs.
      
      While this isn't the prettiest interface, it doesn't force synchronous
      wait and is much safer than requiring the caller to do its own
      call_rcu().
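
      A rough usage sketch of the confirmed-kill interface (the object,
      callbacks, and work item here are hypothetical, not taken from an
      in-tree user):

        #include <linux/percpu-refcount.h>
        #include <linux/workqueue.h>

        struct my_obj {
                struct percpu_ref ref;
                struct work_struct offline_work;
        };

        static void my_obj_release(struct percpu_ref *ref)
        {
                /* final reference dropped: free the object */
        }

        static void my_obj_confirm_kill(struct percpu_ref *ref)
        {
                struct my_obj *obj = container_of(ref, struct my_obj, ref);

                /* all CPUs now see the ref as killed; tryget can no longer succeed */
                schedule_work(&obj->offline_work);
        }

        static void my_obj_shutdown(struct my_obj *obj)
        {
                percpu_ref_kill_and_confirm(&obj->ref, my_obj_confirm_kill);
                /* my_obj_confirm_kill() runs later, once the kill is globally visible */
        }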
      
      v2: Patch description rephrased to emphasize that tryget() may
          continue to succeed on some CPUs after kill() returns as suggested
          by Kent.
      
      v3: Function comment in percpu_ref_kill_and_confirm() updated warning
          people to not depend on the implied RCU grace period from the
          confirm callback as it's an implementation detail.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Slightly-Grumpily-Acked-by: Kent Overstreet <koverstreet@google.com>
      dbece3a0
    • percpu-refcount: implement percpu_ref_cancel_init() · bc497bd3
      Committed by Tejun Heo
      Normally, percpu_ref_init() initializes and percpu_ref_kill()
      initiates destruction which completes asynchronously.  The
      asynchronous destruction can be problematic in the init failure path,
      where the caller wants to destroy a half-constructed object -
      distinguishing half-constructed objects in the usual release method
      can be painful for complex objects.
      
      This patch implements percpu_ref_cancel_init() which synchronously
      destroys the percpu_ref without invoking release.  To avoid
      unintentional misuse, the function requires a ref that has finished
      percpu_ref_init() but has never been used, and triggers a WARN otherwise.
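
      A sketch of the intended init-failure usage (my_obj_*, including the
      later setup step, are hypothetical names):

        static int my_obj_init(struct my_obj *obj)
        {
                int ret;

                ret = percpu_ref_init(&obj->ref, my_obj_release);
                if (ret)
                        return ret;

                ret = my_obj_setup_rest(obj);   /* hypothetical later init step */
                if (ret) {
                        /* half-constructed object: tear the ref down synchronously
                         * without ever invoking my_obj_release() */
                        percpu_ref_cancel_init(&obj->ref);
                        return ret;
                }

                return 0;
        }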
      
      v2: Explain the weird name and usage restriction in the function
          comment.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Kent Overstreet <koverstreet@google.com>
      bc497bd3
    • percpu-refcount: add __must_check to percpu_ref_init() and don't use ACCESS_ONCE() in percpu_ref_kill_rcu() · acac7883
      Committed by Tejun Heo
      
      Two small changes.
      
      * Unlike most init functions, percpu_ref_init() allocates memory and
        may fail.  Let's mark it with __must_check in case the caller
        forgets.
      
      * percpu_ref_kill_rcu() is unnecessarily using ACCESS_ONCE() to
        dereference @ref->pcpu_count, which can be misleading.  The pointer
        is guaranteed to be valid and visible and can't change underneath
        the function.  Drop ACCESS_ONCE().
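
      The first change boils down to the declaration looking roughly like
      this, so a caller that ignores the return value now gets a compiler
      warning:

        int __must_check percpu_ref_init(struct percpu_ref *ref,
                                         percpu_ref_func_t *release);
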
      Signed-off-by: Tejun Heo <tj@kernel.org>
      acac7883
    • cgroup: remove cgroup->count and use · 6f3d828f
      Committed by Tejun Heo
      cgroup->count tracks the number of css_sets associated with the cgroup
      and is used only to verify that no css_set is associated when the cgroup
      is being destroyed.  It's superfluous as the destruction path can
      simply check whether cgroup->cset_links is empty instead.
      
      Drop cgroup->count and check ->cset_links directly from
      cgroup_destroy_locked().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      6f3d828f
    • cgroup: rename CGRP_REMOVED to CGRP_DEAD · 54766d4a
      Committed by Tejun Heo
      We will add another flag indicating that the cgroup is in the process
      of being killed.  REMOVING / REMOVED is more difficult to distinguish
      and cgroup_is_removing()/cgroup_is_removed() are a bit awkward.  Also,
      later percpu_ref usage will involve "kill"ing the refcnt.
      
       s/CGRP_REMOVED/CGRP_DEAD/
       s/cgroup_is_removed()/cgroup_is_dead()/
      
      This patch is purely cosmetic.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      54766d4a
    • cgroup: clean up css_[try]get() and css_put() · 5de0107e
      Committed by Tejun Heo
      * __css_get() isn't used by anyone.  Fold it into css_get().
      
      * Add proper function comments to all css reference functions.
      
      This patch is purely cosmetic.
      
      v2: Typo fix as per Li.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      5de0107e
    • cgroup: bring some sanity to naming around cg_cgroup_link · 69d0206c
      Committed by Tejun Heo
      cgroups and css_sets are mapped M:N and this M:N mapping is
      represented by struct cg_cgroup_link which forms linked lists on both
      sides.  The naming around this mapping is already confusing and struct
      cg_cgroup_link exacerbates the situation quite a bit.
      
      From the cgroup side, it starts off ->css_sets and runs through
      ->cgrp_link_list.  From css_set side, it starts off ->cg_links and
      runs through ->cg_link_list.  This is rather reversed as
      cgrp_link_list is used to iterate css_sets and cg_link_list cgroups.
      Also, this is the only place which is still using the confusing "cg"
      for css_sets.  This patch cleans it up a bit.
      
      * s/cgroup->css_sets/cgroup->cset_links/
        s/css_set->cg_links/css_set->cgrp_links/
        s/cgroup_iter->cg_link/cgroup_iter->cset_link/
      
      * s/cg_cgroup_link/cgrp_cset_link/
      
      * s/cgrp_cset_link->cg/cgrp_cset_link->cset/
        s/cgrp_cset_link->cgrp_link_list/cgrp_cset_link->cset_link/
        s/cgrp_cset_link->cg_link_list/cgrp_cset_link->cgrp_link/
      
      * s/init_css_set_link/init_cgrp_cset_link/
        s/free_cg_links/free_cgrp_cset_links/
        s/allocate_cg_links/allocate_cgrp_cset_links/
      
      * s/cgl[12]/link[12]/ in compare_css_sets()
      
      * s/saved_link/tmp_link/ s/tmp/tmp_links/ and a couple of similar
        adjustments.
      
      * Comment and whitespace adjustments.
      
      After the changes, we have
      
      	list_for_each_entry(link, &cont->cset_links, cset_link) {
      		struct css_set *cset = link->cset;
      
      instead of
      
      	list_for_each_entry(link, &cont->css_sets, cgrp_link_list) {
      		struct css_set *cset = link->cg;
      
      This patch is purely cosmetic.
      
      v2: Fix broken sentences in the patch description.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      69d0206c
    • cgroup: remove now unused css_depth() · 3fc3db9a
      Committed by Tejun Heo
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Li Zefan <lizefan@huawei.com>
      3fc3db9a
  2. 13 June 2013, 2 commits
  3. 04 June 2013, 1 commit
    • percpu: implement generic percpu refcounting · 215e262f
      Committed by Kent Overstreet
      This implements a refcount with similar semantics to
      atomic_get()/atomic_dec_and_test() - but percpu.
      
      It also implements two stage shutdown, as we need it to tear down the
      percpu counts.  Before dropping the initial refcount, you must call
      percpu_ref_kill(); this puts the refcount in "shutting down mode" and
      switches back to a single atomic refcount with the appropriate
      barriers (synchronize_rcu()).
      
      It's also legal to call percpu_ref_kill() multiple times - it only
      returns true once, so callers don't have to reimplement shutdown
      synchronization.
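
      A minimal lifecycle sketch under these semantics (the surrounding
      object and its allocation are hypothetical):

        #include <linux/percpu-refcount.h>
        #include <linux/slab.h>

        struct my_obj {
                struct percpu_ref ref;
        };

        static void my_obj_release(struct percpu_ref *ref)
        {
                kfree(container_of(ref, struct my_obj, ref));
        }

        static struct my_obj *my_obj_create(void)
        {
                struct my_obj *obj = kzalloc(sizeof(*obj), GFP_KERNEL);

                if (!obj || percpu_ref_init(&obj->ref, my_obj_release)) {
                        kfree(obj);
                        return NULL;
                }
                return obj;                     /* holds the initial reference */
        }

        static void my_obj_shutdown(struct my_obj *obj)
        {
                percpu_ref_kill(&obj->ref);     /* stage 1: enter shutdown mode */
                percpu_ref_put(&obj->ref);      /* stage 2: drop the initial ref */
        }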
      
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: coding-style tweak]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      215e262f
  4. 31 May 2013, 1 commit
  5. 25 May 2013, 4 commits
    • linux/kernel.h: fix kernel-doc warning · 7450231f
      Committed by Randy Dunlap
      Fix kernel-doc warning in <linux/kernel.h>:
      
        Warning(include/linux/kernel.h:590): No description found for parameter 'ip'
      
      scripts/kernel-doc cannot handle macros, functions, or function
      prototypes between the function or macro that is being documented and
      its definition, so move these prototypes above the function that is
      being documented.
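
      The pattern being applied, on a made-up example (all names below are
      illustrative):

        /* prototypes moved above the kernel-doc comment so that
         * scripts/kernel-doc does not stop scanning before it reaches
         * the documented definition */
        int helper_one(unsigned long ip);
        int helper_two(unsigned long ip);

        /**
         * traced_entry - the function carrying the kernel-doc comment
         * @ip: instruction pointer passed through to the helpers
         */
        static inline int traced_entry(unsigned long ip)
        {
                return helper_one(ip) + helper_two(ip);
        }
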
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7450231f
    • wait: fix false timeouts when using wait_event_timeout() · 4c663cfc
      Committed by Imre Deak
      Many callers of the wait_event_timeout() and
      wait_event_interruptible_timeout() expect that the return value will be
      positive if the specified condition becomes true before the timeout
      elapses.  However, at the moment this isn't guaranteed.  If the wake-up
      handler is delayed enough, the time remaining until timeout will be
      calculated as 0 - and passed back as a return value - even if the
      condition became true before the timeout has passed.
      
      Fix this by returning at least 1 if the condition becomes true.  This
      semantic is in line with what wait_for_completion_timeout() does; see
      commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
      failure under heavy load").
      
      Daniel said "We have 3 instances of this bug in drm/i915.  One case even
      where we switch between the interruptible and not interruptible
      wait_event_timeout variants, foolishly presuming they have the same
      semantics.  I very much like this."
      
      One such bug is reported at
        https://bugs.freedesktop.org/show_bug.cgi?id=64133
      Signed-off-by: Imre Deak <imre.deak@intel.com>
      Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Acked-by: David Howells <dhowells@redhat.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Cc: "Paul E.  McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Lukas Czerner <lczerner@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c663cfc
    • rapidio: add enumeration/discovery start from user space · bc8fcfea
      Committed by Alexandre Bounine
      Add RapidIO enumeration/discovery start from user space.  Starting
      from user space allows the RapidIO fabric scan to be deferred until all
      participating endpoints are initialized, avoiding a mandatory
      synchronized start of all endpoints (which may be challenging in
      systems with a large number of RapidIO endpoints).
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bc8fcfea
    • rapidio: make enumeration/discovery configurable · a11650e1
      Committed by Alexandre Bounine
      Systems that use a RapidIO fabric may need to implement their own
      enumeration and discovery methods which are better suited to the needs
      of the target application.
      
      The following set of patches is intended to simplify process of
      introduction of new RapidIO fabric enumeration/discovery methods.
      
      The first patch offers the ability to add new RapidIO
      enumeration/discovery methods using kernel configuration options.  This
      new configuration option mechanism allows selecting statically linked
      or modular enumeration/discovery method(s) from the list of existing
      methods, or using external module(s).
      
      This patch also updates the currently existing enumeration/discovery
      code to be used as a statically linked or modular method.
      
      The corresponding configuration option is named the "Basic
      enumeration/discovery" method.  This is the only configuration
      option available today, but new methods are expected to be introduced
      after these patches are adopted.
      
      The second patch addresses a long-standing complaint of RapidIO
      subsystem users regarding the fabric enumeration/discovery start
      sequence.  The existing implementation offers only a boot-time
      enumeration/discovery start, which requires a synchronized boot of all
      endpoints in the RapidIO network.  While this works for small closed
      configurations with a limited number of endpoints, using this approach
      in systems with a large number of endpoints is quite challenging.
      
      To eliminate the requirement for a synchronized start, the second patch
      introduces RapidIO enumeration/discovery start from user space.
      
      For compatibility with the existing RapidIO subsystem implementation,
      automatic boot-time enumeration/discovery start can be enabled by
      specifying the "rio-scan.scan=1" command line parameter when the
      statically linked basic enumeration method is selected.
      
      This patch:
      
      Rework to implement RapidIO enumeration/discovery method selection
      combined with the ability to use enumeration/discovery as a kernel module.

      This patch adds the ability to introduce new RapidIO enumeration/discovery
      methods using kernel configuration options.  The configuration option
      mechanism allows selecting a statically linked or modular
      enumeration/discovery method from the list of existing methods, or using
      external modules.  If a modular enumeration/discovery method is selected,
      each RapidIO mport device can have its own method attached to it.
      
      The existing enumeration/discovery code was updated to be used as a
      statically linked or modular method.  The configuration option for it
      is named the "Basic enumeration/discovery" method.
      
      Several common routines have been moved out of rio-scan.c to make them
      available to other enumeration methods and to reduce the number of
      exported symbols.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a11650e1
  6. 24 May 2013, 4 commits
    • cgroup: update iterators to use cgroup_next_sibling() · 75501a6d
      Committed by Tejun Heo
      This patch converts cgroup_for_each_child(),
      cgroup_next_descendant_pre/post() and thus
      cgroup_for_each_descendant_pre/post() to use cgroup_next_sibling()
      instead of manually dereferencing ->sibling.next.
      
      The only reason the iterators couldn't allow dropping the RCU read
      lock while iteration is in progress was that they couldn't determine
      the next sibling safely once the RCU read lock was dropped.  Using
      cgroup_next_sibling() removes that problem and enables all iterators
      to allow dropping the RCU read lock mid-iteration.  Comments are
      updated accordingly.
      
      This makes the iterators easier to use and will simplify controllers.
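
      A rough controller-side sketch of what this enables (the subsystem
      id, css lookup, and blocking work below are hypothetical):

        struct cgroup *child;

        rcu_read_lock();
        cgroup_for_each_child(child, parent_cgrp) {
                struct cgroup_subsys_state *css = child->subsys[my_subsys_id];

                if (!css_tryget(css))
                        continue;               /* child is being destroyed; skip it */
                rcu_read_unlock();

                do_blocking_work(css);          /* sleeping is now allowed mid-iteration */

                rcu_read_lock();
                css_put(css);
        }
        rcu_read_unlock();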
      
      Note that @cgroup argument is renamed to @cgrp in
      cgroup_for_each_child() because it conflicts with "struct cgroup" used
      in the new macro body.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      75501a6d
    • cgroup: add cgroup->serial_nr and implement cgroup_next_sibling() · 53fa5261
      Committed by Tejun Heo
      Currently, there's no easy way to find out the next sibling cgroup
      unless it's known that the current cgroup is accessed from the
      parent's children list in a single RCU critical section.  This in turn
      forces all iterators to require whole iteration to be enclosed in a
      single RCU critical section, which sometimes is too restrictive.  This
      patch implements cgroup_next_sibling() which can reliably determine
      the next sibling regardless of the state of the current cgroup as long
      as it's accessible.
      
      It currently is impossible to determine the next sibling after
      dropping the RCU read lock because the cgroup being iterated could be
      removed at any time; once the RCU read lock is dropped, nothing
      guarantees its ->sibling.next pointer is accessible.  A removed cgroup would
      continue to point to its next sibling for RCU accesses but stop
      receiving updates from the sibling.  IOW, the next sibling could be
      removed and then complete its grace period while RCU read lock is
      dropped, making it unsafe to dereference ->sibling.next after dropping
      and re-acquiring RCU read lock.
      
      This can be solved by adding a way to traverse to the next sibling
      without dereferencing ->sibling.next.  This patch adds a monotonically
      increasing cgroup serial number, cgroup->serial_nr, which guarantees
      that all cgroup->children lists are kept in increasing serial_nr
      order.  A new function, cgroup_next_sibling(), is implemented, which,
      if CGRP_REMOVED is not set on the current cgroup, follows
      ->sibling.next; otherwise, traverses the parent's ->children list
      until it sees a sibling with higher ->serial_nr.
      
      This allows the function to always return the next sibling regardless
      of the state of the current cgroup without adding overhead in the fast
      path.
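
      Condensed, the traversal logic described above looks roughly like
      this (simplified from the actual patch):

        struct cgroup *next_sibling(struct cgroup *pos)
        {
                struct cgroup *next;

                if (!test_bit(CGRP_REMOVED, &pos->flags)) {
                        /* pos is still on its parent's ->children list */
                        next = list_entry_rcu(pos->sibling.next,
                                              struct cgroup, sibling);
                        return &next->sibling != &pos->parent->children ? next : NULL;
                }

                /* pos was removed: find the first sibling with a higher serial_nr */
                list_for_each_entry_rcu(next, &pos->parent->children, sibling)
                        if (next->serial_nr > pos->serial_nr)
                                return next;
                return NULL;
        }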
      
      Further patches will update the iterators to use cgroup_next_sibling()
      so that they allow dropping RCU read lock and blocking while iteration
      is in progress which in turn will be used to simplify controllers.
      
      v2: Typo fix as per Serge.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      53fa5261
    • cgroup: make cgroup_is_removed() static · bdc7119f
      Committed by Tejun Heo
      cgroup_is_removed() no longer has external users and it shouldn't grow
      any - controllers should deal with cgroup_subsys_state on/offline
      state instead of cgroup removal state.  Make it static.
      
      While at it, make it return bool.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      bdc7119f
    • cgroup: fix a subtle bug in descendant pre-order walk · 7805d000
      Committed by Tejun Heo
      When cgroup_next_descendant_pre() initiates a walk, it checks whether
      the subtree root has any children and, if not, returns NULL.  Later
      code assumes that the subtree isn't empty.  This is broken because the
      subtree may become empty in between, which can lead to the
      traversal escaping the subtree by walking to the sibling of the
      subtree root.
      
      There's no reason to have the early exit path.  Remove it along with
      the later assumption that the subtree isn't empty.  This simplifies
      the code a bit and fixes the subtle bug.
      
      While at it, fix the comment of cgroup_for_each_descendant_pre() which
      was incorrectly referring to ->css_offline() instead of
      ->css_online().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Michal Hocko <mhocko@suse.cz>
      Cc: stable@vger.kernel.org
      7805d000
  7. 22 May 2013, 1 commit
    • Add include dependencies to <linux/printk.h>. · 154c2670
      Committed by Ralf Baechle
      If <linux/linkage.h> has not been included before <linux/printk.h>,
      a build error like the one below will result:
      
        CC      arch/mips/kernel/idle.o
      In file included from arch/mips/kernel/idle.c:17:0:
      include/linux/printk.h:109:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:109:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:110:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:110:1: error: expected ‘,’ or ‘;’ before ‘int’
      include/linux/printk.h:114:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:114:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:115:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:115:1: error: expected ‘,’ or ‘;’ before ‘int’
      include/linux/printk.h:117:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:117:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:118:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:118:1: error: ‘__cold__’ attribute ignored [-Werror=attributes]
      include/linux/printk.h:118:1: error: expected ‘,’ or ‘;’ before ‘asmlinkage’
      include/linux/printk.h:122:1: error: data definition has no type or storage class [-Werror]
      include/linux/printk.h:122:1: error: type defaults to ‘int’ in declaration of ‘asmlinkage’ [-Werror=implicit-int]
      include/linux/printk.h:123:1: error: ‘format’ attribute only applies to function types [-Werror=attributes]
      include/linux/printk.h:123:1: error: ‘__cold__’ attribute ignored [-Werror=attributes]
      include/linux/printk.h:123:1: error: expected ‘,’ or ‘;’ before ‘int’
      In file included from include/linux/kernel.h:14:0,
                       from include/linux/sched.h:15,
                       from arch/mips/kernel/idle.c:18:
      include/linux/dynamic_debug.h: In function ‘ddebug_dyndbg_module_param_cb’:
      include/linux/dynamic_debug.h:124:3: error: implicit declaration of function ‘printk’ [-Werror=implicit-function-declaration]
      
      Fixed by including <linux/linkage.h>.
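
      In other words, the header gains something like the following near
      its top (the printk() declaration shown is one of the asmlinkage
      users the errors above point at):

        /* in include/linux/printk.h */
        #include <linux/linkage.h>      /* provides asmlinkage for the declarations below */

        asmlinkage __printf(1, 2) __cold
        int printk(const char *fmt, ...);
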
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      154c2670
  8. 21 May 2013, 1 commit
    • tty/vt: Fix vc_deallocate() lock order · 421b40a6
      Committed by Peter Hurley
      Now that the tty port owns the flip buffers and i/o is allowed
      from the driver even when no tty is attached, the destruction
      of the tty port (and the flip buffers) must ensure that no
      outstanding work is pending.
      
      Unfortunately, this creates a lock order problem with the
      console_lock (see attached lockdep report [1] below).
      
      For single-console deallocation, drop the console_lock prior
      to port destruction.  For multiple-console deallocation,
      defer port destruction until the consoles have been
      deallocated.
      
      tty_port_destroy() is not required if the port has not
      been used; remove it from the vc_allocate() failure path.
      
      [1] lockdep report from Dave Jones <davej@redhat.com>
      
       ======================================================
       [ INFO: possible circular locking dependency detected ]
       3.9.0+ #16 Not tainted
       -------------------------------------------------------
       (agetty)/26163 is trying to acquire lock:
       blocked:  ((&buf->work)){+.+...}, instance: ffff88011c8b0020, at: [<ffffffff81062065>] flush_work+0x5/0x2e0
      
       but task is already holding lock:
       blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (console_lock){+.+.+.}:
              [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
              [<ffffffff810416c7>] console_lock+0x77/0x80
              [<ffffffff813c3dcd>] con_flush_chars+0x2d/0x50
              [<ffffffff813b32b2>] n_tty_receive_buf+0x122/0x14d0
              [<ffffffff813b7709>] flush_to_ldisc+0x119/0x170
              [<ffffffff81064381>] process_one_work+0x211/0x700
              [<ffffffff8106498b>] worker_thread+0x11b/0x3a0
              [<ffffffff8106ce5d>] kthread+0xed/0x100
              [<ffffffff81601cac>] ret_from_fork+0x7c/0xb0
      
       -> #0 ((&buf->work)){+.+...}:
              [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
              [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
              [<ffffffff810620ae>] flush_work+0x4e/0x2e0
              [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
              [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
              [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
              [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
              [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
              [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
              [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
              [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
              [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b
      
       other info that might help us debug this:
      
       [ 6760.076175]  Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(console_lock);
                                      lock((&buf->work));
                                      lock(console_lock);
         lock((&buf->work));
      
        *** DEADLOCK ***
      
       1 lock on stack by (agetty)/26163:
        #0: blocked:  (console_lock){+.+.+.}, instance: ffffffff81c2fde0, at: [<ffffffff813bc201>] vt_ioctl+0xb61/0x1230
       stack backtrace:
       Pid: 26163, comm: (agetty) Not tainted 3.9.0+ #16
       Call Trace:
        [<ffffffff815edb14>] print_circular_bug+0x200/0x20e
        [<ffffffff810b349a>] __lock_acquire+0x193a/0x1c00
        [<ffffffff8100a269>] ? sched_clock+0x9/0x10
        [<ffffffff8100a269>] ? sched_clock+0x9/0x10
        [<ffffffff8100a200>] ? native_sched_clock+0x20/0x80
        [<ffffffff810b3f74>] lock_acquire+0xa4/0x210
        [<ffffffff81062065>] ? flush_work+0x5/0x2e0
        [<ffffffff810620ae>] flush_work+0x4e/0x2e0
        [<ffffffff81062065>] ? flush_work+0x5/0x2e0
        [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
        [<ffffffff8113c8a3>] ? __free_pages_ok.part.57+0x93/0xc0
        [<ffffffff810b15db>] ? mark_held_locks+0xbb/0x140
        [<ffffffff810652f2>] ? __cancel_work_timer+0x82/0x130
        [<ffffffff81065305>] __cancel_work_timer+0x95/0x130
        [<ffffffff810653b0>] cancel_work_sync+0x10/0x20
        [<ffffffff813b8212>] tty_port_destroy+0x12/0x20
        [<ffffffff813c65e8>] vc_deallocate+0xf8/0x110
        [<ffffffff813bc20c>] vt_ioctl+0xb6c/0x1230
        [<ffffffff810aec41>] ? lock_release_holdtime.part.30+0xa1/0x170
        [<ffffffff813b01a5>] tty_ioctl+0x285/0xd50
        [<ffffffff812b00f6>] ? inode_has_perm.isra.46.constprop.61+0x56/0x80
        [<ffffffff811ba825>] do_vfs_ioctl+0x305/0x530
        [<ffffffff812b04db>] ? selinux_file_ioctl+0x5b/0x110
        [<ffffffff811baad1>] sys_ioctl+0x81/0xa0
        [<ffffffff81601d59>] system_call_fastpath+0x16/0x1b
      
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      421b40a6
  9. 20 May 2013, 2 commits
  10. 18 May 2013, 2 commits
  11. 17 May 2013, 4 commits
  12. 16 May 2013, 1 commit
  13. 15 May 2013, 5 commits
  14. 14 May 2013, 1 commit