提交 · b41686401e501430ffe93b575ef7959d2ecc6f2e · openeuler / raspberrypi-kernel

14 5月, 2014 2 次提交

cgroup: implement cftype->write() · b4168640

由 Tejun Heo 提交于 5月 13, 2014

During the recent conversion to kernfs, cftype's seq_file operations
are updated so that they are directly mapped to kernfs operations and
thus can fully access the associated kernfs and cgroup contexts;
however, write path hasn't seen similar updates and none of the
existing write operations has access to, for example, the associated
kernfs_open_file.

Let's introduce a new operation cftype->write() which maps directly to
the kernfs write operation and has access to all the arguments and
contexts.  This will replace ->write_string() and ->trigger() and ease
manipulation of kernfs active protection from cgroup file operations.

Two accessors - of_cft() and of_css() - are introduced to enable
accessing the associated cgroup context from cftype->write() which
only takes kernfs_open_file for the context information.  The
accessors for seq_file operations - seq_cft() and seq_css() - are
rewritten to wrap the of_ accessors.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

b4168640

cgroup: rename css_tryget*() to css_tryget_online*() · ec903c0c

由 Tejun Heo 提交于 5月 13, 2014

Unlike the more usual refcnting, what css_tryget() provides is the
distinction between online and offline csses instead of protection
against upping a refcnt which already reached zero.  cgroup is
planning to provide actual tryget which fails if the refcnt already
reached zero.  Let's rename the existing trygets so that they clearly
indicate that they're onliness.

I thought about keeping the existing names as-are and introducing new
names for the planned actual tryget; however, given that each
controller participates in the synchronization of the online state, it
seems worthwhile to make it explicit that these functions are about
on/offline state.

Rename css_tryget() to css_tryget_online() and css_tryget_from_dir()
to css_tryget_online_from_dir().  This is pure rename.

v2: cgroup_freezer grew new usages of css_tryget().  Update
    accordingly.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>

ec903c0c

13 5月, 2014 1 次提交

cgroup: introduce task_css_is_root() · 5024ae29

由 Tejun Heo 提交于 5月 07, 2014

Determining the css of a task usually requires RCU read lock as that's
the only thing which keeps the returned css accessible till its
reference is acquired; however, testing whether a task belongs to the
root can be performed without dereferencing the returned css by
comparing the returned pointer against the root one in init_css_set[]
which never changes.

Implement task_css_is_root() which can be invoked in any context.
This will be used by the scheduled cgroup_freezer change.

v2: cgroup no longer supports modular controllers.  No need to export
    init_css_set.  Pointed out by Li.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

5024ae29

10 5月, 2014 2 次提交

percpu-refcount: implement percpu_ref_tryget() · 4fb6e250

由 Tejun Heo 提交于 5月 09, 2014

Implement percpu_ref_tryget() which fails if the refcnt already
reached zero.  Note that this is different from the recently renamed
percpu_ref_tryget_live() which fails if the refcnt has been killed and
is draining the remaining references.  percpu_ref_tryget() succeeds on
a killed refcnt as long as its current refcnt is above zero.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NKent Overstreet <kmo@daterainc.com>

4fb6e250

percpu-refcount: rename percpu_ref_tryget() to percpu_ref_tryget_live() · 2070d50e

由 Tejun Heo 提交于 5月 09, 2014

percpu_ref_tryget() is different from the usual tryget semantics in
that it fails if the refcnt is in its dying stage even if the refcnt
hasn't reached zero yet.  We're about to introduce the more
conventional tryget and the current one has only one user.  Let's
rename it to percpu_ref_tryget_live() so that it explicitly signifies
the peculiarities of its semantics.

This is pure rename.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NKent Overstreet <kmo@daterainc.com>

2070d50e

07 5月, 2014 1 次提交

cgroup: remove unused CGRP_SANE_BEHAVIOR · 2b53f41f

由 Tejun Heo 提交于 5月 07, 2014

This cgroup flag has never been used.  Only CGRP_ROOT_SANE_BEHAVIOR is
used.  Remove it.
Signed-off-by: NTejun Heo <tj@kernel.org>

2b53f41f

05 5月, 2014 3 次提交

cgroup, memcg: implement css->id and convert css_from_id() to use it · 15a4c835

由 Tejun Heo 提交于 5月 04, 2014

Until now, cgroup->id has been used to identify all the associated
csses and css_from_id() takes cgroup ID and returns the matching css
by looking up the cgroup and then dereferencing the css associated
with it; however, now that the lifetimes of cgroup and css are
separate, this is incorrect and breaks on the unified hierarchy when a
controller is disabled and enabled back again before the previous
instance is released.

This patch adds css->id which is a subsystem-unique ID and converts
css_from_id() to look up by the new css->id instead.  memcg is the
only user of css_from_id() and also converted to use css->id instead.

For traditional hierarchies, this shouldn't make any functional
difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: NLi Zefan <lizefan@huawei.com>

15a4c835

cgroup, memcg: allocate cgroup ID from 1 · 7d699ddb

由 Tejun Heo 提交于 5月 04, 2014

Currently, cgroup->id is allocated from 0, which is always assigned to
the root cgroup; unfortunately, memcg wants to use ID 0 to indicate
invalid IDs and ends up incrementing all IDs by one.

It's reasonable to reserve 0 for special purposes.  This patch updates
cgroup core so that ID 0 is not used and the root cgroups get ID 1.
The ID incrementing is removed form memcg.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

7d699ddb

cgroup: make flags and subsys_masks unsigned int · 69dfa00c

由 Tejun Heo 提交于 5月 04, 2014

There's no reason to use atomic bitops for cgroup_subsys_state->flags,
cgroup_root->flags and various subsys_masks.  This patch updates those
to use bitwise and/or operations instead and converts them form
unsigned long to unsigned int.

This makes the fields occupy (marginally) smaller space and makes it
clear that they don't require atomicity.

This patch doesn't cause any behavior difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

69dfa00c

26 4月, 2014 3 次提交

cgroup: implement cgroup.populated for the default hierarchy · 842b597e

由 Tejun Heo 提交于 4月 25, 2014

cgroup users often need a way to determine when a cgroup's
subhierarchy becomes empty so that it can be cleaned up.  cgroup
currently provides release_agent for it; unfortunately, this mechanism
is riddled with issues.

* It delivers events by forking and execing a userland binary
  specified as the release_agent.  This is a long deprecated method of
  notification delivery.  It's extremely heavy, slow and cumbersome to
  integrate with larger infrastructure.

* There is single monitoring point at the root.  There's no way to
  delegate management of a subtree.

* The event isn't recursive.  It triggers when a cgroup doesn't have
  any tasks or child cgroups.  Events for internal nodes trigger only
  after all children are removed.  This again makes it impossible to
  delegate management of a subtree.

* Events are filtered from the kernel side.  "notify_on_release" file
  is used to subscribe to or suppress release event.  This is
  unnecessarily complicated and probably done this way because event
  delivery itself was expensive.

This patch implements interface file "cgroup.populated" which can be
used to monitor whether the cgroup's subhierarchy has tasks in it or
not.  Its value is 0 if there is no task in the cgroup and its
descendants; otherwise, 1, and kernfs_notify() notificaiton is
triggers when the value changes, which can be monitored through poll
and [di]notify.

This is a lot ligther and simpler and trivially allows delegating
management of subhierarchy - subhierarchy monitoring can block further
propgation simply by putting itself or another process in the root of
the subhierarchy and monitor events that it's interested in from there
without interfering with monitoring higher in the tree.

v2: Patch description updated as per Serge.

v3: "cgroup.subtree_populated" renamed to "cgroup.populated".  The
    subtree_ prefix was a bit confusing because
    "cgroup.subtree_control" uses it to denote the tree rooted at the
    cgroup sans the cgroup itself while the populated state includes
    the cgroup itself.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NSerge Hallyn <serge.hallyn@ubuntu.com>
Acked-by: NLi Zefan <lizefan@huawei.com>
Cc: Lennart Poettering <lennart@poettering.net>

842b597e

kobject: Make support for uevent_helper optional. · 86d56134

由 Michael Marineau 提交于 4月 10, 2014

Support for uevent_helper, aka hotplug, is not required on many systems
these days but it can still be enabled via sysfs or sysctl.
Reported-by: NDarren Shepherd <darren.s.shepherd@gmail.com>
Signed-off-by: NMichael Marineau <mike@marineau.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

86d56134

kernfs: implement kernfs_root->supers list · 7d568a83

由 Tejun Heo 提交于 4月 09, 2014

Currently, there's no way to find out which super_blocks are
associated with a given kernfs_root.  Let's implement it - the planned
inotify extension to kernfs_notify() needs it.

Make kernfs_super_info point back to the super_block and chain it at
kernfs_root->supers.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

7d568a83

23 4月, 2014 6 次提交

cgroup: implement dynamic subtree controller enable/disable on the default hierarchy · f8f22e53

由 Tejun Heo 提交于 4月 23, 2014

cgroup is switching away from multiple hierarchies and will use one
unified default hierarchy where controllers can be dynamically enabled
and disabled per subtree.  The default hierarchy will serve as the
unified hierarchy to which all controllers are attached and a css on
the default hierarchy would need to also serve the tasks of descendant
cgroups which don't have the controller enabled - ie. the tree may be
collapsed from leaf towards root when viewed from specific
controllers.  This has been implemented through effective css in the
previous patches.

This patch finally implements dynamic subtree controller
enable/disable on the default hierarchy via a new knob -
"cgroup.subtree_control" which controls which controllers are enabled
on the child cgroups.  Let's assume a hierarchy like the following.

  root - A - B - C
               \ D

root's "cgroup.subtree_control" determines which controllers are
enabled on A.  A's on B.  B's on C and D.  This coincides with the
fact that controllers on the immediate sub-level are used to
distribute the resources of the parent.  In fact, it's natural to
assume that resource control knobs of a child belong to its parent.
Enabling a controller in "cgroup.subtree_control" declares that
distribution of the respective resources of the cgroup will be
controlled.  Note that this means that controller enable states are
shared among siblings.

The default hierarchy has an extra restriction - only cgroups which
don't contain any task may have controllers enabled in
"cgroup.subtree_control".  Combined with the other properties of the
default hierarchy, this guarantees that, from the view point of
controllers, tasks are only on the leaf cgroups.  In other words, only
leaf csses may contain tasks.  This rules out situations where child
cgroups compete against internal tasks of the parent, which is a
competition between two different types of entities without any clear
way to determine resource distribution between the two.  Different
controllers handle it differently and all the implemented behaviors
are ambiguous, ad-hoc, cumbersome and/or just wrong.  Having this
structural constraints imposed from cgroup core removes the burden
from controller implementations and enables showing one consistent
behavior across all controllers.

When a controller is enabled or disabled, css associations for the
controller in the subtrees of each child should be updated.  After
enabling, the whole subtree of a child should point to the new css of
the child.  After disabling, the whole subtree of a child should point
to the cgroup's css.  This is implemented by first updating cgroup
states such that cgroup_e_css() result points to the appropriate css
and then invoking cgroup_update_dfl_csses() which migrates all tasks
in the affected subtrees to the self cgroup on the default hierarchy.

* When read, "cgroup.subtree_control" lists all the currently enabled
  controllers on the children of the cgroup.

* White-space separated list of controller names prefixed with either
  '+' or '-' can be written to "cgroup.subtree_control".  The ones
  prefixed with '+' are enabled on the controller and '-' disabled.

* A controller can be enabled iff the parent's
  "cgroup.subtree_control" enables it and disabled iff no child's
  "cgroup.subtree_control" has it enabled.

* If a cgroup has tasks, no controller can be enabled via
  "cgroup.subtree_control".  Likewise, if "cgroup.subtree_control" has
  some controllers enabled, tasks can't be migrated into the cgroup.

* All controllers which aren't bound on other hierarchies are
  automatically associated with the root cgroup of the default
  hierarchy.  All the controllers which are bound to the default
  hierarchy are listed in the read-only file "cgroup.controllers" in
  the root directory.

* "cgroup.controllers" in all non-root cgroups is read-only file whose
  content is equal to that of "cgroup.subtree_control" of the parent.
  This indicates which controllers can be used in the cgroup's
  "cgroup.subtree_control".

This is still experimental and there are some holes, one of which is
that ->can_attach() failure during cgroup_update_dfl_csses() may leave
the cgroups in an undefined state.  The issues will be addressed by
future patches.

v2: Non-root cgroups now also have "cgroup.controllers".
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

f8f22e53

cgroup: add css_set->dfl_cgrp · 6803c006

由 Tejun Heo 提交于 4月 23, 2014

To implement the unified hierarchy behavior, we'll need to be able to
determine the associated cgroup on the default hierarchy from css_set.
Let's add css_set->dfl_cgrp so that it can be accessed conveniently
and efficiently.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

6803c006

cgroup: teach css_task_iter about effective csses · 3ebb2b6e

由 Tejun Heo 提交于 4月 23, 2014

Currently, css_task_iter iterates tasks associated with a css by
visiting each css_set associated with the owning cgroup and walking
tasks of each of them.  This works fine for !unified hierarchies as
each cgroup has its own css for each associated subsystem on the
hierarchy; however, on the planned unified hierarchy, a cgroup may not
have csses associated and its tasks would be considered associated
with the matching css of the nearest ancestor which has the subsystem
enabled.

This means that on the default unified hierarchy, just walking all
tasks associated with a cgroup isn't enough to walk all tasks which
are associated with the specified css.  If any of its children doesn't
have the matching css enabled, task iteration should also include all
tasks from the subtree.  We already added cgroup->e_csets[] to list
all css_sets effectively associated with a given css and walk css_sets
on that list instead to achieve such iteration.

This patch updates css_task_iter iteration such that it walks css_sets
on cgroup->e_csets[] instead of cgroup->cset_links if iteration is
requested on an non-dummy css.  Thanks to the previous iteration
update, this change can be achieved with the addition of
css_task_iter->ss and minimal updates to css_advance_task_iter() and
css_task_iter_start().
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

3ebb2b6e

cgroup: reorganize css_task_iter · 0f0a2b4f

由 Tejun Heo 提交于 4月 23, 2014

This patch reorganizes css_task_iter so that adding effective css
support is easier.

* s/->cset_link/->cset_pos/ and s/->task/->task_pos/ for consistency

* ->origin_css is used to determine whether the iteration reached the
  last css_set.  Replace it with explicit ->cset_head so that
  css_advance_task_iter() doesn't have to know the termination
  condition directly.

* css_task_iter_next() currently assumes that it's walking list of
  cgrp_cset_link and reaches into the current cset through the current
  link to determine the termination conditions for task walking.  As
  this won't always be true for effective css walking, add
  ->tasks_head and ->mg_tasks_head and use them to control task
  walking so that css_task_iter_next() doesn't have to know how
  css_sets are being walked.

This patch doesn't make any behavior changes.  The iteration logic
stays unchanged after the patch.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

0f0a2b4f

cgroup: implement cgroup->e_csets[] · 2d8f243a

由 Tejun Heo 提交于 4月 23, 2014

On the default unified hierarchy, a cgroup may be associated with
csses of its ancestors, which means that a css of a given cgroup may
be associated with css_sets of descendant cgroups.  This means that we
can't walk all tasks associated with a css by iterating the css_sets
associated with the cgroup as there are css_sets which are pointing to
the css but linked on the descendants.

This patch adds per-subsystem list heads cgroup->e_csets[].  Any
css_set which is pointing to a css is linked to
css->cgroup->e_csets[$SUBSYS_ID] through
css_set->e_cset_node[$SUBSYS_ID].  The lists are protected by
css_set_rwsem and will allow us to walk all css_sets associated with a
given css so that we can find out all associated tasks.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

2d8f243a

cgroup: update cgroup->subsys_mask to ->child_subsys_mask and restore cgroup_root->subsys_mask · f392e51c

由 Tejun Heo 提交于 4月 23, 2014

94419627 ("cgroup: move ->subsys_mask from cgroupfs_root to
cgroup") moved ->subsys_mask from cgroup_root to cgroup to prepare for
the unified hierarhcy; however, it turns out that carrying the
subsys_mask of the children in the parent, instead of itself, is a lot
more natural.  This patch restores cgroup_root->subsys_mask and morphs
cgroup->subsys_mask into cgroup->child_subsys_mask.

* Uses of root->cgrp.subsys_mask are restored to root->subsys_mask.

* Remove automatic setting and clearing of cgrp->subsys_mask and
  instead just inherit ->child_subsys_mask from the parent during
  cgroup creation.  Note that this doesn't affect any current
  behaviors.

* Undo __kill_css() separation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NLi Zefan <lizefan@huawei.com>

f392e51c

19 4月, 2014 2 次提交

wait: explain the shadowing and type inconsistencies · 8b32201d

由 Peter Zijlstra 提交于 4月 18, 2014

Stick in a comment before someone else tries to fix the sparse warning
this generates.
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.orgSigned-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8b32201d

Shiraz has moved · 9cc23682

由 Viresh Kumar 提交于 4月 18, 2014

shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
company.  Replace ST's id with shiraz.linux.kernel@gmail.com.

It also updates .mailmap file to fix address for 'git shortlog'.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9cc23682

18 4月, 2014 3 次提交

of: add empty of_find_node_by_path() for !OF · 20cd477c

由 Alexander Shiyan 提交于 4月 16, 2014

Add an empty version of of_find_node_by_path().
This fixes following build error for asoc tree:
sound/soc/fsl/fsl_ssi.c: In function 'fsl_ssi_probe':
sound/soc/fsl/fsl_ssi.c:1471:2: error: implicit declaration of function 'of_find_node_by_path' [-Werror=implicit-function-declaration]
sprop = of_get_property(of_find_node_by_path("/"), "compatible", NULL);
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAlexander Shiyan <shc_work@mail.ru>
Signed-off-by: NRob Herring <robh@kernel.org>

20cd477c

ipmi: boolify some things · 7aefac26

由 Corey Minyard 提交于 4月 14, 2014

Convert some ints to bools.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7aefac26

ipmi: Turn off all activity on an idle ipmi interface · 89986496

由 Corey Minyard 提交于 4月 14, 2014

The IPMI driver would wake up periodically looking for events and
watchdog pretimeouts.  If there is nothing waiting for these events,
it's really kind of pointless to be checking for them.  So modify the
driver so the message handler can pass down if it needs the lower layer
to be waiting for these.  Modify the system interface lower layer to
turn off all timer and thread activity if the upper layer doesn't need
anything and it is not currently handling messages.  And modify the
message handler to not restart the timer if its timer is not needed.

The timers and kthread will still be enabled if:
 - the SI interface is handling a message.
 - a user has enabled watching for events.
 - the IPMI watchdog timer is in use (since it uses pretimeouts).
 - the message handler is waiting on a remote response.
 - a user has registered to receive commands.

This mostly affects interfaces without interrupts.  Interfaces with
interrupts already don't use CPU in the system interface when the
interface is idle.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

89986496

17 4月, 2014 5 次提交

Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts · 03367ef5

由 K. Y. Srinivasan 提交于 4月 03, 2014

Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality
is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the
VMBUS protocol.
Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>        [3.9+]
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

03367ef5

net: mdio-gpio: Add support for separate MDI and MDO gpio pins · f1d54c47

由 Guenter Roeck 提交于 4月 15, 2014

This is for a system with fixed assignments of input and output pins
(various variants of Kontron COMe).
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f1d54c47

net: mdio-gpio: Add support for active low gpio pins · 1d251481

由 Guenter Roeck 提交于 4月 15, 2014

Some systems using mdio-gpio may use active-low gpio pins
(eg with inverters or FETs connected to all or some of the
gpio pins).
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1d251481

sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner() · 33ac1257

由 Tejun Heo 提交于 1月 10, 2014

All device_schedule_callback_owner() users are converted to use
device_remove_file_self().  Remove now unused
{sysfs|device}_schedule_callback_owner().
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

33ac1257

net: phy: add minimal support for QSGMII PHY · b9d12085

由 Thomas Petazzoni 提交于 4月 15, 2014

This commit adds the necessary definitions for the PHY layer to
recognize "qsgmii" as a valid PHY interface. A QSMII interface, as
defined at
http://en.wikipedia.org/wiki/Media_Independent_Interface#Quad_Serial_Gigabit_Media_Independent_Interface,
is "is a method of combining four SGMII lines into a 5Gbit/s
interface. QSGMII, like SGMII, uses LVDS signalling for the TX and RX
data and a single LVDS clock signal. QSGMII uses significantly fewer
signal lines than four SGMII busses."

This type of MAC <-> PHY connection might require special handling on
the MAC driver side, so it should be possible to express this type of
MAC <-> PHY connection, for example in the Device Tree.
Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
Cc: devicetree@vger.kernel.org
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b9d12085

16 4月, 2014 2 次提交

x86: Remove the PCI reboot method from the default chain · 5be44a6f

由 Ingo Molnar 提交于 4月 04, 2014

Steve reported a reboot hang and bisected it back to this commit:

  a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list

He heroically tested all reboot methods and found the following:

  reboot=t       # triple fault                  ok
  reboot=k       # keyboard ctrl                 FAIL
  reboot=b       # BIOS                          ok
  reboot=a       # ACPI                          FAIL
  reboot=e       # EFI                           FAIL   [system has no EFI]
  reboot=p       # PCI 0xcf9                     FAIL

And I think it's pretty obvious that we should only try PCI 0xcf9 as a
last resort - if at all.

The other observation is that (on this box) we should never try
the PCI reboot method, but close with either the 'triple fault'
or the 'BIOS' (terminal!) reboot methods.

Thirdly, CF9_COND is a total misnomer - it should be something like
CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...

So this patch fixes the worst problems:

 - it orders the actual reboot logic to follow the reboot ordering
   pattern - it was in a pretty random order before for no good
   reason.

 - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
   BOOT_CF9_SAFE flags to make the code more obvious.

 - it tries the BIOS reboot method before the PCI reboot method.
   (Since 'BIOS' is a terminal reboot method resulting in a hang
    if it does not work, this is essentially equivalent to removing
    the PCI reboot method from the default reboot chain.)

 - just for the miraculous possibility of terminal (resulting
   in hang) reboot methods of triple fault or BIOS returning
   without having done their job, there's an ordering between
   them as well.
Reported-and-bisected-and-tested-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: Li Aubrey <aubrey.li@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

5be44a6f

percpu: Replace __get_cpu_var with this_cpu_ptr · fdb9c293

由 Christoph Lameter 提交于 4月 15, 2014

__this_cpu_ptr is being phased out.  Use raw_cpu_ptr instead which was
introduced in 3.15-rc1.  One case of using __get_cpu_var in the
get_cpu_var macro for address calculation was remaining in
include/linux/percpu.h.

tj: Updated patch description.
Signed-off-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

fdb9c293

15 4月, 2014 1 次提交

net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W · 8c482cdc

由 Daniel Borkmann 提交于 4月 14, 2014

While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has
been wrongly decoded by commit a8fc9277 ("sk-filter: Add ability to
get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS
although it should have been decoded as BPF_LD|BPF_W|BPF_ABS.

In practice, this should not have much side-effect though, as such
conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse
operation PR_GET_SECCOMP will only return the current seccomp mode, but
not the filter itself. Since the transition to the new BPF infrastructure,
it's also not used anymore, so we can simply remove this as it's
unreachable.

Fixes: a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)")
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c482cdc

12 4月, 2014 1 次提交

net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369

由 David S. Miller 提交于 4月 11, 2014

Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->s_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about it's
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

676d2369

11 4月, 2014 7 次提交

mm: slab/slub: use page->list consistently instead of page->lru · 34bf6ef9

由 Dave Hansen 提交于 4月 08, 2014

'struct page' has two list_head fields: 'lru' and 'list'.  Conveniently,
they are unioned together.  This means that code can use them
interchangably, which gets horribly confusing like with this nugget from
slab.c:

>	list_del(&page->lru);
>	if (page->active == cachep->num)
>		list_add(&page->list, &n->slabs_full);

This patch makes the slab and slub code use page->lru universally instead
of mixing ->list and ->lru.

So, the new rule is: page->lru is what the you use if you want to keep
your page on a list.  Don't like the fact that it's not called ->list?
Too bad.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPekka Enberg <penberg@kernel.org>

34bf6ef9

IB/mlx5: Add block multicast loopback support · f360d88a

由 Eli Cohen 提交于 4月 02, 2014

Add support for the block multicast loopback QP creation flag along
the proper firmware API for that.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

f360d88a

AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC · 312103d6

由 Chris Metcalf 提交于 3月 25, 2014

On systems with CONFIG_COMPAT we introduced the new requirement that
audit_classify_compat_syscall() exists.  This wasn't true for everything
(apparently not for "tilegx", which I know less that nothing about.)

Instead of wrapping the preprocessor optomization with CONFIG_COMPAT we
should have used the new CONFIG_AUDIT_COMPAT_GENERIC.  This patch uses
that config option to make sure only arches which intend to implement
this have the requirement.

This works fine for tilegx according to Chris Metcalf
Signed-off-by: NEric Paris <eparis@redhat.com>

312103d6

NVMe: Retry failed commands with non-fatal errors · edd10d33

由 Keith Busch 提交于 4月 03, 2014

For commands returned with failed status, queue these for resubmission
and continue retrying them until success or for a limited amount of
time. The final timeout was arbitrarily chosen so requests can't be
retried indefinitely.

Since these are requeued on the nvmeq that submitted the command, the
callbacks have to take an nvmeq instead of an nvme_dev as a parameter
so that we can use the locked queue to append the iod to retry later.

The nvme_iod conviently can be used to track how long we've been trying
to successfully complete an iod request. The nvme_iod also provides the
nvme prp dma mappings, so I had to move a few things around so we can
keep those mappings.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
[fixed checkpatch issue with long line]
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>

edd10d33

NVMe: Make I/O timeout a module parameter · b355084a

由 Keith Busch 提交于 4月 04, 2014

Increase the default timeout to 30 seconds to match SCSI.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
[use byte instead of ushort]
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>

b355084a

NVMe: CPU hot plug notification · 33b1e95c

由 Keith Busch 提交于 3月 24, 2014

Registers with hot cpu notification to rebalance, and potentially allocate
additional, io queues.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>

33b1e95c

NVMe: per-cpu io queues · 42f61420

由 Keith Busch 提交于 3月 24, 2014

The device's IO queues are associated with CPUs, so we can use a per-cpu
variable to map the a qid to a cpu. This provides a convienient way
to optimally assign queues to multiple cpus when the device supports
fewer queues than the host has cpus. The previous implementation may
have assigned these poorly in these situations. This patch addresses
this by sharing queues among cpus that are "close" together and should
have a lower lock contention penalty.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>

42f61420

10 4月, 2014 1 次提交

block: fix regression with block enabled tagging · 360f92c2

由 Jens Axboe 提交于 4月 09, 2014

Martin reported that his test system would not boot with
current git, it oopsed with this:

BUG: unable to handle kernel paging request at ffff88046c6c9e80
IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060
Oops: 0002 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm
rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon
CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246
Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013
Workqueue: events_unbound async_run_entry_fn
task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000
RIP: 0010:[<ffffffff812971e0>]  [<ffffffff812971e0>]
blk_queue_start_tag+0x90/0x150
RSP: 0018:ffff880273d03a58  EFLAGS: 00010092
RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6
RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d
RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410
R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0
R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e
FS:  0000000000000000(0000) GS:ffff880277b00000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0
Stack:
 ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000
 ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62
 ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8
Call Trace:
 [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510
 [<ffffffff81293167>] __blk_run_queue+0x37/0x50
 [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130
 [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0
 [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0
 [<ffffffff813bba35>] scsi_execute+0xe5/0x180
 [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110
 [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod]
 [<ffffffff81160000>] ? __unmap_hugepage_range+0x200/0x2f0
 [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod]
 [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod]
 [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140
 [<ffffffff8106a1c5>] process_one_work+0x175/0x410
 [<ffffffff8106b373>] worker_thread+0x123/0x400
 [<ffffffff8106b250>] ? manage_workers+0x160/0x160
 [<ffffffff8107104e>] kthread+0xce/0xf0
 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
 [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0
 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70
Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89
df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89
58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d
RIP  [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150
 RSP <ffff880273d03a58>
CR2: ffff88046c6c9e80

Martin bisected and found this to be the problem patch;

	commit 6d113398
	Author: Jan Kara <jack@suse.cz>
	Date:   Mon Feb 24 16:39:54 2014 +0100

	    block: Stop abusing rq->csd.list in blk-softirq

and the problem was immediately apparent. The patch states that
it is safe to reuse queuelist at completion time, since it is
no longer used. However, that is not true if a device is using
block enabled tagging. If that is the case, then the queuelist
is reused to keep track of busy tags. If a device also ended
up using softirq completions, we'd reuse ->queuelist for the
IPI handling while block tagging was still using it. Boom.

Fix this by adding a new ipi_list list head, and share the
memory used with the request hash table. The hash table is
never used after the request is moved to the dispatch list,
which happens long before any potential completion of the
request. Add a new request bit for this, so we don't have
cases that check rq->hash while it could potentially have
been reused for the IPI completion.
Reported-by: NMartin K. Petersen <martin.petersen@oracle.com>
Tested-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NJens Axboe <axboe@fb.com>

360f92c2