提交 · c47586b6d36ef2d5d7dc39afc44b75e31bc1a671 · openanolis / cloud-kernel

02 7月, 2011 4 次提交

PM: Introduce generic "noirq" callback routines for subsystems (v2) · e5291928

由 Rafael J. Wysocki 提交于 7月 01, 2011

Introduce generic "noirq" power management callback routines for
subsystems in addition to the "regular" generic PM callback routines.

The new routines will be used, among other things, for implementing
system-wide PM transitions support for generic PM domains.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

e5291928

PM / Domains: Rename struct dev_power_domain to struct dev_pm_domain · 564b905a

由 Rafael J. Wysocki 提交于 6月 23, 2011

The naming convention used by commit 7538e3db6e015e890825fbd9f86599b
(PM: Add support for device power domains), which introduced the
struct dev_power_domain type for representing device power domains,
evidently confuses some developers who tend to think that objects
of this type must correspond to "power domains" as defined by
hardware, which is not the case.  Namely, at the kernel level, a
struct dev_power_domain object can represent arbitrary set of devices
that are mutually dependent power management-wise and need not belong
to one hardware power domain.  To avoid that confusion, rename struct
dev_power_domain to struct dev_pm_domain and rename the related
pointers in struct device and struct pm_clk_notifier_block from
pwr_domain to pm_domain.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NKevin Hilman <khilman@ti.com>

564b905a

PM / Runtime: Update documentation regarding driver removal · f5da24db

由 Rafael J. Wysocki 提交于 7月 02, 2011

Commit e1866b33 (PM / Runtime: Rework
runtime PM handling during driver removal) forgot to update the
documentation in Documentation/power/runtime_pm.txt to match the new
code in drivers/base/dd.c.  Update that documentation to match the
code it describes.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: NKevin Hilman <khilman@ti.com>

f5da24db

PM: Documentation: fix typo: pm_runtime_idle_sync() doesn't exist. · 5efb54cc

由 Kevin Hilman 提交于 6月 30, 2011

Replace reference to pm_runtime_idle_sync() in the driver core with
pm_runtime_put_sync() which is used in the code.
Signed-off-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

5efb54cc

22 6月, 2011 3 次提交

PM / Domains: Update documentation · ca9c6890

由 Rafael J. Wysocki 提交于 6月 21, 2011

Commit 4d27e9dc (PM: Make power
domain callbacks take precedence over subsystem ones) forgot to
update the device power management documentation to take changes
made by it into account.  Correct that mistake.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

ca9c6890

PM: Update documentation regarding sysdevs · 78420884

由 Rafael J. Wysocki 提交于 6月 18, 2011

The part of Documentation/power/devices.txt regarding sysdevs is not
valid any more after commit 2e711c04
(PM: Remove sysdev suspend, resume and shutdown operations), so
remove it.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

78420884

PM / Runtime: Update doc: usage count no longer incremented across system PM · 129b656a

由 Kevin Hilman 提交于 6月 10, 2011

Commit e8665002 (PM: Allow
pm_runtime_suspend() to succeed during system suspend) removed usage
count increment across system PM.

Update doc to reflect this.
Signed-off-by: NKevin Hilman <khilman@ti.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

129b656a

18 6月, 2011 1 次提交

USB: Fix up URB error codes to reflect implementation. · a9e75863

由 Sarah Sharp 提交于 6月 16, 2011

Documentation/usb/error-codes.txt mentions that urb->status can be set to
-EXDEV, if the isochronous transfer was not fully completed.  However, in
practice, EHCI, UHCI, and OHCI all only set -EXDEV in the individual frame
status, never in the URB status.  Those host controller actually always
pass in a zero status to usb_hcd_giveback_urb, and rely on the core to set
the appropriate status value.

The xHCI driver ran into issues with the uvcvideo driver when it tried to
set -EXDEV in urb->status, because the driver refused to submit URBs, and
the userspace camera application's video froze.

Clean up the documentation to reflect the actual implementation.
Signed-off-by: NSarah Sharp <sarah.a.sharp@linux.intel.com>
Acked-by: NAlan Stern <stern@rowland.harvard.edu>

a9e75863

16 6月, 2011 7 次提交

Documentation: fix cgroup typos and formatting · 67de0162

由 Jörg Sommer 提交于 6月 15, 2011

Fix format and spelling.
Signed-off-by: NJörg Sommer <joerg@alea.gnuu.de>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

67de0162

Documentation: update cgroupfs mount point · f6e07d38

由 Jörg Sommer 提交于 6月 15, 2011

According to commit 676db4af ("cgroupfs: create /sys/fs/cgroup to
mount cgroupfs on") the canonical mountpoint for the cgroup filesystem
is /sys/fs/cgroup.  Hence, this should be used in the documentation.
Signed-off-by: NJörg Sommer <joerg@alea.gnuu.de>
Acked-by: NPaul Menage <menage@google.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6e07d38

Documentation: update kmemleak supported archs · 06a2c45d

由 Maxin B. John 提交于 6月 15, 2011

Instead of listing the architectures that are supported by
kmemleak in Documentation/kmemleak.txt, just refer people to
the list of supported architecutures in lib/Kconfig.debug so
that Documentation/kmemleak.txt does not need more updates
for this.
Signed-off-by: NMaxin B. John <maxin.john@gmail.com>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

06a2c45d

Documentation: update printk-formats.txt · 04c55715

由 Andrew Murray 提交于 6月 15, 2011

This patch updates the incomplete documentation concerning the printk
extended format specifiers.
Signed-off-by: NAndrew Murray <amurray@mpc-data.co.uk>
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04c55715

Documentation/feature-removal-schedule.txt: remove ns_cgroup from feature-removal-schedule.txt · 31b5f8ee

由 akpm@linux-foundation.org 提交于 6月 15, 2011

Commit a77aea92 ("cgroup: remove the ns_cgroup") removed the
ns_cgroup but it forgot to remove the related doc in
feature-removal-schedule.txt.
Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: Serge E.  Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

31b5f8ee

memcg: add documentation for the memory.numastat API · 50c35e5b

由 Ying Han 提交于 6月 15, 2011

[akpm@linux-foundation.org: rework text, fit it into 80-cols]
Signed-off-by: NYing Han <yinghan@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NBalbir Singh <bsingharora@gmail.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

50c35e5b

backlight: new driver for the ADP8870 backlight devices · a59ec1e7

由 Michael Hennerich 提交于 6月 15, 2011

Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a59ec1e7

15 6月, 2011 1 次提交

rcu: Use softirq to address performance regression · 09223371

由 Shaohua Li 提交于 6月 14, 2011

Commit a26ac245(rcu: move TREE_RCU from softirq to kthread)
introduced performance regression. In an AIM7 test, this commit degraded
performance by about 40%.

The commit runs rcu callbacks in a kthread instead of softirq. We observed
high rate of context switch which is caused by this. Out test system has
64 CPUs and HZ is 1000, so we saw more than 64k context switch per second
which is caused by RCU's per-CPU kthread. A trace showed that most of
the time the RCU per-CPU kthread doesn't actually handle any callbacks,
but instead just does a very small amount of work handling grace periods.
This means that RCU's per-CPU kthreads are making the scheduler do quite
a bit of work in order to allow a very small amount of RCU-related
processing to be done.

Alex Shi's analysis determined that this slowdown is due to lock
contention within the scheduler. Unfortunately, as Peter Zijlstra points
out, the scheduler's real-time semantics require global action, which
means that this contention is inherent in real-time scheduling. (Yes,
perhaps someone will come up with a workaround -- otherwise, -rt is not
going to do well on large SMP systems -- but this patch will work around
this issue in the meantime. And "the meantime" might well be forever.)

This patch therefore re-introduces softirq processing to RCU, but only
for core RCU work. RCU callbacks are still executed in kthread context,
so that only a small amount of RCU work runs in softirq context in the
common case. This should minimize ksoftirqd execution, allowing us to
skip boosting of ksoftirqd for CONFIG_RCU_BOOST=y kernels.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Tested-by: N"Alex,Shi" <alex.shi@intel.com>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

09223371

09 6月, 2011 1 次提交

md:Documentation/md.txt - fix typo · f699bf23

由 NeilBrown 提交于 6月 09, 2011

Reported-by: NCoolCold <coolthecold@gmail.com>
Signed-off-by: NNeilBrown <neilb@suse.de>

f699bf23

08 6月, 2011 1 次提交

usb-storage: redo incorrect reads · 21c13a4f

由 Alan Stern 提交于 6月 07, 2011

Some USB mass-storage devices have bugs that cause them not to handle
the first READ(10) command they receive correctly.  The Corsair
Padlock v2 returns completely bogus data for its first read (possibly
it returns the data in encrypted form even though the device is
supposed to be unlocked).  The Feiya SD/SDHC card reader fails to
complete the first READ(10) command after it is plugged in or after a
new card is inserted, returning a status code that indicates it thinks
the command was invalid, which prevents the kernel from retrying the
read.

Since the first read of a new device or a new medium is for the
partition sector, the kernel is unable to retrieve the device's
partition table.  Users have to manually issue an "hdparm -z" or
"blockdev --rereadpt" command before they can access the device.

This patch (as1470) works around the problem.  It adds a new quirk
flag, US_FL_INVALID_READ10, indicating that the first READ(10) should
always be retried immediately, as should any failing READ(10) commands
(provided the preceding READ(10) command succeeded, to avoid getting
stuck in a loop).  The patch also adds appropriate unusual_devs
entries containing the new flag.
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NSven Geggus <sven-usbst@geggus.net>
Tested-by: NPaul Hartman <paul.hartman+linux@gmail.com>
CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
CC: <stable@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

21c13a4f

01 6月, 2011 1 次提交

intel-iommu: Enable super page (2MiB, 1GiB, etc.) support · 6dd9a7c7

由 Youquan Song 提交于 5月 25, 2011

There are no externally-visible changes with this. In the loop in the
internal __domain_mapping() function, we simply detect if we are mapping:
  - size >= 2MiB, and
  - virtual address aligned to 2MiB, and
  - physical address aligned to 2MiB, and
  - on hardware that supports superpages.

(and likewise for larger superpages).

We automatically use a superpage for such mappings. We never have to
worry about *breaking* superpages, since we trust that we will always
*unmap* the same range that was mapped. So all we need to do is ensure
that dma_pte_clear_range() will also cope with superpages.

Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
it can return a PTE at the appropriate level rather than always
extending the page tables all the way down to level 1. Again, this is
simplified by the fact that we should never encounter existing small
pages when we're creating a mapping; any old mapping that used the same
virtual range will have been entirely removed and its obsolete page
tables freed.

Provide an 'intel_iommu=sp_off' argument on the command line as a
chicken bit. Not that it should ever be required.

==

The original commit seen in the iommu-2.6.git was Youquan's
implementation (and completion) of my own half-baked code which I'd
typed into an email. Followed by half a dozen subsequent 'fixes'.

I've taken the unusual step of rewriting history and collapsing the
original commits in order to keep the main history simpler, and make
life easier for the people who are going to have to backport this to
older kernels. And also so I can give it a more coherent commit comment
which (hopefully) gives a better explanation of what's going on.

The original sequence of commits leading to identical code was:

Youquan Song (3):
      intel-iommu: super page support
      intel-iommu: Fix superpage alignment calculation error
      intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

David Woodhouse (4):
      intel-iommu: Precalculate superpage support for dmar_domain
      intel-iommu: Fix hardware_largepage_caps()
      intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
      intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
Signed-off-by: NYouquan Song <youquan.song@intel.com>
Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>

6dd9a7c7

30 5月, 2011 2 次提交

lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY. · 990c91f0

由 Rusty Russell 提交于 5月 30, 2011

No virtio device does this any more, so no need to clutter lguest with it.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

990c91f0

lguest: fix up compilation after move · bc805a03

由 Rusty Russell 提交于 5月 30, 2011

ed16648e "Move kvm, uml, and lguest
subdirectories" broke the lguest example launcher.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

bc805a03

29 5月, 2011 5 次提交

x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param · 5d4c47e0

由 Len Brown 提交于 4月 01, 2011

mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.

But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().

Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.

After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT.  If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.

cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

5d4c47e0

x86 idle: deprecate "no-hlt" cmdline param · cdaab4a0

由 Len Brown 提交于 4月 01, 2011

We'd rather that modern machines not check if HLT works on
every entry into idle, for the benefit of machines that had
marginal electricals 15-years ago.  If those machines are still running
the upstream kernel, they can use "idle=poll".  The only difference
will be that they'll now invoke HLT in machine_hlt().

cc: x86@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

cdaab4a0

x86 idle APM: deprecate CONFIG_APM_CPU_IDLE · 99c63221

由 Len Brown 提交于 4月 01, 2011

We don't want to export the pm_idle function pointer to modules.
Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to.

CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit
uniprocessor laptops that are over 10 years old.  It calls into
the BIOS during idle, and is known to cause a number of machines
to fail.

Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting
pm_idle.  Any systems that were calling into the APM BIOS
at run-time will simply use HLT instead.

cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
cc: stable@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

99c63221

x86 idle floppy: deprecate disable_hlt() · 3b70b2e5

由 Len Brown 提交于 4月 01, 2011

Plan to remove floppy_disable_hlt in 2012, an ancient
workaround with comments that it should be removed.

This allows us to remove clutter and a run-time branch
from the idle code.

WARN_ONCE() on invocation until it is removed.

cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: NLen Brown <len.brown@intel.com>

3b70b2e5

ACPI: Split out custom_method functionality into an own driver · 526b4af4

由 Thomas Renninger 提交于 5月 26, 2011

With /sys/kernel/debug/acpi/custom_method root can write
to arbitrary memory and increase his priveleges, even if
these are restricted.

-> Make this an own debug .config option and warn about the
security issue in the config description.

-> Still keep acpi/debugfs.c which now only creates an empty
   /sys/kernel/debug/acpi directory. There might be other
   users of it later.
Signed-off-by: NThomas Renninger <trenn@suse.de>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: rui.zhang@intel.com
Signed-off-by: NLen Brown <len.brown@intel.com>

526b4af4

28 5月, 2011 2 次提交

Documentation: Add statistics about nested locks · f62508f6

由 Juri Lelli 提交于 5月 19, 2011

Explain what the trailing "/1" on some lock class names of
lock_stat output means.
Reviewed-by: NYong Zhang <yong.zhang0@gmail.com>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

f62508f6

acer-wmi: Delete out-of-date documentation · 02003667

由 Carlos Corbacho 提交于 5月 02, 2011

The documentation file for acer-wmi is long out of date, and there's not
much point in keeping it around either.
Signed-off-by: NCarlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: NMatthew Garrett <mjg@redhat.com>

02003667

27 5月, 2011 12 次提交

fs: pass exact type of data dirties to ->dirty_inode · aa385729

由 Christoph Hellwig 提交于 5月 27, 2011

Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
anything else, so that the filesystem can track internally if it
needs to push out a transaction for fdatasync or not.

This is just the prototype change with no user for it yet.  I plan
to push large XFS changes for the next merge window, and getting
this trivial infrastructure in this window would help a lot to avoid
tree interdependencies.

Also remove incorrect comments that ->dirty_inode can't block.  That
has been changed a long time ago, and many implementations rely on it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

aa385729

regulator: Remove supply_regulator_dev from machine configuration · 492c826b

由 Mark Brown 提交于 5月 08, 2011

supply_regulator_dev (using a struct pointer) has been deprecated in favour
of supply_regulator (using a regulator name) for quite a few releases
now with a warning generated if it is used and there are no current in tree
users so just remove the code.
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: NLiam Girdwood <lrg@slimlogic.co.uk>

492c826b

coredump: add support for exe_file in core name · 57cc083a

由 Jiri Slaby 提交于 5月 26, 2011

Now, exe_file is not proc FS dependent, so we can use it to name core
file.  So we add %E pattern for core file name cration which extract path
from mm_struct->exe_file.  Then it converts slashes to exclamation marks
and pastes the result to the core file name itself.

This is useful for environments where binary names are longer than 16
character (the current->comm limitation).  Also where there are binaries
with same name but in a different path.  Further in case the binery itself
changes its current->comm after exec.

So by doing (s/$/#/ -- # is treated as git comment):

  $ sysctl kernel.core_pattern='core.%p.%e.%E'
  $ ln /bin/cat cat45678901234567890
  $ ./cat45678901234567890
  ^Z
  $ rm cat45678901234567890
  $ fg
  ^\Quit (core dumped)
  $ ls core*

we now get:

  core.2434.cat456789012345.!root!cat45678901234567890 (deleted)
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: NAndi Kleen <andi@firstfloor.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57cc083a

cgroup: remove the ns_cgroup · a77aea92

由 Daniel Lezcano 提交于 5月 26, 2011

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757 ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: NSerge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
Acked-by: NPaul Menage <menage@google.com>
Acked-by: NMatt Helsley <matthltc@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a77aea92

cgroups: make procs file writable · 74a1166d

由 Ben Blum 提交于 5月 26, 2011

Make procs file writable to move all threads by tgid at once.

Add functionality that enables users to move all threads in a threadgroup
at once to a cgroup by writing the tgid to the 'cgroup.procs' file.  This
current implementation makes use of a per-threadgroup rwsem that's taken
for reading in the fork() path to prevent newly forking threads within the
threadgroup from "escaping" while the move is in progress.
Signed-off-by: NBen Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: NPaul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

74a1166d

cgroups: add per-thread subsystem callbacks · f780bdb7

由 Ben Blum 提交于 5月 26, 2011

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface.  Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this.  All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.
Signed-off-by: NBen Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: NPaul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f780bdb7

Documentation: configfs examples crash fix · dcb3a08e

由 Jiri Slaby 提交于 5月 26, 2011

When configfs_register_subsystem() fails, we unregister too many
subsystems in configfs_example_init.  Decrement i by one to not unregister
non-registered subsystem.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dcb3a08e

getdelays: show average CPU/IO/SWAP/RECLAIM delays · 02d54f09

由 Wu Fengguang 提交于 5月 26, 2011

I find it very handy to show the average delays in milliseconds.

Example output (on 100 concurrent dd reading sparse files):

  CPU             count     real total  virtual total    delay total  delay average
                    986     3223509952     3207643301    38863410579         39.415ms
  IO              count    delay total  delay average
                      0              0              0ms
  SWAP            count    delay total  delay average
                      0              0              0ms
  RECLAIM         count    delay total  delay average
                   1059     5131834899              4ms
  dd: read=0, write=0, cancelled_write=0
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@linux.vnet.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: NSatoru Moriya <satoru.moriya@hds.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02d54f09

Documentation/accounting/getdelays.c: handle sendto() failures · 4ed960b1

由 Andrew Morton 提交于 5月 26, 2011

Fixes

  Documentation/accounting/getdelays.c: In function `get_family_id':
  Documentation/accounting/getdelays.c:172:14: warning: variable `rc' set but not used [-Wunused-but-set-variable]
Reported-by: N"Justin P. Mattock" <justinmattock@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ed960b1

Documentation/accounting/getdelays.c: fix unused var warning · fbdd91a6

由 Justin P. Mattock 提交于 5月 26, 2011

Fixes

  Documentation/accounting/getdelays.c: In function `main':
  Documentation/accounting/getdelays.c:436:7: warning: variable `i' set but not used [-Wunused-but-set-variable]
Signed-off-by: NJustin P. Mattock <justinmattock@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fbdd91a6

Documentation/atomic_ops.txt: avoid volatile in sample code · 72eef0f3

由 Nikanth Karthikesan 提交于 5月 26, 2011

As declaring counter as volatile is discouraged, it is best not to use it
in sample code as well.
Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

72eef0f3

rcu: Decrease memory-barrier usage based on semi-formal proof · 23b5c8fa

由 Paul E. McKenney 提交于 9月 07, 2010

(Note: this was reverted, and is now being re-applied in pieces, with
this being the fifth and final piece. See below for the reason that
it is now felt to be safe to re-apply this.)

Commit d09b62df fixed grace-period synchronization, but left some smp_mb()
invocations in rcu_process_callbacks() that are no longer needed, but
sheer paranoia prevented them from being removed. This commit removes
them and provides a proof of correctness in their absence. It also adds
a memory barrier to rcu_report_qs_rsp() immediately before the update to
rsp->completed in order to handle the theoretical possibility that the
compiler or CPU might move massive quantities of code into a lock-based
critical section. This also proves that the sheer paranoia was not
entirely unjustified, at least from a theoretical point of view.

In addition, the old dyntick-idle synchronization depended on the fact
that grace periods were many milliseconds in duration, so that it could
be assumed that no dyntick-idle CPU could reorder a memory reference
across an entire grace period. Unfortunately for this design, the
addition of expedited grace periods breaks this assumption, which has
the unfortunate side-effect of requiring atomic operations in the
functions that track dyntick-idle state for RCU. (There is some hope
that the algorithms used in user-level RCU might be applied here, but
some work is required to handle the NMIs that user-space applications
can happily ignore. For the short term, better safe than sorry.)

This proof assumes that neither compiler nor CPU will allow a lock
acquisition and release to be reordered, as doing so can result in
deadlock. The proof is as follows:

1. A given CPU declares a quiescent state under the protection of
its leaf rcu_node's lock.

2. If there is more than one level of rcu_node hierarchy, the
last CPU to declare a quiescent state will also acquire the
->lock of the next rcu_node up in the hierarchy, but only
after releasing the lower level's lock. The acquisition of this
lock clearly cannot occur prior to the acquisition of the leaf
node's lock.

3. Step 2 repeats until we reach the root rcu_node structure.
Please note again that only one lock is held at a time through
this process. The acquisition of the root rcu_node's ->lock
must occur after the release of that of the leaf rcu_node.

4. At this point, we set the ->completed field in the rcu_state
structure in rcu_report_qs_rsp(). However, if the rcu_node
hierarchy contains only one rcu_node, then in theory the code
preceding the quiescent state could leak into the critical
section. We therefore precede the update of ->completed with a
memory barrier. All CPUs will therefore agree that any updates
preceding any report of a quiescent state will have happened
before the update of ->completed.

5. Regardless of whether a new grace period is needed, rcu_start_gp()
will propagate the new value of ->completed to all of the leaf
rcu_node structures, under the protection of each rcu_node's ->lock.
If a new grace period is needed immediately, this propagation
will occur in the same critical section that ->completed was
set in, but courtesy of the memory barrier in #4 above, is still
seen to follow any pre-quiescent-state activity.

6. When a given CPU invokes __rcu_process_gp_end(), it becomes
aware of the end of the old grace period and therefore makes
any RCU callbacks that were waiting on that grace period eligible
for invocation.

If this CPU is the same one that detected the end of the grace
period, and if there is but a single rcu_node in the hierarchy,
we will still be in the single critical section. In this case,
the memory barrier in step #4 guarantees that all callbacks will
be seen to execute after each CPU's quiescent state.

On the other hand, if this is a different CPU, it will acquire
the leaf rcu_node's ->lock, and will again be serialized after
each CPU's quiescent state for the old grace period.

On the strength of this proof, this commit therefore removes the memory
barriers from rcu_process_callbacks() and adds one to rcu_report_qs_rsp().
The effect is to reduce the number of memory barriers by one and to
reduce the frequency of execution from about once per scheduling tick
per CPU to once per grace period.

This was reverted do to hangs found during testing by Yinghai Lu and
Ingo Molnar. Frederic Weisbecker supplied Yinghai with tracing that
located the underlying problem, and Frederic also provided the fix.

The underlying problem was that the HARDIRQ_ENTER() macro from
lib/locking-selftest.c invoked irq_enter(), which in turn invokes
rcu_irq_enter(), but HARDIRQ_EXIT() invoked __irq_exit(), which
does not invoke rcu_irq_exit(). This situation resulted in calls
to rcu_irq_enter() that were not balanced by the required calls to
rcu_irq_exit(). Therefore, after these locking selftests completed,
RCU's dyntick-idle nesting count was a large number (for example,
72), which caused RCU to to conclude that the affected CPU was not in
dyntick-idle mode when in fact it was.

RCU would therefore incorrectly wait for this dyntick-idle CPU, resulting
in hangs.

In contrast, with Frederic's patch, which replaces the irq_enter()
in HARDIRQ_ENTER() with an __irq_enter(), these tests don't ever call
either rcu_irq_enter() or rcu_irq_exit(), which works because the CPU
running the test is already marked as not being in dyntick-idle mode.
This means that the rcu_irq_enter() and rcu_irq_exit() calls and RCU
then has no problem working out which CPUs are in dyntick-idle mode and
which are not.

The reason that the imbalance was not noticed before the barrier patch
was applied is that the old implementation of rcu_enter_nohz() ignored
the nesting depth. This could still result in delays, but much shorter
ones. Whenever there was a delay, RCU would IPI the CPU with the
unbalanced nesting level, which would eventually result in rcu_enter_nohz()
being called, which in turn would force RCU to see that the CPU was in
dyntick-idle mode.

The reason that very few people noticed the problem is that the mismatched
irq_enter() vs. __irq_exit() occured only when the kernel was built with
CONFIG_DEBUG_LOCKING_API_SELFTESTS.
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: NJosh Triplett <josh@joshtriplett.org>

23b5c8fa

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功