提交 · e9dd2b6837e26fe202708cce5ea4bb4ee3e3482e · openanolis / cloud-kernel

22 10月, 2010 10 次提交

pcmcia: IOCARD is also required for using IRQs · ff10fca5

由 Dominik Brodowski 提交于 10月 22, 2010

Dave Hinds pointed out to me that 37979e15 will break b43 and
ray_cs, as IOCARD is not -- as the name would suggest -- only needed
for cards using IO ports. Instead, as it re-deines several pins, it
is also required for using interrupts.
Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>

ff10fca5

include/linux/libata.h: fix typo · 89692c03

由 Andrea Gelmini 提交于 10月 16, 2010

Signed-off-by: NAndrea Gelmini <andrea.gelmini@gelma.net>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

89692c03

libata: reorder ata_queued_cmd to remove alignment padding on 64 bit builds · b34e9042

由 Richard Kennedy 提交于 9月 10, 2010

Reorder structure ata_queued_cmd to remove 8 bytes of alignment padding
on 64 bit builds & therefore reduce the size of structure ata_port by
256 bytes.

Overall this will have little impact, other than reducing the amount of
memory that is cleared when allocating ata_ports.
Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

b34e9042

libata: implement cross-port EH exclusion · c0c362b6

由 Tejun Heo 提交于 9月 06, 2010

In libata, the non-EH code paths should always take and release
ap->lock explicitly when accessing hardware or shared data structures.
However, once EH is active, it's assumed that the port is owned by EH
and EH methods don't explicitly take ap->lock unless race from irq
handler or other code paths are expected.  However, libata EH didn't
guarantee exclusion among EHs for ports of the same host.  IOW,
multiple EHs may execute in parallel on multiple ports of the same
controller.

In many cases, especially in SATA, the ports are completely
independent of each other and this doesn't cause problems; however,
there are cases where different ports share the same resource, which
lead to obscure timing related bugs such as the one fixed by commit
213373cf (ata_piix: fix locking around SIDPR access).

This patch implements exclusion among EHs of the same host.  When EH
begins, it acquires per-host EH ownership by calling ata_eh_acquire().
When EH finishes, the ownership is released by calling
ata_eh_release().  EH ownership is also released whenever the EH
thread goes to sleep from ata_msleep() or explicitly and reacquired
after waking up.

This ensures that while EH is actively accessing the hardware, it has
exclusive access to it while allowing EHs to interleave and progress
in parallel as they hit waiting stages, which dominate the time spent
in EH.  This achieves cross-port EH exclusion without pervasive and
fragile changes while still allowing parallel EH for the most part.

This was first reported by yuanding02@gmail.com more than three years
ago in the following bugzilla.  :-)

  https://bugzilla.kernel.org/show_bug.cgi?id=8223Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reported-by: yuanding02@gmail.com
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

c0c362b6

libata: add @ap to ata_wait_register() and introduce ata_msleep() · 97750ceb

由 Tejun Heo 提交于 9月 06, 2010

Add optional @ap argument to ata_wait_register() and replace msleep()
calls with ata_msleep() which take optional @ap in addition to the
duration.  These will be used to implement EH exclusion.

This patch doesn't cause any behavior difference.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

97750ceb

libata: reimplement link power management · 6b7ae954

由 Tejun Heo 提交于 9月 01, 2010

The current LPM implementation has the following issues.

* Operation order isn't well thought-out.  e.g. HIPM should be
  configured after IPM in SControl is properly configured.  Not the
  other way around.

* Suspend/resume paths call ata_lpm_enable/disable() which must only
  be called from EH context directly.  Also, ata_lpm_enable/disable()
  were called whether LPM was in use or not.

* Implementation is per-port when it should be per-link.  As a result,
  it can't be used for controllers with slave links or PMP.

* LPM state isn't managed consistently.  After a link reset for
  whatever reason including suspend/resume the actual LPM state would
  be reset leaving ap->lpm_policy inconsistent.

* Generic/driver-specific logic boundary isn't clear.  Currently,
  libahci has to mangle stuff which libata EH proper should be
  handling.  This makes the implementation unnecessarily complex and
  fragile.

* Tied to ALPM.  Doesn't consider DIPM only cases and doesn't check
  whether the device allows HIPM.

* Error handling isn't implemented.

Given the extent of mismatch with the rest of libata, I don't think
trying to fix it piecewise makes much sense.  This patch reimplements
LPM support.

* The new implementation is per-link.  The target policy is still
  port-wide (ap->target_lpm_policy) but all the mechanisms and states
  are per-link and integrate well with the rest of link abstraction
  and can work with slave and PMP links.

* Core EH has proper control of LPM state.  LPM state is reconfigured
  when and only when reconfiguration is necessary.  It makes sure that
  LPM state is reset when probing for new device on the link.
  Controller agnostic logic is now implemented in libata EH proper and
  driver implementation only has to deal with controller specifics.

* Proper error handling.  LPM config failure is attributed to the
  device on the link and LPM is disabled for the link if it fails
  repeatedly.

* ops->enable/disable_pm() are replaced with single ops->set_lpm()
  which takes @policy and @hints.  This simplifies driver specific
  implementation.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

6b7ae954

libata: implement sata_link_scr_lpm() and make ata_dev_set_feature() global · 1152b261

由 Tejun Heo 提交于 9月 01, 2010

Link power management is about to be reimplemented.  Prepare for it.

* Implement sata_link_scr_lpm().

* Drop static from ata_dev_set_feature() and make it available to
  other libata files.

* Trivial whitespace adjustments.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

1152b261

libata: clean up lpm related symbols and sysfs show/store functions · c93b263e

由 Tejun Heo 提交于 9月 01, 2010

Link power management related symbols are in confusing state w/ mixed
usages of lpm, ipm and pm.  This patch cleans up lpm related symbols
and sysfs show/store functions as follows.

* lpm states - NOT_AVAILABLE, MIN_POWER, MAX_PERFORMANCE and
  MEDIUM_POWER are renamed to ATA_LPM_UNKNOWN and
  ATA_LPM_{MIN|MAX|MED}_POWER.

* Pre/postfixes are unified to lpm.

* sysfs show/store functions for link_power_management_policy were
  curiously named get/put and unnecessarily complex.  Renamed to
  show/store and simplified.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

c93b263e

[libata] support for > 512 byte sectors (e.g. 4K Native) · 295124dc

由 Grant Grundler 提交于 8月 17, 2010

This change enables my x86 machine to recognize and talk to a
"Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had
posted a similar patch 2 years ago:
  http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of
this patch include Matthew's work. That's why Mathew is first on the
"Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering
Sample' drive provided by Hitachi GST.
Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: NGrant Grundler <grundler@google.com>
Reviewed-by: NGwendal Grignou <gwendal@google.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

295124dc

[libata] Add ATA transport class · d9027470

由 Gwendal Grignou 提交于 5月 25, 2010

This is a scheleton for libata transport class.
All information is read only, exporting information from libata:
- ata_port class: one per ATA port
- ata_link class: one per ATA port or 15 for SATA Port Multiplier
- ata_device class: up to 2 for PATA link, usually one for SATA.
Signed-off-by: NGwendal Grignou <gwendal@google.com>
Reviewed-by: NGrant Grundler <grundler@google.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

d9027470

21 10月, 2010 14 次提交

BKL: introduce CONFIG_BKL. · 6de5bd12

由 Arnd Bergmann 提交于 9月 11, 2010

With all the patches we have queued in the BKL removal tree, only a
few dozen modules are left that actually rely on the BKL, and even
there are lots of low-hanging fruit. We need to decide what to do
about them, this patch illustrates one of the options:

Every user of the BKL is marked as 'depends on BKL' in Kconfig,
and the CONFIG_BKL becomes a user-visible option. If it gets
disabled, no BKL using module can be built any more and the BKL
code itself is compiled out.

The one exception is file locking, which is practically always
enabled and does a 'select BKL' instead. This effectively forces
CONFIG_BKL to be enabled until we have solved the fs/lockd
mess and can apply the patch that removes the BKL from fs/locks.c.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

6de5bd12

EDAC: Export edac sysfs class to users. · 30e1f7a8

由 Borislav Petkov 提交于 9月 02, 2010

Move toplevel sysfs class to the stub and make it available to
non-modularized code too. Add proper refcounting of its users and move
the registration functionality into the reference counting routines.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>

30e1f7a8

block: Turn bvec_k{un,}map_irq() into static inline functions · 11a691be

由 Geert Uytterhoeven 提交于 10月 21, 2010

Convert bvec_k{un,}map_irq() from macros to static inline functions if
!CONFIG_HIGHMEM, so we can easier detect mistakes like the one fixed in
93055c31 ("ps3disk: passing wrong variable =
to
bvec_kunmap_irq()")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

11a691be

x86-32, percpu: Correct the ordering of the percpu readmostly section · 2aeb66d3

由 H. Peter Anvin 提交于 10月 21, 2010

Checkin c957ef2c had inconsistent
ordering of .data..percpu..page_aligned and .data..percpu..readmostly;
the still-broken version affected x86-32 at least.

The page aligned version really must be page aligned...
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>

2aeb66d3

kernel: roundup should only reference arguments once · b28efd54

由 Eric Paris 提交于 10月 13, 2010

Currently the roundup macro references it's arguments more than one time.
This patch changes it so it will only use its arguments once.
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

b28efd54

kernel: rounddown helper function · 686a0f3d

由 Eric Paris 提交于 10月 13, 2010

The roundup() helper function will round a given value up to a multiple of
another given value.  aka  roundup(11, 7) would give 14 = 7 * 2.  This new
function does the opposite.  It will round a given number down to the
nearest multiple of the second number: rounddown(11, 7) would give 7.

I need this in some future SELinux code and can carry the macro myself, but
figured I would put it in the core kernel so others might find and use it
if need be.
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

686a0f3d

conntrack: export lsm context rather than internal secid via netlink · 1cc63249

由 Eric Paris 提交于 10月 13, 2010

The conntrack code can export the internal secid to userspace.  These are
dynamic, can change on lsm changes, and have no meaning in userspace.  We
should instead be sending lsm contexts to userspace instead.  This patch sends
the secctx (rather than secid) to userspace over the netlink socket.  We use a
new field CTA_SECCTX and stop using the the old CTA_SECMARK field since it did
not send particularly useful information.
Signed-off-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NJames Morris <jmorris@namei.org>

1cc63249

security: secid_to_secctx returns len when data is NULL · d5630b9d

由 Eric Paris 提交于 10月 13, 2010

With the (long ago) interface change to have the secid_to_secctx functions
do the string allocation instead of having the caller do the allocation we
lost the ability to query the security server for the length of the
upcoming string.  The SECMARK code would like to allocate a netlink skb
with enough length to hold the string but it is just too unclean to do the
string allocation twice or to do the allocation the first time and hold
onto the string and slen.  This patch adds the ability to call
security_secid_to_secctx() with a NULL data pointer and it will just set
the slen pointer.
Signed-off-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NPaul Moore <paul.moore@hp.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

d5630b9d

secmark: make secmark object handling generic · 2606fd1f

由 Eric Paris 提交于 10月 13, 2010

Right now secmark has lots of direct selinux calls.  Use all LSM calls and
remove all SELinux specific knowledge.  The only SELinux specific knowledge
we leave is the mode.  The only point is to make sure that other LSMs at
least test this generic code before they assume it works.  (They may also
have to make changes if they do not represent labels as strings)
Signed-off-by: NEric Paris <eparis@redhat.com>
Acked-by: NPaul Moore <paul.moore@hp.com>
Acked-by: NPatrick McHardy <kaber@trash.net>
Signed-off-by: NJames Morris <jmorris@namei.org>

2606fd1f

security: remove unused parameter from security_task_setscheduler() · b0ae1981

由 KOSAKI Motohiro 提交于 10月 15, 2010

All security modules shouldn't change sched_param parameter of
security_task_setscheduler().  This is not only meaningless, but also
make a harmful result if caller pass a static variable.

This patch remove policy and sched_param parameter from
security_task_setscheduler() becuase none of security module is
using it.

Cc: James Morris <jmorris@namei.org>
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NJames Morris <jmorris@namei.org>

b0ae1981

G
ceph: add CEPH_MDS_OP_SETDIRLAYOUT and associated ioctl. · 571dba52
由 Greg Farnum 提交于 9月 24, 2010
```
Signed-off-by: NSage Weil <sage@newdream.net>
```
571dba52

ceph: add pagelist_reserve, pagelist_truncate, pagelist_set_cursor · ac0b74d8

由 Greg Farnum 提交于 9月 17, 2010

These facilitate preallocation of pages so that we can encode into the pagelist
in an atomic context.
Signed-off-by: NGreg Farnum <gregf@hq.newdream.net>
Signed-off-by: NSage Weil <sage@newdream.net>

ac0b74d8

ceph: factor out libceph from Ceph file system · 3d14c5d2

由 Yehuda Sadeh 提交于 4月 06, 2010

This factors out protocol and low-level storage parts of ceph into a
separate libceph module living in net/ceph and include/linux/ceph.  This
is mostly a matter of moving files around.  However, a few key pieces
of the interface change as well:

 - ceph_client becomes ceph_fs_client and ceph_client, where the latter
   captures the mon and osd clients, and the fs_client gets the mds client
   and file system specific pieces.
 - Mount option parsing and debugfs setup is correspondingly broken into
   two pieces.
 - The mon client gets a generic handler callback for otherwise unknown
   messages (mds map, in this case).
 - The basic supported/required feature bits can be expanded (and are by
   ceph_fs_client).

No functional change, aside from some subtle error handling cases that got
cleaned up in the refactoring process.
Signed-off-by: NSage Weil <sage@newdream.net>

3d14c5d2

percpu: Introduce a read-mostly percpu API · c957ef2c

由 Shaohua Li 提交于 10月 20, 2010

Add a new readmostly percpu section and API.  This can be used to
avoid dirtying data lines which are generally not written to, which is
especially important for data which may be accessed by processors
other than the one for which the percpu area belongs to.

[ hpa: moved it *after* the page-aligned section, for obvious
  reasons. ]
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
LKML-Reference: <1287544022.4571.7.camel@sli10-conroe.sh.intel.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

c957ef2c

19 10月, 2010 13 次提交

block: fix accounting bug on cross partition merges · 7681bfee

由 Yasuaki Ishimatsu 提交于 10月 19, 2010

/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
   is 0 and sda2's one is 1.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct->in_flight are not changed.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct->in_flight, not a sda1's one, is decremented.

        | hd_struct->in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exists. When it is safe
to free the partition table, the IO for that device is restarted
again.
Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7681bfee

sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time · b52bfee4

由 Venkatesh Pallipadi 提交于 10月 04, 2010

s390/powerpc/ia64 have support for CONFIG_VIRT_CPU_ACCOUNTING which does
the fine granularity accounting of user, system, hardirq, softirq times.
Adding that option on archs like x86 will be challenging however, given the
state of TSC reliability on various platforms and also the overhead it will
add in syscall entry exit.

Instead, add a lighter variant that only does finer accounting of
hardirq and softirq times, providing precise irq times (instead of timer tick
based samples). This accounting is added with a new config option
CONFIG_IRQ_TIME_ACCOUNTING so that there won't be any overhead for users not
interested in paying the perf penalty.

This accounting is based on sched_clock, with the code being generic.
So, other archs may find it useful as well.

This patch just adds the core logic and does not enable this logic yet.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-5-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b52bfee4

sched: Add a PF flag for ksoftirqd identification · 6cdd5199

由 Venkatesh Pallipadi 提交于 10月 04, 2010

To account softirq time cleanly in scheduler, we need to identify whether
softirq is invoked in ksoftirqd context or softirq at hardirq tail context.
Add PF_KSOFTIRQD for that purpose.

As all PF flag bits are currently taken, create space by moving one of the
infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
with some other state fields.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-4-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6cdd5199

sched: Consolidate account_system_vtime extern declaration · e1e10a26

由 Venkatesh Pallipadi 提交于 10月 04, 2010

Just a minor cleanup patch that makes things easier to the following patches.
No functionality change in this patch.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-3-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e1e10a26

sched: Fix softirq time accounting · 75e1056f

由 Venkatesh Pallipadi 提交于 10月 04, 2010

Peter Zijlstra found a bug in the way softirq time is accounted in
VIRT_CPU_ACCOUNTING on this thread:

http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html

The problem is, softirq processing uses local_bh_disable internally. There
is no way, later in the flow, to differentiate between whether softirq is
being processed or is it just that bh has been disabled. So, a hardirq when bh
is disabled results in time being wrongly accounted as softirq.

Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
as well. As account_system_time() in normal tick based accouting also uses
softirq_count, which will be set even when not in softirq with bh disabled.

Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count
for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq
processing. The patch below does that and adds API in_serving_softirq() which
returns whether we are currently processing softirq or not.

Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
to in_serving_softirq.

Looks like many usages of in_softirq really want in_serving_softirq. Those
changes can be made individually on a case by case basis.
Signed-off-by: NVenkatesh Pallipadi <venki@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

75e1056f

jump_label: Add COND_STMT(), reducer wrappery · ebf31f50

由 Peter Zijlstra 提交于 10月 17, 2010

The use of the JUMP_LABEL() construct ends up creating endless silly
wrappers, create a higher level construct to reduce this clutter.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

ebf31f50

perf: Optimize sw events · 7e54a5a0

由 Peter Zijlstra 提交于 10月 14, 2010

Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

7e54a5a0

perf: Use jump_labels to optimize the scheduler hooks · 82cd6def

由 Peter Zijlstra 提交于 10月 14, 2010

Trades a call + conditional + ret for an unconditional jmp.
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101014203625.501657727@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

82cd6def

jump_label: Add atomic_t interface · 8b92538d

由 Peter Zijlstra 提交于 10月 14, 2010

Add an interface to allow usage of jump_labels with atomic counters.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101014203625.501657727@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

8b92538d

jump_label: Use more consistent naming · 3b6e901f

由 Peter Zijlstra 提交于 10月 14, 2010

Now that there's still only a few users around, rename things to make
them more consistent.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101014203625.448565169@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

3b6e901f

perf, hw_breakpoint: Fix crash in hw_breakpoint creation · d580ff86

由 Peter Zijlstra 提交于 10月 14, 2010

hw_breakpoint creation needs to account stuff per-task to ensure there
is always sufficient hardware resources to back these things due to
ptrace.

With the perf per pmu context changes the event initialization no
longer has access to the event context, for the simple reason that we
need to first find the pmu (result of initialization) before we can
find the context.

This makes hw_breakpoints unhappy, because it can no longer do per
task accounting, cure this by frobbing a task pointer in the event::hw
bits for now...
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20101014203625.391543667@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

d580ff86

irq_work: Add generic hardirq context callbacks · e360adbe

由 Peter Zijlstra 提交于 10月 14, 2010

Provide a mechanism that allows running code in IRQ context. It is
most useful for NMI code that needs to interact with the rest of the
system -- like wakeup a task to drain buffers.

Perf currently has such a mechanism, so extract that and provide it as
a generic feature, independent of perf so that others may also
benefit.

The IRQ context callback is generated through self-IPIs where
possible, or on architectures like powerpc the decrementer (the
built-in timer facility) is set to generate an interrupt immediately.

Architectures that don't have anything like this get to do with a
callback from the timer tick. These architectures can call
irq_work_run() at the tail of any IRQ handlers that might enqueue such
work (like the perf IRQ handler) to avoid undue latencies in
processing the work.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NKyle McMartin <kyle@mcmartin.ca>
Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
[ various fixes ]
Signed-off-by: NHuang Ying <ying.huang@intel.com>
LKML-Reference: <1287036094.7768.291.camel@yhuang-dev>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

e360adbe

lockdep: Add improved subclass caching · 62016250

由 Hitoshi Mitake 提交于 10月 05, 2010

Current lockdep_map only caches one class with subclass == 0,
and looks up hash table of classes when subclass != 0.

It seems that this has no problem because the case of
subclass != 0 is rare. But locks of struct rq are
acquired with subclass == 1 when task migration is executed.
Task migration is high frequent event, so I modified lockdep
to cache subclasses.

I measured the score of perf bench sched messaging.
This patch has slightly but certain (order of milli seconds
or 10 milli seconds) effect when lots of tasks are running.
I'll show the result in the tail of this description.

NR_LOCKDEP_CACHING_CLASSES specifies how many classes can be
cached in the instances of lockdep_map.
I discussed with Peter Zijlstra in LinuxCon Japan about
this approach and he taught me that caching every subclasses(8)
is cleary waste of memory. So number of cached classes
should be configurable.

=== Score comparison of benchmarks ===
# "min" means best score, and "max" means worst score

for i in `seq 1 10`; do ./perf bench -f simple sched messaging; done

before: min: 0.565000, max: 0.583000, avg: 0.572500
after:  min: 0.559000, max: 0.568000, avg: 0.563300

# with more processes
for i in `seq 1 10`; do ./perf bench -f simple sched messaging -g 40; done

before: min: 2.274000, max: 2.298000, avg: 2.286300
after:  min: 2.242000, max: 2.270000, avg: 2.259700
Signed-off-by: NHitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1286269311-28336-2-git-send-email-mitake@dcl.info.waseda.ac.jp>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

62016250

18 10月, 2010 1 次提交

asm-generic/io.h: allow people to override individual funcs · 35dbc0e0

由 Mike Frysinger 提交于 10月 18, 2010

For the Blackfin port, we can use much of the asm-generic/io.h header,
but we still need to declare some of our own versions of functions.
Like the __raw_read* and in/out "string" helpers.  So let people do
this easily for many of these funcs.
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

35dbc0e0

17 10月, 2010 2 次提交

PM: Introduce library for device-specific OPPs (v7) · e1f60b29

由 Nishanth Menon 提交于 10月 13, 2010

SoCs have a standard set of tuples consisting of frequency and
voltage pairs that the device will support per voltage domain. These
are called Operating Performance Points or OPPs. The actual
definitions of OPP varies over silicon versions. For a specific domain,
we can have a set of {frequency, voltage} pairs. As the kernel boots
and more information is available, a default set of these are activated
based on the precise nature of device. Further on operation, based on
conditions prevailing in the system (such as temperature), some OPP
availability may be temporarily controlled by the SoC frameworks.

To implement an OPP, some sort of power management support is necessary
hence this library depends on CONFIG_PM.

Contributions include:
Sanjeev Premi for the initial concept:
	http://patchwork.kernel.org/patch/50998/
Kevin Hilman for converting original design to device-based.
Kevin Hilman and Paul Walmsey for cleaning up many of the function
abstractions, improvements and data structure handling.
Romit Dasgupta for using enums instead of opp pointers.
Thara Gopinath, Eduardo Valentin and Vishwanath BS for fixes and
cleanups.
Linus Walleij for recommending this layer be made generic for usage
in other architectures beyond OMAP and ARM.
Mark Brown, Andrew Morton, Rafael J. Wysocki, Paul E. McKenney for
valuable improvements.

Discussions and comments from:
http://marc.info/?l=linux-omap&m=126033945313269&w=2
http://marc.info/?l=linux-omap&m=125482970102327&w=2
http://marc.info/?t=125809247500002&r=1&w=2
http://marc.info/?l=linux-omap&m=126025973426007&w=2
http://marc.info/?t=128152609200064&r=1&w=2
http://marc.info/?t=128468723000002&r=1&w=2
incorporated.

v1: http://marc.info/?t=128468723000002&r=1&w=2Signed-off-by: NNishanth Menon <nm@ti.com>
Signed-off-by: NKevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

e1f60b29

PM: Add sysfs attr for rechecking dev hash from PM trace · d33ac60b

由 James Hogan 提交于 10月 12, 2010

If the device which fails to resume is part of a loadable kernel module
it won't be checked at startup against the magic number stored in the
RTC.

Add a read-only sysfs attribute /sys/power/pm_trace_dev_match which
contains a list of newline separated devices (usually just the one)
which currently match the last magic number. This allows the device
which is failing to resume to be found after the modules are loaded
again.
Signed-off-by: NJames Hogan <james@albanarts.com>
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>

d33ac60b

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功