提交 · 18a073a3acd3a47fbb5e23333df7fad28d576345 · gsplhtlxg / clone-Linux

26 4月, 2011 11 次提交

由 Peter Zijlstra 提交于 4月 26, 2011

Currently the x86 backend incorrectly assumes that any BRANCH_INSN
with sample_period==1 is a BTS request. This is not true when we do
frequency driven profiling such as 'perf record -e branches'.

Solves this error:

  $ perf record -e branches ./array
  Error: sys_perf_event_open() syscall returned with 95 (Operation not supported).
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reported-by: NIngo Molnar <mingo@elte.hu>
Cc: "Metzger, Markus T" <markus.t.metzger@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/n/tip-rd2y4ct71hjawzz6fpvsy9hg@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

18a073a3

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6 · cd2e49e9

由 Linus Torvalds 提交于 4月 25, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
  eCryptfs: Flush dirty pages in setattr
  eCryptfs: Handle failed metadata read in lookup
  eCryptfs: Add reference counting to lower files
  eCryptfs: dput dentries returned from dget_parent
  eCryptfs: Remove extra d_delete in ecryptfs_rmdir

cd2e49e9

Merge branch 'for-torvalds' of... · 71e9e6a5

由 Linus Torvalds 提交于 4月 25, 2011

Merge branch 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson

* 'for-torvalds' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-stericsson:
  rtc: fix coh901331 startup crash
  mach-ux500: fix i2c0 device setup regression

71e9e6a5

SELINUX: Make selinux cache VFS RCU walks safe · 9ade0cf4

由 Eric Paris 提交于 4月 25, 2011

Now that the security modules can decide whether they support the
dcache RCU walk or not it's possible to make selinux a bit more
RCU friendly.  The SELinux AVC and security server access decision
code is RCU safe.  A specific piece of the LSM audit code may not
be RCU safe.

This patch makes the VFS RCU walk retry if it would hit the non RCU
safe chunk of code.  It will normally just work under RCU.  This is
done simply by passing the VFS RCU state as a flag down into the
avc_audit() code and returning ECHILD there if it would have an issue.
Based-on-patch-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9ade0cf4

add hlist_bl_lock/unlock helpers · 1879fd6a

由 Christoph Hellwig 提交于 4月 25, 2011

Now that the whole dcache_hash_bucket crap is gone, go all the way and
also remove the weird locking layering violations for locking the hash
buckets.  Add hlist_bl_lock/unlock helpers to move the locking into the
list abstraction instead of requiring each caller to open code it.
After all allowing for the bit locks is the whole point of these helpers
over the plain hlist variant.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1879fd6a

bit_spinlock: don't play preemption games inside the busy loop · 3dd2ee48

由 Linus Torvalds 提交于 4月 25, 2011

When we are waiting for the bit-lock to be released, and are looping
over the 'cpu_relax()' should not be doing anything else - otherwise we
miss the point of trying to do the whole 'cpu_relax()'.

Do the preemption enable/disable around the loop, rather than inside of
it.

Noticed when I was looking at the code generation for the dcache
__d_drop usage, and the code just looked very odd.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dd2ee48

eCryptfs: Flush dirty pages in setattr · 5be79de2

由 Tyler Hicks 提交于 4月 22, 2011

After 57db4e8d changed eCryptfs to
write-back caching, eCryptfs page writeback updates the lower inode
times due to the use of vfs_write() on the lower file.

To preserve inode metadata changes, such as 'cp -p' does with
utimensat(), we need to flush all dirty pages early in
ecryptfs_setattr() so that the user-updated lower inode metadata isn't
clobbered later in writeback.

https://bugzilla.kernel.org/show_bug.cgi?id=33372Reported-by: NRocko <rockorequin@hotmail.com>
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

5be79de2

eCryptfs: Handle failed metadata read in lookup · 3aeb86ea

由 Tyler Hicks 提交于 3月 15, 2011

When failing to read the lower file's crypto metadata during a lookup,
eCryptfs must continue on without throwing an error. For example, there
may be a plaintext file in the lower mount point that the user wants to
delete through the eCryptfs mount.

If an error is encountered while reading the metadata in lookup(), the
eCryptfs inode's size could be incorrect. We must be sure to reread the
plaintext inode size from the metadata when performing an open() or
setattr(). The metadata is already being read in those paths, so this
adds minimal performance overhead.

This patch introduces a flag which will track whether or not the
plaintext inode size has been read so that an incorrect i_size can be
fixed in the open() or setattr() paths.

https://bugs.launchpad.net/bugs/509180

Cc: <stable@kernel.org>
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

3aeb86ea

eCryptfs: Add reference counting to lower files · 332ab16f

由 Tyler Hicks 提交于 4月 14, 2011

For any given lower inode, eCryptfs keeps only one lower file open and
multiplexes all eCryptfs file operations through that lower file. The
lower file was considered "persistent" and stayed open from the first
lookup through the lifetime of the inode.

This patch keeps the notion of a single, per-inode lower file, but adds
reference counting around the lower file so that it is closed when not
currently in use. If the reference count is at 0 when an operation (such
as open, create, etc.) needs to use the lower file, a new lower file is
opened. Since the file is no longer persistent, all references to the
term persistent file are changed to lower file.

Locking is added around the sections of code that opens the lower file
and assign the pointer in the inode info, as well as the code the fputs
the lower file when all eCryptfs users are done with it.

This patch is needed to fix issues, when mounted on top of the NFSv3
client, where the lower file is left silly renamed until the eCryptfs
inode is destroyed.
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

332ab16f

eCryptfs: dput dentries returned from dget_parent · dd55c898

由 Tyler Hicks 提交于 4月 12, 2011

Call dput on the dentries previously returned by dget_parent() in
ecryptfs_rename(). This is needed for supported eCryptfs mounts on top
of the NFSv3 client.
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

dd55c898

eCryptfs: Remove extra d_delete in ecryptfs_rmdir · 35ffa948

由 Tyler Hicks 提交于 4月 12, 2011

vfs_rmdir() already calls d_delete() on the lower dentry. That was being
duplicated in ecryptfs_rmdir() and caused a NULL pointer dereference
when NFSv3 was the lower filesystem.
Signed-off-by: NTyler Hicks <tyhicks@linux.vnet.ibm.com>

35ffa948

24 4月, 2011 17 次提交

L
Merge branch 'dcache-cleanup' · 5dd12af0
由 Linus Torvalds 提交于 4月 24, 2011
```
* dcache-cleanup:
  vfs: get rid of insane dentry hashing rules
```
5dd12af0

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 8f754468

由 Linus Torvalds 提交于 4月 24, 2011

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  libata: ahci_start_engine compliant to AHCI spec
  ata: pata_at91.c bugfix for initial_timing initialisation
  ata: pata_at91.c bugfix for high master clock
  ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs
  ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs
  libata: Pioneer DVR-216D can't do SETXFER
  ahci: don't enable port irq before handler is registered
  libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65
  libata: Kill unused ATA_DFLAG_{H|D}IPM flags
  ahci: EM supported message type sysfs attribute

8f754468

Merge branch 'for-linus' of git://git.infradead.org/ubifs-2.6 · 1f91f48b

由 Linus Torvalds 提交于 4月 24, 2011

* 'for-linus' of git://git.infradead.org/ubifs-2.6:
  UBIFS: fix master node recovery
  UBIFS: fix false assertion warning in case of I/O failures
  UBIFS: fix false space checking failure

1f91f48b

libata: ahci_start_engine compliant to AHCI spec · 270dac35

由 Jian Peng 提交于 4月 22, 2011

At the end of section 10.1 of AHCI spec (rev 1.3), it states

Software shall not set PxCMD.ST to 1 until it is determined that
a functoinal device is present on the port as determined by
PxTFD.STS.BSY=0, PxTFD.STS.DRQ=0 and PxSSTS.DET=3h

Even though most AHCI host controller works without this check,
specific controller will fail under this condition.
Signed-off-by: NJian Peng <jipeng2005@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

270dac35

ata: pata_at91.c bugfix for initial_timing initialisation · 792d37af

由 Igor Plyatov 提交于 3月 28, 2011

The "struct ata_timing" must contain 10 members, but ".dmack_hold" member was
forgotten for "initial_timing" initialisation. This patch fixes such a problem.
Signed-off-by: NIgor Plyatov <plyatov@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

792d37af

ata: pata_at91.c bugfix for high master clock · 9719b8f5

由 Igor Plyatov 提交于 3月 28, 2011

The AT91SAM9 microcontrollers with master clock higher then 105 MHz
and PIO0, have overflow of the NCS_RD_PULSE value in the MSB. This
lead to "NCS_RD_PULSE" pulse longer then "NRD_CYCLE" pulse and driver
does not detect ATA device.
Signed-off-by: NIgor Plyatov <plyatov@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

9719b8f5

ahci: AHCI-mode SATA patch for Intel Panther Point DeviceIDs · 181e3cea

由 Seth Heasley 提交于 4月 20, 2011

The previously submitted patch was word-wrapped.

This patch adds the AHCI-mode SATA DeviceIDs for the Intel Panther Point PCH.
Signed-off-by: NSeth Heasley <seth.heasley@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

181e3cea

ata_piix: IDE-mode SATA patch for Intel Panther Point DeviceIDs · 4a836c70

由 Seth Heasley 提交于 4月 20, 2011

The previously submitted patch was word-wrapped.

This patch adds the IDE-mode SATA DeviceIDs for the Intel Panther
Point PCH.
Signed-off-by: NSeth Heasley <seth.heasley@intel.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

4a836c70

libata: Pioneer DVR-216D can't do SETXFER · d69cf28c

由 Jeff Mahoney 提交于 4月 19, 2011

 Commit 4a5610a0 fixed an issue with
 the Pioneer DVR-212D not handling SETXFER correctly. An openSUSE user
 reported a similar issue with his DVR-216D that the NOSETXFER horkage
 worked around for him as well.

 This patch adds the DVR-216D (1.08) to the horkage list for NOSETXFER.

 The issue was reported at:
 https://bugzilla.novell.com/show_bug.cgi?id=679143Reported-by: NVolodymyr Kyrychenko <vladimir.kirichenko@gmail.com>
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

d69cf28c

ahci: don't enable port irq before handler is registered · 7b3a24c5

由 Maxime Bizon 提交于 3月 16, 2011

The ahci_pmp_attach() & ahci_pmp_detach() unmask port irqs, but they
are also called during port initialization, before ahci host irq
handler is registered. On ce4100 platform, this sometimes triggers
"irq 4: nobody cared" message when loading driver.

Fixed this by not touching the register if the port is in frozen
state, and mark all uninitialized port as frozen.
Signed-off-by: NMaxime Bizon <mbizon@freebox.fr>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

7b3a24c5

libata: Implement ATA_FLAG_NO_DIPM and apply it to mcp65 · ae01b249

由 Tejun Heo 提交于 3月 16, 2011

NVIDIA mcp65 familiy of controllers cause command timeouts when DIPM
is used.  Implement ATA_FLAG_NO_DIPM and apply it.

This problem was reported by Stefan Bader in the following thread.

 http://thread.gmane.org/gmane.linux.ide/48841

stable: applicable to 2.6.37 and 38.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NStefan Bader <stefan.bader@canonical.com>
Cc: stable@kernel.org
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

ae01b249

libata: Kill unused ATA_DFLAG_{H|D}IPM flags · 3f7ac1d6

由 Tejun Heo 提交于 3月 16, 2011

ATA_DFLAG_{H|D}IPM flags are no longer used.  Kill them.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

3f7ac1d6

ahci: EM supported message type sysfs attribute · 6e5fe5b1

由 Hannes Reinecke 提交于 3月 04, 2011

This patch adds an sysfs attribute 'em_message_supported' to the
ahci host device which prints out the supported enclosure management
message types.
Signed-off-by: NHannes Reinecke <hare@suse.de>
Signed-off-by: NJeff Garzik <jgarzik@pobox.com>

6e5fe5b1

kconfig: Avoid buffer underrun in choice input · 3ba41621

由 Ben Hutchings 提交于 4月 23, 2011

Commit 40aee729 ('kconfig: fix default value for choice input')
fixed some cases where kconfig would select the wrong option from a
choice with a single valid option and thus enter an infinite loop.

However, this broke the test for user input of the form 'N?', because
when kconfig selects the single valid option the input is zero-length
and the test will read the byte before the input buffer.  If this
happens to contain '?' (as it will in a mips build on Debian unstable
today) then kconfig again enters an infinite loop.
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Cc: stable@kernel.org [2.6.17+]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ba41621

vfs: get rid of insane dentry hashing rules · dea3667b

由 Linus Torvalds 提交于 4月 24, 2011

The dentry hashing rules have been really quite complicated for a long
while, in odd ways. That made functions like __d_drop() very fragile
and non-obvious.

In particular, whether a dentry was hashed or not was indicated with an
explicit DCACHE_UNHASHED bit. That's despite the fact that the hash
abstraction that the dentries use actually have a 'is this entry hashed
or not' model (which is a simple test of the 'pprev' pointer).

The reason that was done is because we used the normal 'is this entry
unhashed' model to mark whether the dentry had _ever_ been hashed in the
dentry hash tables, and that logic goes back many years (commit
b3423415: "dcache: avoid RCU for never-hashed dentries").

That, in turn, meant that __d_drop had totally different unhashing logic
for the dentry hash table case and for the anonymous dcache case,
because in order to use the "is this dentry hashed" logic as a flag for
whether it had ever been on the RCU hash table, we had to unhash such a
dentry differently so that we'd never think that it wasn't 'unhashed'
and wouldn't be free'd correctly.

That's just insane. It made the logic really hard to follow, when there
were two different kinds of "unhashed" states, and one of them (the one
that used "list_bl_unhashed()") really had nothing at all to do with
being unhashed per se, but with a very subtle lifetime rule instead.

So turn all of it around, and make it logical.

Instead of having a DENTRY_UNHASHED bit in d_flags to indicate whether
the dentry is on the hash chains or not, use the hash chain unhashed
logic for that. Suddenly "d_unhashed()" just uses "list_bl_unhashed()",
and everything makes sense.

And for the lifetime rule, just use an explicit DENTRY_RCUACCEES bit.
If we ever insert the dentry into the dentry hash table so that it is
visible to RCU lookup, we mark it DENTRY_RCUACCESS to show that it now
needs the RCU lifetime rules. Now suddently that test at dentry free
time makes sense too.

And because unhashing now is sane and doesn't depend on where the dentry
got unhashed from (because the dentry hash chain details doesn't have
some subtle side effects), we can re-unify the __d_drop() logic and use
common code for the unhashing.

Also fix one more open-coded hash chain bit_spin_lock() that I missed in
the previous chain locking cleanup commit.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dea3667b

Merge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 · 686c4cbb

由 Linus Torvalds 提交于 4月 23, 2011

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Add missing syscore_suspend() and syscore_resume() calls
  PM: Fix error code paths executed after failing syscore_suspend()

686c4cbb

vfs: get rid of 'struct dcache_hash_bucket' abstraction · b07ad996

由 Linus Torvalds 提交于 4月 23, 2011

It's a useless abstraction for 'hlist_bl_head', and it doesn't actually
help anything - quite the reverse.  All the users end up having to know
about the hlist_bl_head details anyway, using 'struct hlist_bl_node *'
etc. So it just makes the code look confusing.

And the cost of it is extra '&b->head' syntactic noise, but more
importantly it spuriously makes the hash table dentry list look
different from the per-superblock DCACHE_DISCONNECTED dentry list.

As a result, the code ended up using ad-hoc locking for one case and
special helper functions for what is really another totally identical
case in the very same function.

Make it all look and work the same.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b07ad996

23 4月, 2011 5 次提交

Merge branch 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 · 0f1d9f78

由 Linus Torvalds 提交于 4月 22, 2011

* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6:
  tty/n_gsm: fix bug in CRC calculation for gsm1 mode
  serial/imx: read cts state only after acking cts change irq
  parport_pc.c: correctly release the requested region for the IT887x

0f1d9f78

SECURITY: Move exec_permission RCU checks into security modules · 8c9e80ed

由 Andi Kleen 提交于 4月 21, 2011

Right now all RCU walks fall back to reference walk when CONFIG_SECURITY
is enabled, even though just the standard capability module is active.
This is because security_inode_exec_permission unconditionally fails
RCU walks.

Move this decision to the low level security module. This requires
passing the RCU flags down the security hook. This way at least
the capability module and a few easy cases in selinux/smack work
with RCU walks with CONFIG_SECURITY=y
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NEric Paris <eparis@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8c9e80ed

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6 · 8d082f8f

由 Linus Torvalds 提交于 4月 22, 2011

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
  ALSA: hda - Fix unused warnings when !SND_HDA_NEEDS_RESUME
  ALSA: hda - Add a fix-up for Acer dmic with ALC271x codec
  ASoC: add a module alias to the FSI driver
  ALSA: emu10k1 - Fix "Music" controls to "Synth" controls in documents
  ARM: s3c2440: gta02; Register dfbmcs320 device for BT audio interface
  ASoC: codecs: JZ4740: Fix OOPS
  ASoC: Fix output PGA enabling in wm_hubs CODECs
  ASoC: sn95031: decorate function with __devexit_p()
  ASoC: SAMSUNG: Fix the inverted clocks handling for pcm driver
  ASoC: sst_platform: Fix lock acquring
  ASoC: fsi: driver safely remove for against irq
  ASoC: fsi: modify vague PM control on probe
  ASoC: fsi: take care in failing case of dai register
  MAINTAINERS: Update Samsung ASoC maintainer's id
  ASoC: WM8903: HP and Line out PGA/mixer DAPM fixes
  ASoC: Set left channel volume update bits for WM8994
  ASoC: fix config error path
  ASoC: check channel mismatch between cpu_dai and codec_dai
  ASoC: Tegra: Suspend/resume support

8d082f8f

Merge branch 'perf-fixes-for-linus' of... · 258ba6a5

由 Linus Torvalds 提交于 4月 22, 2011

Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Update/fix Intel Nehalem cache events
  perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow
  x86, perf event: Turn off unstructured raw event access to offcore registers
  perf: Support Xeon E7's via the Westmere PMU driver

258ba6a5

Merge branch 'irq-fixes-for-linus' of... · d6d61c97

由 Linus Torvalds 提交于 4月 22, 2011

Merge branch 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xtensa: Fixup irq conversion fallout and nmi_count

d6d61c97

22 4月, 2011 7 次提交

perf, x86: Update/fix Intel Nehalem cache events · f4929bd3

由 Peter Zijlstra 提交于 4月 22, 2011

Change the Nehalem cache events to use retired memory instruction counters
(similar to Westmere), this greatly improves the provided stats.

Using:

main ()
{
        int i;

        for (i = 0; i < 1000000000; i++) {
                asm("mov (%%rsp), %%rbx;"
                    "mov %%rbx, (%%rsp);" : : : "rbx");
        }
}

We find:

 $ perf stat --repeat 10 -e instructions:u -e l1-dcache-loads:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,056 instructions:u           #      0.000 IPC ( +-   0.000% )
      4,999,502,846 l1-dcache-loads:u          ( +-   0.008% )
      1,000,034,832 l1-dcache-stores:u         ( +-   0.000% )
         1.565184942  seconds time elapsed   ( +-   0.005% )

The 5b is surprising - we'd expect 1b:

 $ perf stat --repeat 10 -e instructions:u -e r10b:u -e l1-dcache-stores:u ./loop_1b_loads+stores
  Performance counter stats for './loop_1b_loads+stores' (10 runs):
      4,000,081,054 instructions:u           #      0.000 IPC ( +-   0.000% )
      1,000,021,961 r10b:u                     ( +-   0.000% )
      1,000,030,951 l1-dcache-stores:u         ( +-   0.000% )
         1.565055422  seconds time elapsed   ( +-   0.003% )

Which this patch thus fixes.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Link: http://lkml.kernel.org/n/tip-q9rtru7b7840tws75xzboapv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

f4929bd3

perf, x86: P4 PMU - Don't forget to clear cpuc->active_mask on overflow · 1ea5a6af

由 Cyrill Gorcunov 提交于 4月 21, 2011

It's not enough to simply disable event on overflow the
cpuc->active_mask should be cleared as well otherwise counter
may stall in "active" even in real being already disabled (which
potentially may lead to the situation that user may not use this
counter further).

Don pointed out that:

 " I also noticed this patch fixed some unknown NMIs
   on a P4 when I stressed the box".
Tested-by: NLin Ming <ming.m.lin@intel.com>
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Link: http://lkml.kernel.org/r/1303398203-2918-3-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

1ea5a6af

x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6

由 Ingo Molnar 提交于 4月 22, 2011

Andi Kleen pointed out that the Intel offcore support patches were merged
without user-space tool support to the functionality:

 |
 | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
 | user space bits were not. This made it impossible to set the extra mask
 | and actually do the OFFCORE profiling
 |

Andi submitted a preliminary patch for user-space support, as an
extension to perf's raw event syntax:

 |
 | Some raw events -- like the Intel OFFCORE events -- support additional
 | parameters. These can be appended after a ':'.
 |
 | For example on a multi socket Intel Nehalem:
 |
 |    perf stat -e r1b7:20ff -a sleep 1
 |
 | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
 | that measures any access to DRAM on another socket.
 |

But this kind of usability is absolutely unacceptable - users should not
be expected to type in magic, CPU and model specific incantations to get
access to useful hardware functionality.

The proper solution is to expose useful offcore functionality via
generalized events - that way users do not have to care which specific
CPU model they are using, they can use the conceptual event and not some
model specific quirky hexa number.

We already have such generalization in place for CPU cache events,
and it's all very extensible.

"Offcore" events measure general DRAM access patters along various
parameters. They are particularly useful in NUMA systems.

We want to support them via generalized DRAM events: either as the
fourth level of cache (after the last-level cache), or as a separate
generalization category.

That way user-space support would be very obvious, memory access
profiling could be done via self-explanatory commands like:

  perf record -e dram ./myapp
  perf record -e dram-remote ./myapp

... to measure DRAM accesses or more expensive cross-node NUMA DRAM
accesses.

These generalized events would work on all CPUs and architectures that
have comparable PMU features.

( Note, these are just examples: actual implementation could have more
  sophistication and more parameter - as long as they center around
  similarly simple usecases. )

Now we do not want to revert *all* of the current offcore bits, as they
are still somewhat useful for generic last-level-cache events, implemented
in this commit:

  e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere

But we definitely do not yet want to expose the unstructured raw events
to user-space, until better generalization and usability is implemented
for these hardware event features.

( Note: after generalization has been implemented raw offcore events can be
  supported as well: there can always be an odd event that is marginally
  useful but not useful enough to generalize. DRAM profiling is definitely
  *not* such a category so generalization must be done first. )

Furthermore, PERF_TYPE_RAW access to these registers was not intended
to go upstream without proper support - it was a side-effect of the above
e994d7d2 commit, not mentioned in the changelog.

As v2.6.39 is nearing release we go for the simplest approach: disable
the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
kernel and becomes an ABI.

Once proper structure is implemented for these hardware events and users
are offered usable solutions we can revisit this issue.
Reported-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

b52c55c6

perf: Support Xeon E7's via the Westmere PMU driver · b2508e82

由 Andi Kleen 提交于 4月 21, 2011

There's a new model number public, 47, for Xeon E7 (aka Westmere EX).
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: a.p.zijlstra@chello.nl
Link: http://lkml.kernel.org/r/1303429715-10202-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

b2508e82

Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 91e8549b

由 Linus Torvalds 提交于 4月 21, 2011

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd
  block: don't propagate unlisted DISK_EVENTs to userland
  elevator: check for ELEVATOR_INSERT_SORT_MERGE in !elvpriv case too

91e8549b

ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd and ide-cd · 7eec77a1

由 Tejun Heo 提交于 4月 21, 2011

check_events() implementations in both ide-gd and ide-cd are
inadequate for in-kernel event polling.  Both generate media change
events continuously when certain conditions are met causing infinite
event loop between the driver and userland event handler.

As disk event now supports suppression of unlisted events, simply
de-listing DISK_EVENT_MEDIA_CHANGE from disk->events resolves the
problem.  Internal handling around media revalidation will behave the
same while userland will fall back to userland event polling after
detecting the device doesn't support disk events.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJens Axboe <jaxboe@fusionio.com>
Acked-by: N"David S. Miller" <davem@davemloft.net>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7eec77a1

block: don't propagate unlisted DISK_EVENTs to userland · 7c88a168

由 Tejun Heo 提交于 4月 21, 2011

DISK_EVENT_MEDIA_CHANGE is used for both userland visible event and
internal event for revalidation of removeable devices.  Some legacy
drivers don't implement proper event detection and continuously
generate events under certain circumstances.  For example, ide-cd
generates media changed continuously if there's no media in the drive,
which can lead to infinite loop of events jumping back and forth
between the driver and userland event handler.

This patch updates disk event infrastructure such that it never
propagates events not listed in disk->events to userland.  Those
events are processed the same for internal purposes but uevent
generation is suppressed.

This also ensures that userland only gets events which are advertised
in the @events sysfs node lowering risk of confusion.
Signed-off-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7c88a168