提交 · a240f76165e6255384d4bdb8139895fac7988799 · openeuler / raspberrypi-kernel

06 10月, 2011 1 次提交

perf, core: Introduce attrs to count in either host or guest mode · a240f761

由 Joerg Roedel 提交于 10月 05, 2011

The two new attributes exclude_guest and exclude_host can
bes used by user-space to tell the kernel to setup
performance counter to either only count while the CPU is in
guest or in host mode.

An additional check is also introduced to make sure
user-space does not try to exclude guest and host mode from
counting.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NGleb Natapov <gleb@redhat.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317816084-18026-2-git-send-email-gleb@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>

a240f761

05 10月, 2011 1 次提交

PCI: Disable MPS configuration by default · 5f39e670

由 Jon Mason 提交于 10月 03, 2011

Add the ability to disable PCI-E MPS turning and using the BIOS
configured MPS defaults. Due to the number of issues recently
discovered on some x86 chipsets, make this the default behavior.

Also, add the option for peer to peer DMA MPS configuration. Peer to
peer DMA is outside the scope of this patch, but MPS configuration could
prevent it from working by having the MPS on one root port different
than the MPS on another. To work around this, simply make the system
wide MPS the smallest possible value (128B).
Signed-off-by: NJon Mason <mason@myri.com>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f39e670

04 10月, 2011 1 次提交

perf: Fix counter of ftrace events · 92e51938

由 Andrew Vagin 提交于 9月 26, 2011

Each event adds some points to its counters. By default it adds 1,
and a number of points may be transmited in event's parameters.

E.g. sched:sched_stat_runtime adds how long process has been running.

But this functionality was broken by v2.6.31-rc5-392-gf413cdb8
and now the event's parameters doesn't affect on a number of points.

TP_perf_assign isn't defined, so __perf_count(c) isn't executed and
__count is always equal to 1.
Signed-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>

92e51938

30 9月, 2011 1 次提交

posix-cpu-timers: Cure SMP wobbles · d670ec13

由 Peter Zijlstra 提交于 9月 01, 2011

David reported:

  Attached below is a watered-down version of rt/tst-cpuclock2.c from
  GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
  similar.

  Run it several times, and you will see cases where the main thread
  will measure a process clock difference before and after the nanosleep
  which is smaller than the cpu-burner thread's individual thread clock
  difference.  This doesn't make any sense since the cpu-burner thread
  is part of the top-level process's thread group.

  I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
  64-bit binaries).

  For example:

  [davem@boricha build-x86_64-linux]$ ./test
  process: before(0.001221967) after(0.498624371) diff(497402404)
  thread:  before(0.000081692) after(0.498316431) diff(498234739)
  self:    before(0.001223521) after(0.001240219) diff(16698)
  [davem@boricha build-x86_64-linux]$ 

  The diff of 'process' should always be >= the diff of 'thread'.

  I make sure to wrap the 'thread' clock measurements the most tightly
  around the nanosleep() call, and that the 'process' clock measurements
  are the outer-most ones.

  ---
  #include <unistd.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <time.h>
  #include <fcntl.h>
  #include <string.h>
  #include <errno.h>
  #include <pthread.h>

  static pthread_barrier_t barrier;

  static void *chew_cpu(void *arg)
  {
	  pthread_barrier_wait(&barrier);
	  while (1)
		  __asm__ __volatile__("" : : : "memory");
	  return NULL;
  }

  int main(void)
  {
	  clockid_t process_clock, my_thread_clock, th_clock;
	  struct timespec process_before, process_after;
	  struct timespec me_before, me_after;
	  struct timespec th_before, th_after;
	  struct timespec sleeptime;
	  unsigned long diff;
	  pthread_t th;
	  int err;

	  err = clock_getcpuclockid(0, &process_clock);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
	  if (err)
		  return 1;

	  pthread_barrier_init(&barrier, NULL, 2);
	  err = pthread_create(&th, NULL, chew_cpu, NULL);
	  if (err)
		  return 1;

	  err = pthread_getcpuclockid(th, &th_clock);
	  if (err)
		  return 1;

	  pthread_barrier_wait(&barrier);

	  err = clock_gettime(process_clock, &process_before);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_before);
	  if (err)
		  return 1;

	  err = clock_gettime(th_clock, &th_before);
	  if (err)
		  return 1;

	  sleeptime.tv_sec = 0;
	  sleeptime.tv_nsec = 500000000;
	  nanosleep(&sleeptime, NULL);

	  err = clock_gettime(th_clock, &th_after);
	  if (err)
		  return 1;

	  err = clock_gettime(my_thread_clock, &me_after);
	  if (err)
		  return 1;

	  err = clock_gettime(process_clock, &process_after);
	  if (err)
		  return 1;

	  diff = process_after.tv_nsec - process_before.tv_nsec;
	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 process_before.tv_sec, process_before.tv_nsec,
		 process_after.tv_sec, process_after.tv_nsec, diff);
	  diff = th_after.tv_nsec - th_before.tv_nsec;
	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 th_before.tv_sec, th_before.tv_nsec,
		 th_after.tv_sec, th_after.tv_nsec, diff);
	  diff = me_after.tv_nsec - me_before.tv_nsec;
	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
		 me_before.tv_sec, me_before.tv_nsec,
		 me_after.tv_sec, me_after.tv_nsec, diff);

	  return 0;
  }

This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.

This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().

Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twinsTested-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

d670ec13

29 9月, 2011 1 次提交

ptp: fix L2 event message recognition · f75159e9

由 Richard Cochran 提交于 9月 20, 2011

The IEEE 1588 standard defines two kinds of messages, event and general
messages. Event messages require time stamping, and general do not. When
using UDP transport, two separate ports are used for the two message
types.

The BPF designed to recognize event messages incorrectly classifies L2
general messages as event messages. This commit fixes the issue by
extending the filter to check the message type field for L2 PTP packets.
Event messages are be distinguished from general messages by testing
the "general" bit.
Signed-off-by: NRichard Cochran <richard.cochran@omicron.at>
Cc: <stable@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f75159e9

27 9月, 2011 2 次提交

vfs: remove LOOKUP_NO_AUTOMOUNT flag · b6c8069d

由 Linus Torvalds 提交于 9月 27, 2011

That flag no longer makes sense, since we don't look up automount points
as eagerly any more. Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.

With our new non-eager automount semantics, one discussion has been
about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.

So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.
Acked-by: NIan Kent <raven@themaw.net>
Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b6c8069d

vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag · d94c177b

由 Linus Torvalds 提交于 9月 26, 2011

Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).

But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup.  Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.

This commit just adds the flag and logic, no users yet, though.  It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d94c177b

20 9月, 2011 2 次提交

[S390] kvm: extension capability for new address space layout · b6cf8788

由 Christian Borntraeger 提交于 9月 20, 2011

598841ca ([S390] use gmap address
spaces for kvm guest images) changed kvm on s390 to use a separate
address space for kvm guests. We can now put KVM guests anywhere
in the user address mode with a size up to 8PB - as long as the
memory is 1MB-aligned. This change was done without KVM extension
capability bit.
The change was added after 3.0, but we still have a chance to add
a feature bit before 3.1 (keeping the releases in a sane state).
We use number 71 to avoid collisions with other pending kvm patches
as requested by Alexander Graf.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NAvi Kivity <avi@redhat.com>
Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>

b6cf8788

irq: Add declaration of irq_domain_simple_ops to irqdomain.h · 5bd078dd

由 Rob Herring 提交于 9月 14, 2011

irq_domain_simple_ops is exported, but is not declared in irqdomain.h,
so add it.
Signed-off-by: NRob Herring <rob.herring@calxeda.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: marc.zyngier@arm.com
Cc: thomas.abraham@linaro.org
Cc: jamie@jamieiles.com
Cc: b-cousson@ti.com
Cc: shawn.guo@linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: devicetree-discuss@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1316017900-19918-2-git-send-email-robherring2@gmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

5bd078dd

19 9月, 2011 1 次提交

tcp: fix build error if !CONFIG_SYN_COOKIES · e05c82d3

由 Eric Dumazet 提交于 9月 18, 2011

commit 946cedcc (tcp: Change possible SYN flooding messages)
added a build error if CONFIG_SYN_COOKIES=n
Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e05c82d3

17 9月, 2011 3 次提交

net: Handle different key sizes between address families in flow cache · aa1c366e

由 dpward 提交于 9月 05, 2011

With the conversion of struct flowi to a union of AF-specific structs, some
operations on the flow cache need to account for the exact size of the key.
Signed-off-by: NDavid Ward <david.ward@ll.mit.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa1c366e

net: Align AF-specific flowi structs to long · 728871bc

由 David Ward 提交于 9月 05, 2011

AF-specific flowi structs are now passed to flow_key_compare, which must
also be aligned to a long.
Signed-off-by: NDavid Ward <david.ward@ll.mit.edu>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

728871bc

sctp: deal with multiple COOKIE_ECHO chunks · d5ccd496

由 Max Matveev 提交于 8月 29, 2011

Attempt to reduce the number of IP packets emitted in response to single
SCTP packet (2e3216cd) introduced a complication - if a packet contains
two COOKIE_ECHO chunks and nothing else then SCTP state machine corks the
socket while processing first COOKIE_ECHO and then loses the association
and forgets to uncork the socket. To deal with the issue add new SCTP
command which can be used to set association explictly. Use this new
command when processing second COOKIE_ECHO chunk to restore the context
for SCTP state machine.
Signed-off-by: NMax Matveev <makc@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5ccd496

16 9月, 2011 2 次提交

net: copy userspace buffers on device forwarding · 48c83012

由 Michael S. Tsirkin 提交于 8月 31, 2011

dev_forward_skb loops an skb back into host networking
stack which might hang on the memory indefinitely.
In particular, this can happen in macvtap in bridged mode.
Copy the userspace fragments to avoid blocking the
sender in that case.

As this patch makes skb_copy_ubufs extern now,
I also added some documentation and made it clear
the SKBTX_DEV_ZEROCOPY flag automatically instead
of doing it in all callers. This can be made into a separate
patch if people feel it's worth it.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48c83012

tcp: Change possible SYN flooding messages · 946cedcc

由 Eric Dumazet 提交于 8月 30, 2011

"Possible SYN flooding on port xxxx " messages can fill logs on servers.

Change logic to log the message only once per listener, and add two new
SNMP counters to track :

TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client

TCPReqQFullDrop : number of times a SYN request was dropped because
syncookies were not enabled.

Based on a prior patch from Tom Herbert, and suggestions from David.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
CC: Tom Herbert <therbert@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

946cedcc

15 9月, 2011 2 次提交

drivers/gpio/gpio-generic.c: fix build errors · 4f5b0480

由 Russell King 提交于 9月 14, 2011

Building a kernel with hotplug disabled results in a link failure:

  `bgpio_remove' referenced in section `___ksymtab_gpl+bgpio_remove' of drivers/built-in.o: defined in discarded section `.devexit.text' of drivers/built-in.o

This is because of bgpio_remove() is exported.  It is illegal to export
symbols which are discarded either at link time or as part of an
init/exit section.

Fix this by dropping the __devexit attributation from bgpio_remove().
Also drop the __devinit attributation from bgpio_init().
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4f5b0480

memcg: Revert "memcg: add memory.vmscan_stat" · 185efc0f

由 Johannes Weiner 提交于 9月 14, 2011

Revert the post-3.0 commit 82f9d486 ("memcg: add
memory.vmscan_stat").

The implementation of per-memcg reclaim statistics violates how memcg
hierarchies usually behave: hierarchically.

The reclaim statistics are accounted to child memcgs and the parent
hitting the limit, but not to hierarchy levels in between.  Usually,
hierarchical statistics are perfectly recursive, with each level
representing the sum of itself and all its children.

Since this exports statistics to userspace, this may lead to confusion
and problems with changing things after the release, so revert it now,
we can try again later.
Signed-off-by: NJohannes Weiner <jweiner@redhat.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

185efc0f

09 9月, 2011 2 次提交

regulator: fix kernel-doc warning in consumer.h · bff747c5

由 Randy Dunlap 提交于 9月 08, 2011

Fix kernel-doc warning about internal/private data by marking it
as "private:" so that kernel-doc will ignore it.

Warning(include/linux/regulator/consumer.h:128): No description found for parameter 'ret'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bff747c5

wireless: fix kernel-doc warning in net/cfg80211.h · d2f15287

由 Randy Dunlap 提交于 9月 08, 2011

Fix kernel-doc warning in net/cfg80211.h:

  Warning(include/net/cfg80211.h:1884): No description found for parameter 'registered'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d2f15287

06 9月, 2011 3 次提交

mfd: Fix value of WM8994_CONFIGURE_GPIO · 8efcc57d

由 Mark Brown 提交于 8月 03, 2011

This needs to be an out of band value for the register and on this device
registers are 16 bit so we must shift left one to the 17th bit.
Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
Cc: stable@kernel.org
Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>

8efcc57d

J
fs/9p: Use protocol-defined value for lock/getlock 'type' field. · 51b8b4fb
由 Jim Garlick 提交于 8月 21, 2011
```
Signed-off-by: NJim Garlick <garlick@llnl.gov>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
```
51b8b4fb

fs/9p: Add OS dependent open flags in 9p protocol · f88657ce

由 Aneesh Kumar K.V 提交于 8月 03, 2011

Some of the flags are OS/arch dependent we add a 9p
protocol value which maps to asm-generic/fcntl.h values in Linux
Based on the original patch from Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

f88657ce

31 8月, 2011 2 次提交

writeback: show raw dirtied_when in trace writeback_single_inode · c8ad6206

由 Wu Fengguang 提交于 8月 29, 2011

Save inode->dirtied_when in the raw trace output for reliable scripting,
and to also show in formatted output the relative age in seconds for
easy human reading.

CC: Jan Kara <jack@suse.cz>
Acked-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

c8ad6206

net: relax PKTINFO non local ipv6 udp xmit check · ec0506db

由 Maciej Żenczykowski 提交于 8月 28, 2011

Allow transparent sockets to be less restrictive about
the source ip of ipv6 udp packets being sent.

Google-Bug-Id: 5018138
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
CC: "Erik Kline" <ek@google.com>
CC: "Lorenzo Colitti" <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ec0506db

29 8月, 2011 1 次提交

perf events: Fix slow and broken cgroup context switch code · a8d757ef

由 Stephane Eranian 提交于 8月 25, 2011

The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

$ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ /pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct counts for
for the non cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>

a8d757ef

27 8月, 2011 1 次提交

All Arch: remove linkage for sys_nfsservctl system call · f5b94099

由 NeilBrown 提交于 8月 26, 2011

The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5b94099

26 8月, 2011 5 次提交

backlight: add a callback 'notify_after' for backlight control · cc7993f6

由 Dilan Lee 提交于 8月 25, 2011

We need a callback to do some things after pwm_enable, pwm_disable
and pwm_config.
Signed-off-by: NDilan Lee <dilee@nvidia.com>
Reviewed-by: NRobert Morell <rmorell@nvidia.com>
Reviewed-by: NArun Murthy <arun.murthy@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cc7993f6

rapidio: fix use of non-compatible registers · 284fb68d

由 Alexandre Bounine 提交于 8月 25, 2011

Replace/remove use of RIO v.1.2 registers/bits that are not
forward-compatible with newer versions of RapidIO specification.

RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.

Use of removed (since RIO v.1.3) register bits affects users of
currently available 1.3 and 2.x compliant devices who may use not so
recent kernel versions.

Removing checks for unsupported bits makes corresponding routines
compatible with all versions of RapidIO specification.  Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and
later as well.
Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Chul Kim <chul.kim@idt.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

284fb68d

MAINTAINERS: Evgeniy has moved · a8018766

由 Evgeniy Polyakov 提交于 8月 25, 2011

Signed-off-by: NEvgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8018766

lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7

由 Josh Boyer 提交于 8月 25, 2011

Purely in-memory filesystems do not use the inode hash as the dcache
tells us if an entry already exists.  As a result, they do not call
unlock_new_inode, and thus directory inodes do not get put into a
different lockdep class for i_sem.

We need the different lockdep classes, because the locking order for
i_mutex is different for directory inodes and regular inodes.  Directory
inodes can do "readdir()", which takes i_mutex *before* possibly taking
mm->mmap_sem (due to a page fault while copying the directory entry to
user space).

In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
before accessing i_mutex.

The two cases can never happen for the same inode, so no real deadlock
can occur, but without the different lockdep classes, lockdep cannot
understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
can lead to false positives from lockdep like below:

    find/645 is trying to acquire lock:
     (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac

    but task is already holding lock:
     (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
    vfs_readdir+0x5b/0xb4

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
          [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
          [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
          [<ffffffff81111557>] mmap_region+0x258/0x432
          [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
          [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
          [<ffffffff8100c858>] sys_mmap+0x22/0x24
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

    -> #0 (&mm->mmap_sem){++++++}:
          [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
          [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
          [<ffffffff81109541>] might_fault+0x89/0xac
          [<ffffffff81149cff>] filldir+0x6f/0xc7
          [<ffffffff811586ea>] dcache_readdir+0x67/0x205
          [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
          [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
          [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b

This patch moves the directory vs file lockdep annotation into a helper
function that can be called by in-memory filesystems and has hugetlbfs
call it.
Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e096d0c7

Add a personality to report 2.6.x version numbers · be27425d

由 Andi Kleen 提交于 8月 19, 2011

I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.

For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.

$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux

$ hpacucli ctrl all show

Error: No controllers detected.

$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0

Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":

https://bugzilla.mozilla.org/show_bug.cgi?id=664564

It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.

This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.

I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.

Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace)

To use:

wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be27425d

24 8月, 2011 2 次提交

block: simplify force plug flush code a little bit · 56ebdaf2

由 Shaohua Li 提交于 8月 24, 2011

Cleaning up the code a little bit. attempt_plug_merge() traverses the plug
list anyway, we can do the request counting there, so stack size is reduced
a little bit.
The motivation here is I suspect if we should count the requests for each
queue (task could handle multiple disks in the meantime), but my test doesn't
show it's worthy doing. If somebody proves we should do it, below change
will make that more easier.
Signed-off-by: NShaohua Li <shli@kernel.org>
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

56ebdaf2

TTY: pty, fix pty counting · 24d406a6

由 Jiri Slaby 提交于 8月 10, 2011

tty_operations->remove is normally called like:
queue_release_one_tty
 ->tty_shutdown
   ->tty_driver_remove_tty
     ->tty_operations->remove

However tty_shutdown() is called from queue_release_one_tty() only if
tty_operations->shutdown is NULL. But for pty, it is not.
pty_unix98_shutdown() is used there as ->shutdown.

So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
called. This results in invalid pty_count. I.e. what can be seen in
/proc/sys/kernel/pty/nr.

I see this was already reported at:
  https://lkml.org/lkml/2009/11/5/370
But it was not fixed since then.

This patch is kind of a hackish way. The problem lies in ->install. We
allocate there another tty (so-called tty->link). So ->install is
called once, but ->remove twice, for both tty and tty->link. The fix
here is to count both tty and tty->link and divide the count by 2 for
user.

And to have ->remove called, let's make tty_driver_remove_tty() global
and call that from pty_unix98_shutdown() (tty_operations->shutdown).

While at it, let's document that when ->shutdown is defined,
tty_shutdown() is not called.
Signed-off-by: NJiri Slaby <jslaby@suse.cz>
Cc: Alan Cox <alan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

24d406a6

23 8月, 2011 5 次提交

block: separate priority boosting from REQ_META · 65299a3b

由 Christoph Hellwig 提交于 8月 23, 2011

Add a new REQ_PRIO to let requests preempt others in the cfq I/O schedule,
and lave REQ_META purely for marking requests as metadata in blktrace.

All existing callers of REQ_META except for XFS are updated to also
set REQ_PRIO for now.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

65299a3b

block: remove READ_META and WRITE_META · 5dc06c5a

由 Christoph Hellwig 提交于 8月 23, 2011

Replace all occurnanced of the undocumented READ_META with READ | REQ_META
and remove the unused WRITE_META define.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

5dc06c5a

drivers:misc:ti-st: platform hooks for chip states · 0d7c5f25

由 Pavan Savoy 提交于 8月 10, 2011

Certain platform specific or Host-WiLink Interface specific actions would be
required to be taken when the chip is being enabled and after the chip is
disabled such as configuration of the mux modes for the GPIO of host connected
to the nshutdown of the chip or relinquishing UART after the chip is disabled.

Similar actions can also be taken when the chip is in deep sleep or when the
chip is awake. Performance enhancements such as configuring the host to run
faster when chip is awake and slower when chip is asleep can also be made
here.
Signed-off-by: NPavan Savoy <pavan_savoy@ti.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

0d7c5f25

target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0 · 052605c6

由 Nicholas Bellinger 提交于 7月 26, 2011

This patch changes target_emulate_inquiry_std() to set the 'not connected'
(0x35) bit in standard INQUIRY response data when we are processing a
request to a virtual LUN=0 mapping from struct se_device *g_lun0_dev that
have been setup for us in transport_lookup_cmd_lun().

This addresses an issue where qla2xxx FC clients need to be able
to create demo-mode I_T FC Nexuses by default, but should not be
exposing the default set of TPG LUNs to all FC clients.  This includes
adding an new optional target_core_fabric_ops->tpg_check_demo_mode_login_only()
caller to allow demo_mode nexuses to skip the old default of bulding
a demo-mode MappedLUNs list via core_tpg_add_node_to_devs().

(roland: Add missing tpg_check_demo_mode_login_only check in core_dev_add_lun)
Reported-by: NRoland Dreier <roland@purestorage.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: NNicholas Bellinger <nab@risingtidesystems.com>

052605c6

mac80211: fix suspend/resume races with unregister hw · ecb44335

由 Stanislaw Gruszka 提交于 8月 12, 2011

Do not call ->suspend, ->resume methods after we unregister wiphy. Also
delete sta_clanup timer after we finish wiphy unregister to avoid this:

WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0()
Hardware name: 6369CTO
ODEBUG: free active (active state 0) object type: timer_list hint: sta_info_cleanup+0x0/0x180 [mac80211]
Modules linked in: aes_i586 aes_generic fuse bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq mperf ext2 dm_mod uinput thinkpad_acpi hwmon sg arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 cfg80211 i2c_i801 iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom yenta_socket ahci libahci pata_acpi ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: microcode]
Pid: 5663, comm: pm-hibernate Not tainted 3.1.0-rc1-wl+ #19
Call Trace:
 [<c0454cfd>] warn_slowpath_common+0x6d/0xa0
 [<c05e05e5>] ? debug_print_object+0x85/0xa0
 [<c05e05e5>] ? debug_print_object+0x85/0xa0
 [<c0454dae>] warn_slowpath_fmt+0x2e/0x30
 [<c05e05e5>] debug_print_object+0x85/0xa0
 [<f8a808e0>] ? sta_info_alloc+0x1a0/0x1a0 [mac80211]
 [<c05e0bd2>] debug_check_no_obj_freed+0xe2/0x180
 [<c051175b>] kfree+0x8b/0x150
 [<f8a126ae>] cfg80211_dev_free+0x7e/0x90 [cfg80211]
 [<f8a13afd>] wiphy_dev_release+0xd/0x10 [cfg80211]
 [<c068d959>] device_release+0x19/0x80
 [<c05d06ba>] kobject_release+0x7a/0x1c0
 [<c07646a8>] ? rtnl_unlock+0x8/0x10
 [<f8a13adb>] ? wiphy_resume+0x6b/0x80 [cfg80211]
 [<c05d0640>] ? kobject_del+0x30/0x30
 [<c05d1a6d>] kref_put+0x2d/0x60
 [<c05d056d>] kobject_put+0x1d/0x50
 [<c08015f4>] ? mutex_lock+0x14/0x40
 [<c068d60f>] put_device+0xf/0x20
 [<c069716a>] dpm_resume+0xca/0x160
 [<c04912bd>] hibernation_snapshot+0xcd/0x260
 [<c04903df>] ? freeze_processes+0x3f/0x90
 [<c049151b>] hibernate+0xcb/0x1e0
 [<c048fdc0>] ? pm_async_store+0x40/0x40
 [<c048fe60>] state_store+0xa0/0xb0
 [<c048fdc0>] ? pm_async_store+0x40/0x40
 [<c05d0200>] kobj_attr_store+0x20/0x30
 [<c0575ea4>] sysfs_write_file+0x94/0xf0
 [<c051e26a>] vfs_write+0x9a/0x160
 [<c0575e10>] ? sysfs_open_file+0x200/0x200
 [<c051e3fd>] sys_write+0x3d/0x70
 [<c080959f>] sysenter_do_call+0x12/0x28

Cc: stable@kernel.org
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>

ecb44335

19 8月, 2011 1 次提交

squeeze max-pause area and drop pass-good area · bb082295

由 Wu Fengguang 提交于 8月 16, 2011

Revert the pass-good area introduced in ffd1f609 ("writeback:
introduce max-pause and pass-good dirty limits") and make the max-pause
area smaller and safe.

This fixes ~30% performance regression in the ext3 data=writeback
fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.

Using deadline scheduler also has a regression, but not that big as CFQ,
so this suggests we have some write starvation.

The test logs show that

- the disks are sometimes under utilized

- global dirty pages sometimes rush high to the pass-good area for
  several hundred seconds, while in the mean time some bdi dirty pages
  drop to very low value (bdi_dirty << bdi_thresh).  Then suddenly the
  global dirty pages dropped under global dirty threshold and bdi_dirty
  rush very high (for example, 2 times higher than bdi_thresh). During
  which time balance_dirty_pages() is not called at all.

So the problems are

1) The random writes progress so slow that they break the assumption of
   the max-pause logic that "8 pages per 200ms is typically more than
   enough to curb heavy dirtiers".

2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
   for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
   and then (bdi_thresh >> bdi_dirty) for others.

3) The higher max-pause/pass-good thresholds somehow leads to the bad
   swing of dirty pages.

The fix is to allow the task to slightly dirty over task_bdi_thresh, but
no way to exceed bdi_dirty and/or global dirty_thresh.

Tests show that it fixed the JBOD regression completely (both behavior
and performance), while still being able to cut down large pause times
in balance_dirty_pages() for single-disk cases.
Reported-by: NLi Shaohua <shaohua.li@intel.com>
Tested-by: NLi Shaohua <shaohua.li@intel.com>
Acked-by: NJan Kara <jack@suse.cz>
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>

bb082295

18 8月, 2011 1 次提交

mm: fix __page_to_pfn for a const struct page argument · aa462abe

由 Ian Campbell 提交于 8月 17, 2011

This allows the cast in lowmem_page_address (introduced as a warning
fixup to 33dd4e0e "mm: make some struct page's const") to be
removed.

Propagate const'ness to page_to_section() as well since it is required
by __page_to_pfn.
Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa462abe