提交 · cf9efce0ce3136fa076f53e53154e98455229514 · openeuler / raspberrypi-kernel

02 9月, 2010 6 次提交

powerpc: Account time using timebase rather than PURR · cf9efce0

由 Paul Mackerras 提交于 8月 26, 2010

Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the
PURR register for measuring the user and system time used by
processes, as well as other related times such as hardirq and
softirq times.  This turns out to be quite confusing for users
because it means that a program will often be measured as taking
less time when run on a multi-threaded processor (SMT2 or SMT4 mode)
than it does when run on a single-threaded processor (ST mode), even
though the program takes longer to finish.  The discrepancy is
accounted for as stolen time, which is also confusing, particularly
when there are no other partitions running.

This changes the accounting to use the timebase instead, meaning that
the reported user and system times are the actual number of real-time
seconds that the program was executing on the processor thread,
regardless of which SMT mode the processor is in.  Thus a program will
generally show greater user and system times when run on a
multi-threaded processor than on a single-threaded processor.

On pSeries systems on POWER5 or later processors, we measure the
stolen time (time when this partition wasn't running) using the
hypervisor dispatch trace log.  We check for new entries in the
log on every entry from user mode and on every transition from
kernel process context to soft or hard IRQ context (i.e. when
account_system_vtime() gets called).  So that we can correctly
distinguish time stolen from user time and time stolen from system
time, without having to check the log on every exit to user mode,
we store separate timestamps for exit to user mode and entry from
user mode.

On systems that have a SPURR (POWER6 and POWER7), we read the SPURR
in account_system_vtime() (as before), and then apportion the SPURR
ticks since the last time we read it between scaled user time and
scaled system time according to the relative proportions of user
time and system time over the same interval.  This avoids having to
read the SPURR on every kernel entry and exit.  On systems that have
PURR but not SPURR (i.e., POWER5), we do the same using the PURR
rather than the SPURR.

This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl
for now since it conflicts with the use of the dispatch trace log
by the time accounting code.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

cf9efce0

powerpc: Dynamically allocate most lppaca structs · 93c22703

由 Paul Mackerras 提交于 8月 12, 2010

This arranges for the lppaca structs for most cpus to be dynamically
allocated in the same manner as the paca structs.  If we don't include
support for legacy iSeries, only the first lppaca is statically
allocated; the rest are dynamically allocated.  If we include legacy
iSeries support, then we statically allocate the first 64 lppaca
structs, since the iSeries hypervisor requires that the lppaca
structs be present in the data section of the kernel image, but
legacy iSeries supports at most 64 cpus.

With CONFIG_NR_CPUS, the kernel image size for a typical pSeries config
went from:

   text    data     bss     dec     hex filename
9524478 4734564 8469944 22728986        15ad11a ../test-1024/vmlinux

to:

   text    data     bss     dec     hex filename
9524482 3751508 8469944 21745934        14bd10e ../test-1024/vmlinux

a reduction of 983052 bytes overall.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

93c22703

powerpc: Abstract indexing of lppaca structs · 8154c5d2

由 Paul Mackerras 提交于 8月 12, 2010

Currently we have the lppaca structs as a simple array of NR_CPUS
entries, taking up space in the data section of the kernel image.
In future we would like to allocate them dynamically, so this
abstracts out the accesses to the array, making it easier to
change how we locate the lppaca for a given cpu in future.
Specifically, lppaca[cpu] changes to lppaca_of(cpu).
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8154c5d2

powerpc: Feature nop out reservation clear when stcx checks address · f89451fb

由 Anton Blanchard 提交于 8月 11, 2010

The POWER architecture does not require stcx to check that it is operating
on the same address as the larx. This means it is possible for an
an exception handler to execute a larx, get a reservation, decide
not to do the stcx and then return back with an active reservation. If the
interrupted code was in the middle of a larx/stcx sequence the stcx could
incorrectly succeed.

All recent POWER CPUs check the address before letting the stcx succeed
so we can create a CPU feature and nop it out. As Ben suggested, we can
only do this in our syscall path because there is a remote possibility
some kernel code gets interrupted by an exception that ends up operating
on the same cacheline.

Thanks to Paul Mackerras and Derek Williams for the idea.

To test this I used a very simple null syscall (actually getppid) testcase
at http://ozlabs.org/~anton/junkcode/null_syscall.c

I tested against 2.6.35-git10 with the following changes against the
pseries_defconfig:

CONFIG_VIRT_CPU_ACCOUNTING=n
CONFIG_AUDIT=n
CONFIG_PPC_4K_PAGES=n
CONFIG_PPC_64K_PAGES=y
CONFIG_FORCE_MAX_ZONEORDER=9
CONFIG_PPC_SUBPAGE_PROT=n
CONFIG_FUNCTION_TRACER=n
CONFIG_FUNCTION_GRAPH_TRACER=n
CONFIG_IRQSOFF_TRACER=n
CONFIG_STACK_TRACER=n

to remove the overhead of virtual CPU accounting, syscall auditing and
the ftrace mcount tracers. 64kB pages were enabled to minimise TLB misses.

POWER6: +8.2%
POWER7: +7.0%

Another suggestion was to use a larx to something in the L1 instead of a stcx.
This was almost as fast as removing the larx on POWER6, but only 3.5% faster
on POWER7. We can use this to speed up the reservation clear in our
exception exit code.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f89451fb

powerpc: Add 64bit csum_and_copy_to_user · 8c773914

由 Anton Blanchard 提交于 8月 02, 2010

This adds the equivalent of csum_and_copy_from_user for the receive side so we
can copy and checksum in one pass. It is modelled on the generic checksum
routine.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8c773914

powerpc: Optimise 64bit csum_partial_copy_generic and add csum_and_copy_from_user · fdd374b6

由 Anton Blanchard 提交于 8月 02, 2010

We use the same core loop as the new csum_partial, adding in the
stores and exception handling code. To keep things simple we do all the
exception fixup in csum_and_copy_from_user. This wrapper function is
modelled on the generic checksum code and is careful to always calculate
a complete checksum even if we only copied part of the data to userspace.

To test this I forced checksumming on over loopback and ran socklib (a
simple TCP benchmark). On a POWER6 575 throughput improved by 19% with
this patch. If I forced both the sender and receiver onto the same cpu
(with the hope of shifting the benchmark from being cache bandwidth limited
to cpu limited), adding this patch improved performance by 55%
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fdd374b6

24 8月, 2010 4 次提交

powerpc: Wire up fanotify_init, fanotify_mark, prlimit64 syscalls · bcc30d37

由 Andreas Schwab 提交于 8月 19, 2010

Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

bcc30d37

powerpc: Inline ppc64_runlatch_off · 4138d653

由 Anton Blanchard 提交于 8月 06, 2010

I'm sick of seeing ppc64_runlatch_off in our profiles, so inline it
into the callers. To avoid a mess of circular includes I didn't add
it as an inline function.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Acked-by: NOlof Johansson <olof@lixom.net>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4138d653

powerpc/mm: Fix vsid_scrample typo · 34692708

由 Anton Blanchard 提交于 8月 02, 2010

The code is wrapped in an #if 0, but it's wrong so we may as well fix it.
Signed-off-by: NAnton Blanchard <anton@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

34692708

powerpc: Make rwsem use "long" type · 529b7307

由 Benjamin Herrenschmidt 提交于 8月 24, 2010

This makes the 64-bit kernel use 64-bit signed integers for the counter
(effectively supporting 32-bit of active count in the semaphore), thus
avoiding things like overflow of the mmap_sem if you use a really crazy
number of threads

Note: Ideally the type in the structure should be atomic_long_t rather
than "long". However, there's some nasty issues with that. It needs to
be initialized statically -and- lib/rwsem.c does things like

        sem->count = RWSEM_UNLOCKED_VALUE;

Now, if you mix in the fact that atomic_* types are actually structures
with one member and note typedefs of a scalar, it makes its really nasty.

So I stuck to what we did before using a long and casts for now.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

529b7307

15 8月, 2010 1 次提交

archs: replace unifdef-y with header-y · bf56fba6

由 Sam Ravnborg 提交于 8月 14, 2010

unifdef-y and header-y have same semantic, so drop unifdef-y
Signed-off-by: NSam Ravnborg <sam@ravnborg.org>

bf56fba6

11 8月, 2010 4 次提交

dma-mapping: remove dma_is_consistent API · 3b9c6c11

由 FUJITA Tomonori 提交于 8月 10, 2010

Architectures implement dma_is_consistent() in different ways (some
misinterpret the definition of API in DMA-API.txt).  So it hasn't been so
useful for drivers.  We have only one user of the API in tree.  Unlikely
out-of-tree drivers use the API.

Even if we fix dma_is_consistent() in some architectures, it doesn't look
useful at all.  It was invented long ago for some old systems that can't
allocate coherent memory at all.  It's better to export only APIs that are
definitely necessary for drivers.

Let's remove this API.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3b9c6c11

dma-mapping: unify dma_get_cache_alignment implementations · 4565f017

由 FUJITA Tomonori 提交于 8月 10, 2010

dma_get_cache_alignment returns the minimum DMA alignment.  Architectures
defines it as ARCH_DMA_MINALIGN (formally ARCH_KMALLOC_MINALIGN).  So we
can unify dma_get_cache_alignment implementations.

Note that some architectures implement dma_get_cache_alignment wrongly.
dma_get_cache_alignment() should return the minimum DMA alignment.  So
fully-coherent architectures should return 1.  This patch also fixes this
issue.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4565f017

dma-mapping: rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN · a6eb9fe1

由 FUJITA Tomonori 提交于 8月 10, 2010

Now each architecture has the own dma_get_cache_alignment implementation.

dma_get_cache_alignment returns the minimum DMA alignment.  Architectures
define it as ARCH_KMALLOC_MINALIGN (it's used to make sure that malloc'ed
buffer is DMA-safe; the buffer doesn't share a cache with the others).  So
we can unify dma_get_cache_alignment implementations.

This patch:

dma_get_cache_alignment() needs to know if an architecture defines
ARCH_KMALLOC_MINALIGN or not (needs to know if architecture has DMA
alignment restriction).  However, slab.h define ARCH_KMALLOC_MINALIGN if
architectures doesn't define it.

Let's rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN.
ARCH_KMALLOC_MINALIGN is used only in the internals of slab/slob/slub
(except for crypto).
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a6eb9fe1

tty: Add EXTPROC support for LINEMODE · 26df6d13

由 hyc@symas.com 提交于 6月 22, 2010

This patch is against the 2.6.34 source.

Paraphrased from the 1989 BSD patch by David Borman @ cray.com:

     These are the changes needed for the kernel to support
     LINEMODE in the server.

     There is a new bit in the termios local flag word, EXTPROC.
     When this bit is set, several aspects of the terminal driver
     are disabled.  Input line editing, character echo, and mapping
     of signals are all disabled.  This allows the telnetd to turn
     off these functions when in linemode, but still keep track of
     what state the user wants the terminal to be in.

     New ioctl:
         TIOCSIG         Generate a signal to processes in the
                         current process group of the pty.

     There is a new mode for packet driver, the TIOCPKT_IOCTL bit.
     When packet mode is turned on in the pty, and the EXTPROC bit
     is set, then whenever the state of the pty is changed, the
     next read on the master side of the pty will have the TIOCPKT_IOCTL
     bit set.  This allows the process on the server side of the pty
     to know when the state of the terminal has changed; it can then
     issue the appropriate ioctl to retrieve the new state.

Since the original BSD patches accompanied the source code for telnet
I've left that reference here, but obviously the feature is useful for
any remote terminal protocol, including ssh.

The corresponding feature has existed in the BSD tty driver since 1989.
For historical reference, a good copy of the relevant files can be found
here:

http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741Signed-off-by: NHoward Chu <hyc@symas.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

26df6d13

10 8月, 2010 1 次提交

kmap_atomic: make kunmap_atomic() harder to misuse · 597781f3

由 Cesar Eduardo Barros 提交于 8月 09, 2010

kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].

kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself.  This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).

Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong").  This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.

The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).

The previous version of this patch was compile tested on x86-64.

[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
    break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
    share a common ancestor with both; there seems to be quite some
    degree of copy-and-paste coding here. The include/asm/highmem.h file
    for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
    the first parameter of kunmap_atomic() instead of void *?
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

597781f3

08 8月, 2010 1 次提交

remove needless ISA_DMA_THRESHOLD · 7e005f79

由 FUJITA Tomonori 提交于 5月 31, 2010

Architectures don't need to define ISA_DMA_THRESHOLD anymore.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: NJames Bottomley <James.Bottomley@suse.de>
Acked-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7e005f79

07 8月, 2010 1 次提交

powerpc/5200: add mpc5200_psc_ac97_gpio_reset · cfa6a88c

由 Eric Millbrandt 提交于 8月 06, 2010

Work around a silicon bug in the ac97 reset functionality of the
mpc5200(b).  The implementation of the ac97 "cold" reset is flawed.
If the sync and output lines are high when reset is asserted the
attached ac97 device may go into test mode.  Avoid this by
reconfiguring the psc to gpio mode and generating the reset manually.

From MPC5200B User's Manual:
"Some AC97 devices goes to a test mode, if the Sync line is high
during the Res line is low (reset phase). To avoid this behavior the
Sync line must be also forced to zero during the reset phase. To do
that, the pin muxing should switch to GPIO mode and the GPIO control
register should be used to control the output lines."
Signed-off-by: NEric Millbrandt <emillbrandt@dekaresearch.com>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

cfa6a88c

02 8月, 2010 1 次提交

powerpc/5121: shared DIU framebuffer support · 4b5006ec

由 Anatolij Gustschin 提交于 7月 23, 2010

MPC5121 DIU configuration/setup as initialized by the boot
loader currently will get lost while booting Linux. As a
result displaying the boot splash is not possible through
the boot process.

To prevent this we reserve configured DIU frame buffer
address range while booting and preserve AOI descriptor
and gamma table so that DIU continues displaying through
the whole boot process. On first open from user space
DIU frame buffer driver releases the reserved frame
buffer area and continues to operate as usual.
Signed-off-by: NJohn Rigby <jcrigby@gmail.com>
Signed-off-by: NAnatolij Gustschin <agust@denx.de>
Acked-by: NTimur Tabi <timur@freescale.com>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

4b5006ec

01 8月, 2010 5 次提交

KVM: Remove unnecessary divide operations · 82855413

由 Joerg Roedel 提交于 7月 01, 2010

This patch converts unnecessary divide and modulo operations
in the KVM large page related code into logical operations.
This allows to convert gfn_t to u64 while not breaking 32
bit builds.
Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

82855413

KVM: PPC: Make use of hash based Shadow MMU · fef093be

由 Alexander Graf 提交于 6月 30, 2010

We just introduced generic functions to handle shadow pages on PPC.
This patch makes the respective backends make use of them, getting
rid of a lot of duplicate code along the way.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

fef093be

KVM: PPC: Remove obsolete kvmppc_mmu_find_pte · a576f7a2

由 Alexander Graf 提交于 6月 21, 2010

Initially we had to search for pte entries to invalidate them. Since
the logic has improved since then, we can just get rid of the search
function.
Signed-off-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NAvi Kivity <avi@redhat.com>

a576f7a2

of/address: Clean up function declarations · 22ae782f

由 Grant Likely 提交于 7月 29, 2010

This patch moves the declaration of of_get_address(), of_get_pci_address(),
and of_pci_address_to_resource() out of arch code and into the common
linux/of_address header file.

This patch also fixes some of the asm/prom.h ordering issues. It still
includes some header files that it ideally shouldn't be, but at least the
ordering is consistent now so that of_* overrides work.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

22ae782f

KVM: PPC: elide struct thread_struct instances from stack · 49f6be8e

由 Andreas Schwab 提交于 5月 31, 2010

Instead of instantiating a whole thread_struct on the stack use only the
required parts of it.
Signed-off-by: NAndreas Schwab <schwab@linux-m68k.org>
Tested-by: NAlexander Graf <agraf@suse.de>
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

49f6be8e

31 7月, 2010 1 次提交

powerpc/kexec: Switch to a static PACA on the way out · fc53b420

由 Matt Evans 提交于 7月 07, 2010

With dynamic PACAs, the kexecing CPU's PACA won't lie within the kernel
static data and there is a chance that something may stomp it when preparing
to kexec.  This patch switches this final CPU to a static PACA just before
we pull the switch.
Signed-off-by: NMatt Evans <matt@ozlabs.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fc53b420

30 7月, 2010 1 次提交

of: Provide default of_node_to_nid() implementation. · 559e2b7e

由 Grant Likely 提交于 7月 23, 2010

of_node_to_nid() is only relevant in a few architectures.  Don't force
everyone to implement it anyway.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

559e2b7e

29 7月, 2010 2 次提交

powerpc: Clean up obsolete code relating to decrementer and timebase · d75d68cf

由 Paul Mackerras 提交于 6月 20, 2010

Since the decrementer and timekeeping code was moved over to using
the generic clockevents and timekeeping infrastructure, several
variables and functions have been obsolete and effectively unused.
This deletes them.

In particular, wakeup_decrementer() is no longer needed since the
generic code reprograms the decrementer as part of the process of
resuming the timekeeping code, which happens during sysdev resume.
Thus the wakeup_decrementer calls in the suspend_enter methods for
52xx platforms have been removed.  The call in the powermac cpu
frequency change code has been replaced by set_dec(1), which will
cause a timer interrupt as soon as interrupts are enabled, and the
generic code will then reprogram the decrementer with the correct
value.

This also simplifies the generic_suspend_en/disable_irqs functions
and makes them static since they are not referenced outside time.c.
The preempt_enable/disable calls are removed because the generic
code has disabled all but the boot cpu at the point where these
functions are called, so we can't be moved to another cpu.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d75d68cf

powerpc: Rework VDSO gettimeofday to prevent time going backwards · 0e469db8

由 Paul Mackerras 提交于 6月 20, 2010

Currently it is possible for userspace to see the result of
gettimeofday() going backwards by 1 microsecond, assuming that
userspace is using the gettimeofday() in the VDSO.  The VDSO
gettimeofday() algorithm computes the time in "xsecs", which are
units of 2^-20 seconds, or approximately 0.954 microseconds,
using the algorithm

	now = (timebase - tb_orig_stamp) * tb_to_xs + stamp_xsec

and then converts the time in xsecs to seconds and microseconds.

The kernel updates the tb_orig_stamp and stamp_xsec values every
tick in update_vsyscall().  If the length of the tick is not an
integer number of xsecs, then some precision is lost in converting
the current time to xsecs.  For example, with CONFIG_HZ=1000, the
tick is 1ms long, which is 1048.576 xsecs.  That means that
stamp_xsec will advance by either 1048 or 1049 on each tick.
With the right conditions, it is possible for userspace to get
(timebase - tb_orig_stamp) * tb_to_xs being 1049 if the kernel is
slightly late in updating the vdso_datapage, and then for stamp_xsec
to advance by 1048 when the kernel does update it, and for userspace
to then see (timebase - tb_orig_stamp) * tb_to_xs being zero due to
integer truncation.  The result is that time appears to go backwards
by 1 microsecond.

To fix this we change the VDSO gettimeofday to use a new field in the
VDSO datapage which stores the nanoseconds part of the time as a
fractional number of seconds in a 0.32 binary fraction format.
(Or put another way, as a 32-bit number in units of 0.23283 ns.)
This is convenient because we can use the mulhwu instruction to
convert it to either microseconds or nanoseconds.

Since it turns out that computing the time of day using this new field
is simpler than either using stamp_xsec (as gettimeofday does) or
stamp_xtime.tv_nsec (as clock_gettime does), this converts both
gettimeofday and clock_gettime to use the new field.  The existing
__do_get_tspec function is converted to use the new field and take
a parameter in r7 that indicates the desired resolution, 1,000,000
for microseconds or 1,000,000,000 for nanoseconds.  The __do_get_xsec
function is then unused and is deleted.

The new algorithm is

	now = ((timebase - tb_orig_stamp) << 12) * tb_to_xs
		+ (stamp_xtime_seconds << 32) + stamp_sec_fraction

with 'now' in units of 2^-32 seconds.  That is then converted to
seconds and either microseconds or nanoseconds with

	seconds = now >> 32
	partseconds = ((now & 0xffffffff) * resolution) >> 32

The 32-bit VDSO code also makes a further simplification: it ignores
the bottom 32 bits of the tb_to_xs value, which is a 0.64 format binary
fraction.  Doing so gets rid of 4 multiply instructions.  Assuming
a timebase frequency of 1GHz or less and an update interval of no
more than 10ms, the upper 32 bits of tb_to_xs will be at least
4503599, so the error from ignoring the low 32 bits will be at most
2.2ns, which is more than an order of magnitude less than the time
taken to do gettimeofday or clock_gettime on our fastest processors,
so there is no possibility of seeing inconsistent values due to this.

This also moves update_gtod() down next to its only caller, and makes
update_vsyscall use the time passed in via the wall_time argument rather
than accessing xtime directly.  At present, wall_time always points to
xtime, but that could change in future.
Signed-off-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0e469db8

24 7月, 2010 3 次提交

of/device: Replace of_device with platform_device in includes and core code · 94a0cb1f

由 Grant Likely 提交于 7月 22, 2010

of_device is currently just an #define alias to platform_device until it
gets removed entirely.  This patch removes references to it from the
include directories and the core drivers/of code.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NDavid S. Miller <davem@davemloft.net>

94a0cb1f

of: remove asm/of_device.h · 29596042

由 Grant Likely 提交于 6月 29, 2010

It is mostly unused now.  Sparc has a few defines left in it, but they
can be moved to other headers.  Removing this header means that new
architectures adding CONFIG_OF support don't need to also add this
header file.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NDavid S. Miller <davem@davemloft.net>

29596042

of: remove asm/of_platform.h · 129ac799

由 Grant Likely 提交于 6月 29, 2010

Only thing left in it is of_instantiate_rtc() which can be moved to
asm/prom.h on PowerPC and is unused in microblaze.
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NDavid S. Miller <davem@davemloft.net>

129ac799

23 7月, 2010 2 次提交

powerpc/mm: Add some debug output when hash insertion fails · 4b8692c0

由 Benjamin Herrenschmidt 提交于 7月 23, 2010

This adds some debug output to our MMU hash code to print out some
useful debug data if the hypervisor refuses the insertion (which
should normally never happen).
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
---

4b8692c0

powerpc/kexec: Fix boundary case for book-e kexec memory limits · 23dcab8f

由 Kumar Gala 提交于 7月 22, 2010

The KEXEC_*_MEMORY_LIMITs are inclusive addresses.  We define them as
2Gs as that is what we allow mapping via TLBs.  However, this should be
2G - 1 to be inclusive, otherwise if we have >2G of memory in a system
we fail to boot properly via kexec.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

23dcab8f

19 7月, 2010 1 次提交

of: Remove unused of_find_device_by_phandle() · c5f5849b

由 Grant Likely 提交于 6月 29, 2010

Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

c5f5849b

14 7月, 2010 3 次提交

lmb: rename to memblock · 95f72d1e

由 Yinghai Lu 提交于 7月 12, 2010

via following scripts

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong change like lmbench and dlmb etc.

also move memblock.c from lib/ to mm/
Suggested-by: NIngo Molnar <mingo@elte.hu>
Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

95f72d1e

powerpc/book3e: Adjust the page sizes list based on MMU config · f2b26c92

由 Benjamin Herrenschmidt 提交于 7月 09, 2010

Use the MMU config registers to scan for available direct and
indirect page sizes and print out the result. Will be needed
for future hugetlbfs implementation.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

f2b26c92

powerpc/book3e: Add generic 64-bit idle powersave support · 34d97e07

由 Benjamin Herrenschmidt 提交于 7月 14, 2010

We use a similar technique to ppc32: We set a thread local flag
to indicate that we are about to enter or have entered the stop
state, and have fixup code in the async interrupt entry code that
reacts to this flag to make us return to a different location
(sets NIP to LINK in our case).
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
--
v2. Fix lockdep bug
    Re-mask interrupts when coming back from idle

34d97e07

12 7月, 2010 2 次提交

powerpc/cpm1: Mark micropatch code/data static and __init · af71bcfe

由 Anton Vorontsov 提交于 7月 08, 2010

This saves runtime memory and fixes lots of sparse warnings like this:

    CHECK   arch/powerpc/sysdev/micropatch.c
  arch/powerpc/sysdev/micropatch.c:27:6: warning: symbol 'patch_2000'
  was not declared. Should it be static?
  arch/powerpc/sysdev/micropatch.c:146:6: warning: symbol 'patch_2f00'
  was not declared. Should it be static?
  ...
Signed-off-by: NAnton Vorontsov <avorontsov@mvista.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

af71bcfe

powerpc/cpm: Reintroduce global spi_pram struct (fixes build issue) · 56825c88

由 Anton Vorontsov 提交于 7月 08, 2010

spi_t was removed in commit 644b2a68
("powerpc/cpm: Remove SPI defines and spi structs"), the commit assumed
that spi_t isn't used anywhere outside of the spi_mpc8xxx driver. But
it appears that the struct is needed for micropatch code. So, let's
reintroduce the struct.

Fixes the following build issue:

    CC      arch/powerpc/sysdev/micropatch.o
  micropatch.c: In function 'cpm_load_patch':
  micropatch.c:629: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
  micropatch.c:629: error: 'spp' undeclared (first use in this function)
  micropatch.c:629: error: (Each undeclared identifier is reported only once
  micropatch.c:629: error: for each function it appears in.)
Reported-by: NLEROY Christophe <christophe.leroy@c-s.fr>
Reported-by: NTony Breeds <tony@bakeyournoodle.com>
Cc: <stable@kernel.org> [ .33, .34 ]
Signed-off-by: NAnton Vorontsov <avorontsov@mvista.com>
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

56825c88

09 7月, 2010 1 次提交

powerpc/book3e: Resend doorbell exceptions to ourself · 850f22d5

由 Michael Ellerman 提交于 7月 09, 2010

If we are soft disabled and receive a doorbell exception we don't process
it immediately. This means we need to check on the way out of irq restore
if there are any doorbell exceptions to process.

The problem is at that point we don't know what our regs are, and that
in turn makes xmon unhappy. To workaround the problem, instead of checking
for and processing doorbells, we check for any doorbells and if there were
any we send ourselves another.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

850f22d5