1. 24 January 2013, 1 commit
  2. 21 December 2012, 1 commit
  3. 28 June 2012, 1 commit
    •
      x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range · e7b52ffd
      Committed by Alex Shi
      x86 has no instruction-level support for flush_tlb_range. Currently
      flush_tlb_range is implemented simply by flushing the whole TLB, which is
      not the best solution for all scenarios. In fact, if we use 'invlpg' to
      flush just a few lines from the TLB, we gain performance from later
      accesses to the TLB lines that remain.
      
      But the 'invlpg' instruction itself costs a lot of time. Its execution
      time is comparable to a cr3 rewrite, and even a bit longer on SNB CPUs.
      
      So, on a CPU with 512 4KB TLB entries, the balance point is at:
      	(512 - X) * 100ns(assumed TLB refill cost) =
      		X(TLB flush entries) * 100ns(assumed invlpg cost)
      
      Here, X is 256, that is 1/2 of 512 entries.
      
      But with the mysterious CPU prefetcher and page-miss-handler unit, the
      assumed TLB refill cost is far lower than 100ns for sequential accesses.
      And two HT siblings in one core make memory accesses even faster when they
      touch the same memory. So, in the patch, I only make the change when the
      number of target entries is less than 1/16 of the whole set of active TLB
      entries (see the sketch below). Actually, I have no data supporting the
      ratio '1/16', so any suggestions are welcome.
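      
      A minimal sketch of that heuristic, assuming a 512-entry TLB and the 1/16
      ratio above (the function name and the hard-coded entry count are
      illustrative, not the exact patch):
      
      	#include <linux/mm.h>
      	#include <asm/tlbflush.h>
      
      	static void flush_tlb_range_sketch(unsigned long start, unsigned long end)
      	{
      		unsigned long act_entries = 512;	/* assumed active 4KB TLB entries */
      		unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
      		unsigned long addr;
      
      		if (nr_pages > (act_entries >> 4)) {
      			local_flush_tlb();		/* full flush via cr3 rewrite */
      		} else {
      			for (addr = start; addr < end; addr += PAGE_SIZE)
      				__flush_tlb_single(addr);	/* one invlpg per page */
      		}
      	}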
      
      As for hugetlb, presumably due to the smaller page tables and the smaller
      number of active TLB entries, I did not see a benefit in my benchmark, so
      it is not optimized for now.
      
      My micro-benchmark shows that in the ideal scenario the read performance
      improves by 70 percent, and in the worst scenario the read/write
      performance is similar to the unpatched 3.4-rc4 kernel.
      
      Here are the read numbers on my 2P * 4 cores * HT NHM EP machine, with THP
      set to 'always':
      
      Multi-threaded testing; the '-t' parameter is the thread count:
      	       	        with patch   unpatched 3.4-rc4
      ./mprotect -t 1           14ns		24ns
      ./mprotect -t 2           13ns		22ns
      ./mprotect -t 4           12ns		19ns
      ./mprotect -t 8           14ns		16ns
      ./mprotect -t 16          28ns		26ns
      ./mprotect -t 32          54ns		51ns
      ./mprotect -t 128         200ns		199ns
      
      Single process with sequential flushing and memory access:
      
      		       	with patch   unpatched 3.4-rc4
      ./mprotect		    7ns			11ns
      ./mprotect -p 4096  -l 8 -n 10240
      			    21ns		21ns
      
      [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
        has additional performance numbers. ]
      Signed-off-by: Alex Shi <alex.shi@intel.com>
      Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      e7b52ffd
  4. 25 June 2012, 3 commits
    •
      x86/uv: Work around UV2 BAU hangs · 8b6e511e
      Committed by Cliff Wickman
      On SGI's UV2 the BAU (Broadcast Assist Unit) driver can hang
      under a heavy load. To cure this:
      
      - Disable the UV2 extended status mode (see UV2_EXT_SHFT), as
        this mode changes BAU behavior in more ways than just delivering
        an extra bit of status.  Revert status to just two meaningful bits,
        like UV1.
      
      - Use no IPI-style resets on UV2.  Just give up on the request, for
        whatever reason it failed, and let it be accomplished with
        the legacy IPI method.
      
      - Use no alternate sending descriptor (the former UV2 workaround
        bcp->using_desc and handle_uv2_busy() stuff).  Just disable the
        use of the BAU for a period of time in favor of the legacy IPI
        method when the h/w bug leaves a descriptor busy.
      
        -- New tunable: giveup_limit determines the threshold at which a hub is
           so plugged that it should do all requests with the legacy IPI method
           for a period of time (see the sketch below).
        -- Generalize disable_for_congestion() (renamed disable_for_period())
           for use whenever a hub should avoid using the BAU for a period of time.
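      
      A toy model of that giveup_limit policy (the structure, field names and
      helper below are illustrative, not the actual tlb_uv.c code):
      
      	#include <stdbool.h>
      
      	struct hub_state {
      		int plugged_count;		/* consecutive "descriptor busy" events */
      		int giveup_limit;		/* tunable threshold */
      		bool bau_disabled;		/* when true, fall back to legacy IPIs */
      		unsigned long reenable_time;	/* when to try the BAU again */
      	};
      
      	static bool should_use_ipi(struct hub_state *hub, unsigned long now,
      				   unsigned long period)
      	{
      		if (hub->bau_disabled && now >= hub->reenable_time)
      			hub->bau_disabled = false;	/* disable period expired */
      
      		if (hub->plugged_count >= hub->giveup_limit) {
      			hub->bau_disabled = true;	/* disable_for_period() */
      			hub->reenable_time = now + period;
      			hub->plugged_count = 0;
      		}
      		return hub->bau_disabled;
      	}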
      
      Also:
      
       - Fix find_another_by_swack(), which is part of the UV2 bug workaround
      
       - Correct and clarify the statistics (new stats s_overipilimit, s_giveuplimit,
         s_enters, s_ipifordisabled, s_plugged, s_congested)
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Link: http://lkml.kernel.org/r/20120622131459.GC31884@sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8b6e511e
    •
      x86/uv: Implement UV BAU runtime enable and disable control via /proc/sgi_uv/ · 26ef8577
      Committed by Cliff Wickman
      This patch enables the BAU to be turned on or off dynamically.
      
        echo "on"  > /proc/sgi_uv/ptc_statistics
        echo "off" > /proc/sgi_uv/ptc_statistics
      
      The system may be booted with or without the nobau option.
      
      Whether the system currently has the BAU off can be seen in
      the /proc file -- normally with the baustats script.
      Each cpu will have a 1 in the bauoff field if the BAU was turned
      off, so baustats will give a count of cpus that have it off.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Link: http://lkml.kernel.org/r/20120622131330.GB31884@sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      26ef8577
    •
      x86/uv: Fix the UV BAU destination timeout period · 11cab711
      Committed by Cliff Wickman
      Correct the calculation of a destination timeout period, which
      is used to distinguish between a destination timeout and the
      situation where all the target software ack resources are full
      and a request is returned immediately.
      
      The problem is that integer arithmetic was overflowing, yielding
      a very large result.
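      
      A small stand-alone illustration of that kind of overflow (the values are
      made up, not the actual MMR fields): the product overflows as a 32-bit
      int, and the negative intermediate then sign-extends into an enormous
      64-bit period; promoting the first operand to 64 bits fixes the math.
      
      	#include <stdio.h>
      
      	int main(void)
      	{
      		int base  = 800;	/* assumed base period, ns */
      		int mult1 = 10;
      		int mult2 = 1 << 20;
      
      		unsigned long long bad  = base * mult1 * mult2;		/* 32-bit overflow */
      		unsigned long long good = (unsigned long long)base * mult1 * mult2;
      
      		printf("overflowing 32-bit math: %llu ns\n", bad);
      		printf("correct 64-bit math:     %llu ns\n", good);
      		return 0;
      	}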
      
      Without this fix, destination timeouts are identified as resource
      'plugged' events and an IPI method of resource releasing is
      unnecessarily employed.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Link: http://lkml.kernel.org/r/20120622131212.GA31884@sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      11cab711
  5. 08 June 2012, 1 commit
  6. 26 January 2012, 1 commit
  7. 17 January 2012, 6 commits
  8. 21 September 2011, 1 commit
    •
      x86: uv2: Workaround for UV2 Hub bug (system global address format) · 6a469e46
      Committed by Jack Steiner
      This is a workaround for a UV2 hub bug that affects the format of system
      global addresses.
      
      The GRU API for UV2 was inadvertently broken by a hardware change.  The
      format of the physical address used for TLB dropins and for addresses used
      with instructions running in unmapped mode has changed.  This change was
      not documented and became apparent only when diags failed running on
      system simulators.
      
      For UV1, TLB and GRU instruction physical addresses are identical to
      socket physical addresses (although high NASID bits must be OR'ed into the
      address).
      
      For UV2, socket physical addresses need to be converted.  The NODE portion
      of the physical address needs to be shifted so that the low bit is in bit
      39 or bit 40, depending on an MMR value.
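      
      A rough sketch of the kind of conversion described above (the field
      widths, names and overall shape are assumptions for illustration, not the
      real uv_hub.h code): keep the within-node offset and move the NODE bits up
      so their low bit lands at bit 39 or 40, as selected by the MMR.
      
      	static unsigned long soc_phys_to_global(unsigned long paddr,
      						unsigned long node_mask,
      						int m_val,	/* bits of per-node offset */
      						int n_lshift)	/* 39 or 40, from an MMR */
      	{
      		unsigned long offset = paddr & ((1UL << m_val) - 1);
      		unsigned long node   = (paddr >> m_val) & node_mask;
      
      		return offset | (node << n_lshift);	/* relocated NODE portion */
      	}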
      
      It is not yet clear if this bug will be fixed in a silicon respin.  If it
      is fixed, the hub revision will be incremented and the workaround disabled.
      Signed-off-by: Jack Steiner <steiner@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      6a469e46
  9. 21 June 2011, 6 commits
  10. 25 May 2011, 2 commits
    •
      x86, UV: Clean up uv_tlb.c · f073cc8f
      Committed by Cliff Wickman
      SGI UV's uv_tlb.c driver has become rather hard to read, with overly large
      functions, non-standard coding style, (way) too long variable, constant
      and function names, and non-obvious code-flow sequences.
      
      This patch improves the readability and maintainability of the driver
      significantly, by doing the following strict code cleanups with no side
      effects:
      
       - Split long functions into shorter logical functions.
      
       - Shortened some variable and structure member names.
      
       - Added special functions for reads and writes of MMR regs with
         very long names.
      
       - Added the 'tunables' table to shorten tunables_write().
      
       - Added the 'stat_description' table to shorten uv_ptc_proc_write().
      
       - Pass fewer 'stat' arguments where it can be derived from the 'bcp'
         argument.
      
       - Function definitions kept consistently on one line, and made inline in a few (short) cases.
      
       - Moved some small structures and an atomic inline function to the header file.
      
       - Moved some local variables to the blocks where they are used.
      
       - Updated the copyright date.
      
       - Shortened uv_write_global_mmr64() etc. using some aliasing; no
         line breaks. Renamed many uv_.. functions that are not exported.
      
       - Aligned structure fields.
          [ note that not all structures are aligned the same way though; I'd like
            to keep the extensive commenting in some of them. ]
      
       - Shortened some long structure names.
      
       - Standard pass/fail exit from init_per_cpu()
      
       - Vertical alignment for mass initializations.
      
       - More separation between blocks of code.
      
      Tested on a 16-processor Altix UV.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: penberg@kernel.org
      Link: http://lkml.kernel.org/r/E1QOw12-0004MN-Lp@eag09.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f073cc8f
    •
      x86, UV: Add support for SGI UV2 hub chip · 2a919596
      Committed by Jack Steiner
      This patch adds support for a new version of the SGI UV hub
      chip. The hub chip is the node controller that connects multiple
      blades into a larger coherent SSI.
      
      For the most part, UV2 is compatible with UV1. The majority of
      the changes are in the addresses of MMRs and, in a few cases, the
      contents of MMRs. These changes are the result of changes in the
      system topology, such as node configuration, processor types,
      maximum nodes, physical address sizes, etc.
      Signed-off-by: Jack Steiner <steiner@sgi.com>
      Link: http://lkml.kernel.org/r/20110511175028.GA18006@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2a919596
  11. 13 May 2011, 1 commit
    •
      x86: Fix UV BAU for non-consecutive nasids · 77ed23f8
      Committed by Cliff Wickman
      This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
      which is used for TLB flushing.
      
      Certain hardware configurations (that customers are ordering)
      cause nasids (numa address space id's) to be non-consecutive.
      Specifically, once you have more than 4 blades in an IRU
      (Individual Rack Unit - or 1/2 rack) but less than the maximum
      of 16, the nasid numbering becomes non-consecutive.  This
      currently results in a 'catastrophic error' (CATERR) detected by
      the firmware during OS boot.  The BAU is generating an 'INTD'
      request that is targeting a non-existent nasid value. Such
      configurations may also occur when a blade is configured off
      because of hardware errors. (There is one UV hub per blade.)
      
      This patch is required to support such configurations.
      
      The problem with the tlb_uv.c code is that it uses the
      consecutive hub numbers as indices into the BAU distribution bit
      map. These are simply the ordinal position of the hub or blade
      within its partition.  It should be using physical node numbers
      (pnodes), which correspond to the physical nasid values. Use of
      the hub number only works as long as the nasids in the partition
      are consecutive and increase with a stride of 1.
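      
      A toy example of the difference (illustrative only; it assumes the usual
      UV relationship pnode = nasid >> 1 and made-up nasid values):
      
      	#include <stdio.h>
      
      	int main(void)
      	{
      		int nasid[4] = { 0, 2, 6, 8 };		/* non-consecutive nasids */
      		unsigned long distribution = 0;
      		int hub;
      
      		for (hub = 0; hub < 4; hub++) {
      			int pnode = nasid[hub] >> 1;	/* physical node number */
      			distribution |= 1UL << pnode;	/* fixed: index by pnode */
      			/* the old code effectively did: distribution |= 1UL << hub; */
      		}
      		printf("distribution bitmap: 0x%lx\n", distribution);
      		return 0;
      	}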
      
      This patch changes the index to be the pnode number, thus
      allowing nasids to be non-consecutive.
      It also provides a table in local memory for each cpu to
      translate target cpu number to target pnode and nasid.
      And it improves naming to properly reflect 'node' and 'uvhub'
      versus 'nasid'.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      77ed23f8
  12. 29 March 2011, 1 commit
    •
      x86: Stop including <linux/delay.h> in two asm header files · ca444564
      Committed by Jean Delvare
      Stop including <linux/delay.h> in x86 header files which don't
      need it. This will let the compiler complain when this header is
      not included by source files that should include it, so that
      contributors can fix the problem before builds on other
      architectures start to fail.
      
      Credits go to Geert for the idea.
      Signed-off-by: Jean Delvare <khali@linux-fr.org>
      Cc: James E.J. Bottomley <James.Bottomley@suse.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      LKML-Reference: <20110325152014.297890ec@endymion.delvare>
      [ this also fixes an upstream build bug in drivers/media/rc/ite-cir.c ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ca444564
  13. 09 March 2011, 1 commit
  14. 04 January 2011, 1 commit
    •
      x86, UV, BAU: Extend for more than 16 cpus per socket · cfa60917
      Committed by Cliff Wickman
      Fix a hard-coded limit of a maximum of 16 cpus per socket.
      
      The UV Broadcast Assist Unit code initializes by scanning the
      cpu topology of the system and assigning a master cpu for each
      socket and UV hub. That scan had an assumption of a limit of 16
      cpus per socket. With Westmere we are going over that limit.
      The UV hub hardware will allow up to 32.
      
      If the scan finds the system has gone over that limit it returns
      an error and we print a warning and fall back to doing TLB
      shootdowns without the BAU.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org> # .37.x
      LKML-Reference: <E1PZol7-0000mM-77@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cfa60917
  15. 18 November 2010, 1 commit
    •
      x86: UV: Address interrupt/IO port operation conflict · 8191c9f6
      Committed by Dimitri Sivanich
      This patch for SGI UV systems addresses a problem whereby
      interrupt transactions being looped back from a local IOH,
      through the hub to a local CPU can (erroneously) conflict with
      IO port operations and other transactions.
      
      To work around this we set a high bit in the APIC IDs used for
      interrupts. This bit appears to be ignored by the sockets, but
      it avoids the conflict in the hub.
      Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
      LKML-Reference: <20101116222352.GA8155@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ___
      
       arch/x86/include/asm/uv/uv_hub.h   |    4 ++++
       arch/x86/include/asm/uv/uv_mmrs.h  |   19 ++++++++++++++++++-
       arch/x86/kernel/apic/x2apic_uv_x.c |   25 +++++++++++++++++++++++--
       arch/x86/platform/uv/tlb_uv.c      |    2 +-
       arch/x86/platform/uv/uv_time.c     |    4 +++-
       5 files changed, 49 insertions(+), 5 deletions(-)
      8191c9f6
  16. 10 November 2010, 1 commit
  17. 27 October 2010, 1 commit
  18. 15 October 2010, 1 commit
    •
      llseek: automatically add .llseek fop · 6038f373
      Committed by Arnd Bergmann
      All file_operations should get a .llseek operation so we can make
      nonseekable_open the default for future file operations without a
      .llseek pointer.
      
      The three cases that we can automatically detect are no_llseek, seq_lseek
      and default_llseek. For cases where we can automatically prove that
      the file offset is always ignored, we use noop_llseek, which maintains
      the current behavior of not returning an error from a seek.
      
      New drivers should normally not use noop_llseek but instead use no_llseek
      and call nonseekable_open at open time.  Existing drivers can be converted
      to do the same when the maintainer knows for certain that no user code
      relies on calling seek on the device file.
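      
      For reference, a minimal sketch of that recommended pattern for new
      drivers (the driver-specific names are illustrative):
      
      	#include <linux/fs.h>
      	#include <linux/module.h>
      
      	static int example_open(struct inode *inode, struct file *file)
      	{
      		return nonseekable_open(inode, file);
      	}
      
      	static const struct file_operations example_fops = {
      		.owner  = THIS_MODULE,
      		.open   = example_open,
      		.llseek = no_llseek,
      	};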
      
      The generated code is often incorrectly indented and right now contains
      comments that clarify for each added line why a specific variant was
      chosen. In the version that gets submitted upstream, the comments will
      be gone and I will manually fix the indentation, because there does not
      seem to be a way to do that using coccinelle.
      
      Some amount of new code is currently sitting in linux-next that should get
      the same modifications, which I will do at the end of the merge window.
      
      Many thanks to Julia Lawall for helping me learn to write a semantic
      patch that does all this.
      
      ===== begin semantic patch =====
      // This adds an llseek= method to all file operations,
      // as a preparation for making no_llseek the default.
      //
      // The rules are
      // - use no_llseek explicitly if we do nonseekable_open
      // - use seq_lseek for sequential files
      // - use default_llseek if we know we access f_pos
      // - use noop_llseek if we know we don't access f_pos,
      //   but we still want to allow users to call lseek
      //
      @ open1 exists @
      identifier nested_open;
      @@
      nested_open(...)
      {
      <+...
      nonseekable_open(...)
      ...+>
      }
      
      @ open exists@
      identifier open_f;
      identifier i, f;
      identifier open1.nested_open;
      @@
      int open_f(struct inode *i, struct file *f)
      {
      <+...
      (
      nonseekable_open(...)
      |
      nested_open(...)
      )
      ...+>
      }
      
      @ read disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      <+...
      (
         *off = E
      |
         *off += E
      |
         func(..., off, ...)
      |
         E = *off
      )
      ...+>
      }
      
      @ read_no_fpos disable optional_qualifier exists @
      identifier read_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ write @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      expression E;
      identifier func;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      <+...
      (
        *off = E
      |
        *off += E
      |
        func(..., off, ...)
      |
        E = *off
      )
      ...+>
      }
      
      @ write_no_fpos @
      identifier write_f;
      identifier f, p, s, off;
      type ssize_t, size_t, loff_t;
      @@
      ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
      {
      ... when != off
      }
      
      @ fops0 @
      identifier fops;
      @@
      struct file_operations fops = {
       ...
      };
      
      @ has_llseek depends on fops0 @
      identifier fops0.fops;
      identifier llseek_f;
      @@
      struct file_operations fops = {
      ...
       .llseek = llseek_f,
      ...
      };
      
      @ has_read depends on fops0 @
      identifier fops0.fops;
      identifier read_f;
      @@
      struct file_operations fops = {
      ...
       .read = read_f,
      ...
      };
      
      @ has_write depends on fops0 @
      identifier fops0.fops;
      identifier write_f;
      @@
      struct file_operations fops = {
      ...
       .write = write_f,
      ...
      };
      
      @ has_open depends on fops0 @
      identifier fops0.fops;
      identifier open_f;
      @@
      struct file_operations fops = {
      ...
       .open = open_f,
      ...
      };
      
      // use no_llseek if we call nonseekable_open
      ////////////////////////////////////////////
      @ nonseekable1 depends on !has_llseek && has_open @
      identifier fops0.fops;
      identifier nso ~= "nonseekable_open";
      @@
      struct file_operations fops = {
      ...  .open = nso, ...
      +.llseek = no_llseek, /* nonseekable */
      };
      
      @ nonseekable2 depends on !has_llseek @
      identifier fops0.fops;
      identifier open.open_f;
      @@
      struct file_operations fops = {
      ...  .open = open_f, ...
      +.llseek = no_llseek, /* open uses nonseekable */
      };
      
      // use seq_lseek for sequential files
      /////////////////////////////////////
      @ seq depends on !has_llseek @
      identifier fops0.fops;
      identifier sr ~= "seq_read";
      @@
      struct file_operations fops = {
      ...  .read = sr, ...
      +.llseek = seq_lseek, /* we have seq_read */
      };
      
      // use default_llseek if there is a readdir
      ///////////////////////////////////////////
      @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier readdir_e;
      @@
      // any other fop is used that changes pos
      struct file_operations fops = {
      ... .readdir = readdir_e, ...
      +.llseek = default_llseek, /* readdir is present */
      };
      
      // use default_llseek if at least one of read/write touches f_pos
      /////////////////////////////////////////////////////////////////
      @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read.read_f;
      @@
      // read fops use offset
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = default_llseek, /* read accesses f_pos */
      };
      
      @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ... .write = write_f, ...
      +	.llseek = default_llseek, /* write accesses f_pos */
      };
      
      // Use noop_llseek if neither read nor write accesses f_pos
      ///////////////////////////////////////////////////////////
      
      @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      identifier write_no_fpos.write_f;
      @@
      // write fops use offset
      struct file_operations fops = {
      ...
       .write = write_f,
       .read = read_f,
      ...
      +.llseek = noop_llseek, /* read and write both use no f_pos */
      };
      
      @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier write_no_fpos.write_f;
      @@
      struct file_operations fops = {
      ... .write = write_f, ...
      +.llseek = noop_llseek, /* write uses no f_pos */
      };
      
      @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      identifier read_no_fpos.read_f;
      @@
      struct file_operations fops = {
      ... .read = read_f, ...
      +.llseek = noop_llseek, /* read uses no f_pos */
      };
      
      @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
      identifier fops0.fops;
      @@
      struct file_operations fops = {
      ...
      +.llseek = noop_llseek, /* no read or write fn */
      };
      ===== End semantic patch =====
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      6038f373
  19. 30 September 2010, 1 commit
  20. 01 August 2010, 1 commit
    •
      x86, UV: Initialize BAU hub map · c4026cfd
      Committed by Cliff Wickman
      Fix uninitialized uvhub_mask:
      
      - An uninitialized bit map variable was causing initialization of
        non-existent hubs (this one causes boot panics).
      
      - And the bit map was too small for large machines.  This patch
        makes it dynamic in size.
      
      - Fix the case where socket 0 has no enabled cpus. Don't assume
        every hub has a socket 0.
      
      - uv_init_per_cpu() should be __init.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org> # for .35.x
      LKML-Reference: <E1Oeuyt-0004XS-0y@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c4026cfd
  21. 17 July 2010, 1 commit
    •
      x86, UV: Initialize BAU MMRs only on hubs with cpus · 93a7ca0c
      Committed by Cliff Wickman
      Remove the initialization of MMRs
      UVH_LB_BAU_SB_ACTIVATION_CONTROL and UVH_BAU_DATA_BROADCAST on
      UV hubs that have no active cpus. Such initialization on hubs
      with no active cpus would result in a kernel page fault.
      
      This is not a high priority, because we don't have any such
      systems (with UV hubs that have no active cpus) yet.  But they
      will be coming.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      LKML-Reference: <E1OZmZN-0006cW-RC@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      93a7ca0c
  22. 09 June 2010, 6 commits
    •
      x86, UV: Modularize BAU send and wait · f6d8a566
      Committed by Cliff Wickman
      Streamline the large uv_flush_send_and_wait() function by use of
      a couple of helper functions.
      
      And remove some excess comments.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004ay-IH@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f6d8a566
    •
      x86, UV: BAU broadcast to the local hub · 450a007e
      Committed by Cliff Wickman
      Make the Broadcast Assist Unit driver use the BAU for TLB
      shootdowns of cpus on the local uvhub.
      
      It was previously thought that IPI might be faster to the cpu's
      on the local hub.  But the IPI operation would have to follow
      the completion of the BAU broadcast anyway.  So we broadcast to
      the local uvhub in all cases except when the current cpu was the
      only local cpu in the mask.
      
      This simplifies uv_flush_send_and_wait() in that it returns
      either all shootdowns complete, or none.
      
      Adjust the statistics to account for shootdowns on the local
      uvhub.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004aq-G7@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      450a007e
    •
      x86, UV: Correct BAU regular message type · 7fba1bcd
      Committed by Cliff Wickman
      The Broadcast Assist Unit messages have a regular or retry
      message type. The regular type was not being set, but needs to
      be, because the lack of a message type is sometimes used to
      identify an unused entry in the message queue.
      Also remove some excess comments.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004ak-Dy@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7fba1bcd
    •
      x86, UV: Remove BAU check for stay-busy · 90cc7d94
      Committed by Cliff Wickman
      Remove a faulty assumption that a long running BAU request has
      encountered a hardware problem and will never finish.
      
      Numalink congestion can make a request appear to have
      encountered such a problem, but it is not safe to cancel the
      request.  If such a cancel is done but a reply is later received
      we can miss a TLB shootdown.
      
      We depend upon the max_bau_concurrent 'throttle' to prevent the
      stay-busy case from happening.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004ad-BV@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      90cc7d94
    •
      x86, UV: Correct BAU discovery of hubs and sockets · a8328ee5
      Committed by Cliff Wickman
      Correct the initialization-time assumption of contiguous blade
      numbers and of sockets numbered from zero.
      
      There may be hubs present with no cpu's enabled.
      There may be disabled sockets such that the active socket is not
      number zero.
      
      And assign a 'socket master' by assuming that a socket is a
      node. (It is not safe to extract the socket number from an apicid.)
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004aW-9S@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8328ee5
    •
      x86, UV: Correct BAU software acknowledge · 39847e7f
      Committed by Cliff Wickman
      Correct the acknowledgment and the reset of a BAU
      software-acknowledged message.
      
      A retry message should be testing only for timed-out resources
      (mask << 8). (And we delete a log message that might cause
      unnecessary concern.) The acknowledge MMR is
      |--timed-out--|---pending--|; each field is 8 bits.
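      
      In other words (an illustrative helper, not the actual driver code), with
      the pending bits in [7:0] and the timed-out bits in [15:8], a retry only
      needs to test the timed-out half:
      
      	/* pending bits in [7:0], timed-out bits in [15:8] */
      	static inline int resource_timed_out(unsigned long mmr, unsigned long res_bit)
      	{
      		return (mmr & (res_bit << 8)) != 0;	/* test only the timed-out half */
      	}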
      
      The IPI-driven reset of software acknowledge resources frees
      both timed out and pending resources.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: gregkh@suse.de
      LKML-Reference: <E1OJvNy-0004aP-7O@eag09.americas.sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      39847e7f