1. 02 3月, 2015 3 次提交
  2. 14 2月, 2015 1 次提交
    • A
      mm: vmalloc: pass additional vm_flags to __vmalloc_node_range() · cb9e3c29
      Andrey Ryabinin 提交于
      For instrumenting global variables KASan will shadow memory backing memory
      for modules.  So on module loading we will need to allocate memory for
      shadow and map it at address in shadow that corresponds to the address
      allocated in module_alloc().
      
      __vmalloc_node_range() could be used for this purpose, except it puts a
      guard hole after allocated area.  Guard hole in shadow memory should be a
      problem because at some future point we might need to have a shadow memory
      at address occupied by guard hole.  So we could fail to allocate shadow
      for module_alloc().
      
      Now we have VM_NO_GUARD flag disabling guard page, so we need to pass into
      __vmalloc_node_range().  Add new parameter 'vm_flags' to
      __vmalloc_node_range() function.
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: NAndrey Konovalov <adech.fo@gmail.com>
      Cc: Yuri Gribov <tetra2005@gmail.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb9e3c29
  3. 13 2月, 2015 1 次提交
    • A
      all arches, signal: move restart_block to struct task_struct · f56141e3
      Andy Lutomirski 提交于
      If an attacker can cause a controlled kernel stack overflow, overwriting
      the restart block is a very juicy exploit target.  This is because the
      restart_block is held in the same memory allocation as the kernel stack.
      
      Moving the restart block to struct task_struct prevents this exploit by
      making the restart_block harder to locate.
      
      Note that there are other fields in thread_info that are also easy
      targets, at least on some architectures.
      
      It's also a decent simplification, since the restart code is more or less
      identical on all architectures.
      
      [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: NRichard Weinberger <richard@nod.at>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Chen Liqin <liqin.linux@gmail.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f56141e3
  4. 17 1月, 2015 1 次提交
  5. 14 12月, 2014 1 次提交
  6. 12 12月, 2014 2 次提交
  7. 23 11月, 2014 2 次提交
    • T
      PCI/MSI: Rename mask/unmask_msi_irq treewide · 280510f1
      Thomas Gleixner 提交于
      The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
      to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
      sites. The conversion helper functions are kept around to avoid
      conflicts in next and will be removed after merging into mainline.
      
      Coccinelle assisted conversion. No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: x86@kernel.org
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Mohit Kumar <mohit.kumar@st.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      280510f1
    • J
      PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() · 83a18912
      Jiang Liu 提交于
      Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
      specific.
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      83a18912
  8. 08 11月, 2014 1 次提交
    • D
      sparc64: Do irq_{enter,exit}() around generic_smp_call_function*(). · ab5c7809
      David S. Miller 提交于
      Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:
      
      ====================
      [  188.275021] ===============================
      [  188.309351] [ INFO: suspicious RCU usage. ]
      [  188.343737] 3.18.0-rc3-00068-g20f3963d-dirty #54 Not tainted
      [  188.394786] -------------------------------
      [  188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
      illegally while idle!
      [  188.505235]
      other info that might help us debug this:
      
      [  188.554230]
      RCU used illegally from idle CPU!
      rcu_scheduler_active = 1, debug_locks = 0
      [  188.637587] RCU used illegally from extended quiescent state!
      [  188.690684] 3 locks held by swapper/7/0:
      [  188.721932]  #0:  (&x->wait#11){......}, at: [<0000000000495de8>] complete+0x8/0x60
      [  188.797994]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000048510c>] try_to_wake_up+0xc/0x400
      [  188.881343]  #2:  (rcu_read_lock){......}, at: [<000000000048a910>] select_task_rq_fair+0x90/0xb40
      [  188.973043]stack backtrace:
      [  188.993879] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.18.0-rc3-00068-g20f3963d-dirty #54
      [  189.076187] Call Trace:
      [  189.089719]  [0000000000499360] lockdep_rcu_suspicious+0xe0/0x100
      [  189.147035]  [000000000048a99c] select_task_rq_fair+0x11c/0xb40
      [  189.202253]  [00000000004852d8] try_to_wake_up+0x1d8/0x400
      [  189.252258]  [000000000048554c] default_wake_function+0xc/0x20
      [  189.306435]  [0000000000495554] __wake_up_common+0x34/0x80
      [  189.356448]  [00000000004955b4] __wake_up_locked+0x14/0x40
      [  189.406456]  [0000000000495e08] complete+0x28/0x60
      [  189.448142]  [0000000000636e28] blk_end_sync_rq+0x8/0x20
      [  189.496057]  [0000000000639898] __blk_mq_end_request+0x18/0x60
      [  189.550249]  [00000000006ee014] scsi_end_request+0x94/0x180
      [  189.601286]  [00000000006ee334] scsi_io_completion+0x1d4/0x600
      [  189.655463]  [00000000006e51c4] scsi_finish_command+0xc4/0xe0
      [  189.708598]  [00000000006ed958] scsi_softirq_done+0x118/0x140
      [  189.761735]  [00000000006398ec] __blk_mq_complete_request_remote+0xc/0x20
      [  189.827383]  [00000000004c75d0] generic_smp_call_function_single_interrupt+0x150/0x1c0
      [  189.906581]  [000000000043e514] smp_call_function_single_client+0x14/0x40
      ====================
      
      Based almost entirely upon a patch by Paul E. McKenney.
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Tested-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab5c7809
  9. 01 11月, 2014 1 次提交
    • D
      sparc64: Fix crashes in schizo_pcierr_intr_other(). · 7da89a2a
      David S. Miller 提交于
      Meelis Roos reports crashes during bootup on a V480 that look like
      this:
      
      ====================
      [   61.300577] PCI: Scanning PBM /pci@9,600000
      [   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
      [   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
      [   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
      [   61.331173] pci_bus 0003:00: root bus resource [bus 00]
      [   61.385344] Unable to handle kernel NULL pointer dereference
      [   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
      [   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
      [   61.401716]               \|/ ____ \|/
      [   61.401716]               "@'/ .. \`@"
      [   61.401716]               /_| \__/ |_\
      [   61.401716]                  \__U_/
      [   61.416362] swapper/0(0): Oops [#1]
      [   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc91884-dirty #24
      [   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
      [   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
      [   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
      [   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
      [   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
      [   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
      [   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
      [   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
      [   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
      [   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
      [   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
      [   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
      [   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
      [   61.529099] Call Trace:
      [   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
      [   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
      [   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
      [   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
      [   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
      ====================
      
      The problem is that pbm->pci_bus->self is NULL.
      
      This code is trying to go through the standard PCI config space
      interfaces to read the PCI controller's PCI_STATUS register.
      
      This doesn't work, because we more often than not do not enumerate
      the PCI controller as a bonafide PCI device during the OF device
      node scan.  Therefore bus->self remains NULL.
      
      Existing common code for PSYCHO and PSYCHO-like PCI controllers
      handles this properly, by doing the config space access directly.
      
      Do the same here, pbm->pci_ops->{read,write}().
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Tested-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7da89a2a
  10. 29 10月, 2014 1 次提交
  11. 25 10月, 2014 1 次提交
    • D
      sparc64: Fix register corruption in top-most kernel stack frame during boot. · ef3e035c
      David S. Miller 提交于
      Meelis Roos reported that kernels built with gcc-4.9 do not boot, we
      eventually narrowed this down to only impacting machines using
      UltraSPARC-III and derivitive cpus.
      
      The crash happens right when the first user process is spawned:
      
      [   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      [   54.451346]
      [   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96
      [   54.666431] Call Trace:
      [   54.698453]  [0000000000762f8c] panic+0xb0/0x224
      [   54.759071]  [000000000045cf68] do_exit+0x948/0x960
      [   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
      [   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
      [   54.978662] Press Stop-A (L1-A) to return to the boot prom
      [   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      
      Further investigation showed that compiling only per_cpu_patch() with
      an older compiler fixes the boot.
      
      Detailed analysis showed that the function is not being miscompiled by
      gcc-4.9, but it is using a different register allocation ordering.
      
      With the gcc-4.9 compiled function, something during the code patching
      causes some of the %i* input registers to get corrupted.  Perhaps
      we have a TLB miss path into the firmware that is deep enough to
      cause a register window spill and subsequent restore when we get
      back from the TLB miss trap.
      
      Let's plug this up by doing two things:
      
      1) Stop using the firmware stack for client interface calls into
         the firmware.  Just use the kernel's stack.
      
      2) As soon as we can, call into a new function "start_early_boot()"
         to put a one-register-window buffer between the firmware's
         deepest stack frame and the top-most initial kernel one.
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Tested-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef3e035c
  12. 20 10月, 2014 1 次提交
  13. 19 10月, 2014 1 次提交
    • D
      sparc64: Fix corrupted thread fault code. · 84bd6d8b
      David S. Miller 提交于
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet.  Resulting in using the
         TLB_TAG_ACCESS register address value instead.
      
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not setup properly.
      
      Errors on this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
      
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Reported-by: NMeelis Roos <mroos@linux.ee>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      84bd6d8b
  14. 06 10月, 2014 6 次提交
    • D
      sparc64: Kill unnecessary tables and increase MAX_BANKS. · d195b71b
      David S. Miller 提交于
      swapper_low_pmd_dir and swapper_pud_dir are actually completely
      useless and unnecessary.
      
      We just need swapper_pg_dir[].  Naturally the other page table chunks
      will be allocated on an as-needed basis.  Since the kernel actually
      accesses these tables in the PAGE_OFFSET view, there is not even a TLB
      locality advantage of placing them in the kernel image.
      
      Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
      naturally page aligned.
      
      Increase MAX_BANKS to 1024 in order to handle heavily fragmented
      virtual guests.
      
      Even with this MAX_BANKS increase, the kernel is 20K+ smaller.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      d195b71b
    • B
      sparc64: sparse irq · ee6a9333
      bob picco 提交于
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
      ivector_table at boot time and 4) default to cookie only VIRQ mechanism
      for supported firmware. The first firmware with cookie only support for
      me appears on T5. You can optionally force the HV firmware to not cookie
      only mode which is the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negoatiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQs calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee6a9333
    • D
      sparc64: Adjust vmalloc region size based upon available virtual address bits. · bb4e6e85
      David S. Miller 提交于
      In order to accomodate embedded per-cpu allocation with large numbers
      of cpus and numa nodes, we have to use as much virtual address space
      as possible for the vmalloc region.  Otherwise we can get things like:
      
      PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
      
      So, once we select a value for PAGE_OFFSET, derive the size of the
      vmalloc region based upon that.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      bb4e6e85
    • D
      sparc64: Use kernel page tables for vmemmap. · c06240c7
      David S. Miller 提交于
      For sparse memory configurations, the vmemmap array behaves terribly
      and it takes up an inordinate amount of space in the BSS section of
      the kernel image unconditionally.
      
      Just build huge PMDs and look them up just like we do for TLB misses
      in the vmalloc area.
      
      Kernel BSS shrinks by about 2MB.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      c06240c7
    • D
      sparc64: Fix physical memory management regressions with large max_phys_bits. · 0dd5b7b0
      David S. Miller 提交于
      If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
      DEBUG_PAGEALLOC stop working because the 3-level page tables only
      can cover up to 43 bits.
      
      Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
      47, several statically allocated tables became enormous.
      
      Compounding this is that we will need to support up to 49 bits of
      physical addressing for M7 chips.
      
      The two tables in question are sparc64_valid_addr_bitmap and
      kpte_linear_bitmap.
      
      The first holds a bitmap, with 1 bit for each 4MB chunk of physical
      memory, indicating whether that chunk actually exists in the machine
      and is valid.
      
      The second table is a set of 2-bit values which tell how large of a
      mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
      chunk of ram in the system.
      
      These tables are huge and take up an enormous amount of the BSS
      section of the sparc64 kernel image.  Specifically, the
      sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
      
      So let's solve the space wastage and the DEBUG_PAGEALLOC problem
      at the same time, by using the kernel page tables (as designed) to
      manage this information.
      
      We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
      and we do this by encoding huge PMDs and PUDs.
      
      On a T4-2 with 256GB of ram the kernel page table takes up 16K with
      DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
      memory is dynamically allocated at run time rather than coded
      statically into the kernel image.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      0dd5b7b0
    • D
      sparc64: Switch to 4-level page tables. · ac55c768
      David S. Miller 提交于
      This has become necessary with chips that support more than 43-bits
      of physical addressing.
      
      Based almost entirely upon a patch by Bob Picco.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBob Picco <bob.picco@oracle.com>
      ac55c768
  15. 01 10月, 2014 4 次提交
  16. 24 9月, 2014 1 次提交
    • E
      ARCH: AUDIT: audit_syscall_entry() should not require the arch · 91397401
      Eric Paris 提交于
      We have a function where the arch can be queried, syscall_get_arch().
      So rather than have every single piece of arch specific code use and/or
      duplicate syscall_get_arch(), just have the audit code use the
      syscall_get_arch() code.
      Based-on-patch-by: NRichard Briggs <rgb@redhat.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Cc: linux-alpha@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-ia64@vger.kernel.org
      Cc: microblaze-uclinux@itee.uq.edu.au
      Cc: linux-mips@linux-mips.org
      Cc: linux@lists.openrisc.net
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: linux-sh@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: linux-xtensa@linux-xtensa.org
      Cc: x86@kernel.org
      91397401
  17. 17 9月, 2014 4 次提交
    • S
      sparc64: Move request_irq() from ldc_bind() to ldc_alloc() · c21c4ab0
      Sowmini Varadhan 提交于
      The request_irq() needs to be done from ldc_alloc()
      to avoid the following (caught by lockdep)
      
       [00000000004a0738] __might_sleep+0xf8/0x120
       [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
       [00000000004faf80] request_threaded_irq+0x80/0x160
       [000000000044f71c] ldc_bind+0x7c/0x220
       [0000000000452454] vio_port_up+0x54/0xe0
       [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
       [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
       [0000000000451a88] vio_device_probe+0x48/0x60
       [000000000074c56c] really_probe+0x6c/0x300
       [000000000074c83c] driver_probe_device+0x3c/0xa0
       [000000000074c92c] __driver_attach+0x8c/0xa0
       [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
       [000000000074c1dc] driver_attach+0x1c/0x40
       [000000000074b0fc] bus_add_driver+0xbc/0x280
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c21c4ab0
    • B
      sparc64: T5 PMU · 05aa1651
      bob picco 提交于
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      05aa1651
    • B
      sparc64: mem boot option correction · 7c21d533
      bob picco 提交于
      The "mem" boot option can result in many unexpected consequences. This patch
      attempts to prevent boot hangs which have been experienced on T4-4 and T5-8.
      Basically the boot loader allocates vmlinuz and initrd higher in available
      OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot
      with mem=20G.
      
      The patch utilizes memblock to avoid reserved regions and trim memory which
      is only free. Other improvements are possible for a multi-node machine.
      
      This is a snippet of the boot log with mem=20G on T5-8 with the patch applied:
      MEMBLOCK configuration:	<- before memory reduction
       memory size = 0x1ffad6ce000 reserved size = 0xa1adf44
       memory.cnt  = 0xb
       memory[0x0]    [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes
       memory[0x1]    [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes
       memory[0x2]    [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes
       memory[0x3]    [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes
       memory[0x4]    [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes
       memory[0x5]    [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes
       memory[0x6]    [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes
       memory[0x7]    [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes
       memory[0x8]    [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes
       memory[0x9]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0xa]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      MEMBLOCK configuration:	<- after reduction of memory
       memory size = 0x50a1adf44 reserved size = 0xa1adf44
       memory.cnt  = 0x4
       memory[0x0]    [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       memory[0x1]    [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes
       memory[0x2]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0x3]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      Early memory node ranges
        node   7: [mem 0x380000000000-0x38000117dfff]
        node   7: [mem 0x380004000000-0x380f0d01bfff]
        node   7: [mem 0x383fffc92000-0x383fffca1fff]
        node   7: [mem 0x383fffcb4000-0x383fffcb5fff]
      Could not find start_pfn for node 0
      Could not find start_pfn for node 1
      Could not find start_pfn for node 2
      Could not find start_pfn for node 3
      Could not find start_pfn for node 4
      Could not find start_pfn for node 5
      Could not find start_pfn for node 6
      .
      
      The patch was tested on T4-1, T5-8 and Jalap?no.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c21d533
    • B
      sparc64: sun4v TLB error power off events · 4ccb9272
      bob picco 提交于
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: NBob Picco <bob.picco@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ccb9272
  18. 11 9月, 2014 1 次提交
  19. 10 9月, 2014 4 次提交
  20. 27 8月, 2014 1 次提交
    • C
      sparc: Replace __get_cpu_var uses · 494fc421
      Christoph Lameter 提交于
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processors percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as :
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and less registers
      are used when code is generated.
      
      At the end of the patch set all uses of __get_cpu_var have been removed so
      the macro is removed too.
      
      The patch set includes passes over all arches as well. Once these operations
      are used throughout then specialized macros can be defined in non -x86
      arches as well in order to optimize per cpu access by f.e.  using a global
      register that may be set to the per cpu base.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y)
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y)
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++
      
         Converts to
      
      	__this_cpu_inc(y)
      
      Cc: sparclinux@vger.kernel.org
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      494fc421
  21. 14 8月, 2014 2 次提交