1. 12 Dec, 2014 3 commits
  2. 11 Dec, 2014 2 commits
  3. 09 Dec, 2014 1 commit
  4. 06 Dec, 2014 1 commit
    • net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Alexei Starovoitov committed
      Introduce a new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
      and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
      
      setsockopt() calls bpf_prog_get() which increments refcnt of the program,
      so it doesn't get unloaded while socket is using the program.
      
      The same eBPF program can be attached to multiple sockets.
      
      When the user task exits, its sockets are closed automatically, which
      calls sk_filter_uncharge() and decrements the refcnt of the eBPF program.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
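The attach step described in the commit above can be sketched from user space as follows. This is a minimal illustration and not code from the patch: `attach_socket_filter` is a hypothetical helper name, `prog_fd` is assumed to have been returned earlier by bpf(BPF_PROG_LOAD, ...), and the fallback value 50 for SO_ATTACH_BPF is the asm-generic one, which is an assumption that does not hold on every architecture.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Older libc headers may lack the constant. 50 is the asm-generic value
 * (assumption: some architectures, e.g. sparc, use a different number). */
#ifndef SO_ATTACH_BPF
#define SO_ATTACH_BPF 50
#endif

/* The option value is a 32-bit program file descriptor. */
static_assert(sizeof(int) == 4, "prog_fd must be passed as a 32-bit value");

/* Hypothetical helper: attach an already loaded eBPF program to a socket.
 * Returns 0 on success or a negative errno value on failure. */
int attach_socket_filter(int sock, int prog_fd)
{
    if (setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF,
                   &prog_fd, sizeof(prog_fd)) < 0)
        return -errno;
    return 0;
}
```

Because the kernel takes its own reference on the program, the caller may close `prog_fd` after a successful attach, and the same descriptor can be passed to this helper for several sockets.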
  5. 27 Nov, 2014 1 commit
  6. 23 Nov, 2014 2 commits
    • PCI/MSI: Rename mask/unmask_msi_irq treewide · 280510f1
      Thomas Gleixner committed
      The PCI/MSI irq chip callbacks mask/unmask_msi_irq have been renamed
      to pci_msi_mask/unmask_irq to mark them PCI specific. Rename all usage
      sites. The conversion helper functions are kept around to avoid
      conflicts in next and will be removed after merging into mainline.
      
      Coccinelle assisted conversion. No functional change.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: x86@kernel.org
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Murali Karicheri <m-karicheri2@ti.com>
      Cc: Thierry Reding <thierry.reding@gmail.com>
      Cc: Mohit Kumar <mohit.kumar@st.com>
      Cc: Simon Horman <horms@verge.net.au>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
    • PCI/MSI: Rename write_msi_msg() to pci_write_msi_msg() · 83a18912
      Jiang Liu committed
      Rename write_msi_msg() to pci_write_msi_msg() to mark it as PCI
      specific.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Yingjoe Chen <yingjoe.chen@mediatek.com>
      Cc: Yijing Wang <wangyijing@huawei.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 19 Nov, 2014 1 commit
    • sparc: io: remove duplicate relaxed accessors on sparc32 · 7c3969c3
      Arnd Bergmann committed
      Commit 1191ccb3 ("sparc: io: implement dummy relaxed accessor
      macros for writes") added the relaxed accessors (readl_relaxed etc) in
      a file that is shared between sparc32 and sparc64. However, the earlier
      e1039fb4 ("sparc32: introduce asm-generic/io.h") had already changed
      the sparc32 implementation to use asm-generic/io.h, which provides the
      same macros, resulting in lots of build errors.
      
      This moves the definitions from the shared sparc file into the
      sparc64-only file to fix the sparc32 build regression.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: 1191ccb3 ("sparc: io: implement dummy relaxed accessor macros for writes")
  8. 17 Nov, 2014 1 commit
  9. 12 Nov, 2014 1 commit
    • net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet committed
      An alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of millions of sockets among worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering in the socket structure which cpu delivered the
      last packet is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing between
      processes, applications can use:
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, since the whole networking stack should then
      run on the appropriate cpu, without the need to send IPIs (RPS/RFS).
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
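One way to use the value returned by SO_INCOMING_CPU is to map it onto a worker thread index. The sketch below is an illustration only: `socket_incoming_cpu` and `pick_worker` are hypothetical helper names, the modulo mapping is an assumed policy rather than anything the commit prescribes, and the fallback value 49 is the asm-generic constant.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Older libc headers may lack the constant. 49 is the asm-generic value
 * (assumption: some architectures use a different number). */
#ifndef SO_INCOMING_CPU
#define SO_INCOMING_CPU 49
#endif

/* Returns the cpu that delivered the last packet for fd,
 * or a negative errno value if the query fails. */
int socket_incoming_cpu(int fd)
{
    int cpu = -1;
    socklen_t len = sizeof(cpu);

    if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) < 0)
        return -errno;
    return cpu;
}

/* Assumed policy: one worker per cpu, wrapping around when there are
 * fewer workers than cpus. Unknown cpus go to worker 0. */
unsigned int pick_worker(int cpu, unsigned int nworkers)
{
    assert(nworkers > 0);
    return cpu < 0 ? 0 : (unsigned int)cpu % nworkers;
}
```

A server would typically call `socket_incoming_cpu()` right after accept() and hand the descriptor to the epoll set owned by `pick_worker(cpu, nworkers)`.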
  10. 08 Nov, 2014 2 commits
    • sparc32: Implement xchg and atomic_xchg using ATOMIC_HASH locks · 1a17fdc4
      Andreas Larsson committed
      Atomicity between xchg and cmpxchg cannot be guaranteed when xchg is
      implemented with a swap and cmpxchg is implemented with locks.
      Without this, e.g. mcs_spin_lock and mcs_spin_unlock are broken.
      Signed-off-by: Andreas Larsson <andreas@gaisler.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
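The idea behind the fix above is that xchg and cmpxchg are only atomic with respect to each other when both go through the same lock. A user-space sketch of that scheme, using C11 spinlocks selected by an address hash, might look like this. It is not the kernel's ATOMIC_HASH code: the table size, the hash function and all names here are assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_HASH_SIZE 4
static_assert((LOCK_HASH_SIZE & (LOCK_HASH_SIZE - 1)) == 0,
              "hash size must be a power of two");

static atomic_flag lock_hash[LOCK_HASH_SIZE] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Pick the lock that guards a given address. */
static atomic_flag *lock_for(const void *addr)
{
    return &lock_hash[((uintptr_t)addr >> 4) & (LOCK_HASH_SIZE - 1)];
}

static void hash_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void hash_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Exchange under the hashed lock; returns the old value. */
int hashed_xchg(int *p, int newval)
{
    atomic_flag *l = lock_for(p);
    hash_lock(l);
    int old = *p;
    *p = newval;
    hash_unlock(l);
    return old;
}

/* Compare-and-exchange under the same lock; returns the old value. */
int hashed_cmpxchg(int *p, int expected, int newval)
{
    atomic_flag *l = lock_for(p);
    hash_lock(l);
    int old = *p;
    if (old == expected)
        *p = newval;
    hash_unlock(l);
    return old;
}
```

If `hashed_xchg()` used a bare hardware swap instead of taking the lock, it could modify `*p` between the load and the store inside `hashed_cmpxchg()`, which is the breakage the commit describes.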
    • sparc64: Do irq_{enter,exit}() around generic_smp_call_function*(). · ab5c7809
      David S. Miller committed
      Otherwise rcu_irq_{enter,exit}() do not happen and we get dumps like:
      
      ====================
      [  188.275021] ===============================
      [  188.309351] [ INFO: suspicious RCU usage. ]
      [  188.343737] 3.18.0-rc3-00068-g20f3963d-dirty #54 Not tainted
      [  188.394786] -------------------------------
      [  188.429170] include/linux/rcupdate.h:883 rcu_read_lock() used
      illegally while idle!
      [  188.505235]
      other info that might help us debug this:
      
      [  188.554230]
      RCU used illegally from idle CPU!
      rcu_scheduler_active = 1, debug_locks = 0
      [  188.637587] RCU used illegally from extended quiescent state!
      [  188.690684] 3 locks held by swapper/7/0:
      [  188.721932]  #0:  (&x->wait#11){......}, at: [<0000000000495de8>] complete+0x8/0x60
      [  188.797994]  #1:  (&p->pi_lock){-.-.-.}, at: [<000000000048510c>] try_to_wake_up+0xc/0x400
      [  188.881343]  #2:  (rcu_read_lock){......}, at: [<000000000048a910>] select_task_rq_fair+0x90/0xb40
      [  188.973043]stack backtrace:
      [  188.993879] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.18.0-rc3-00068-g20f3963d-dirty #54
      [  189.076187] Call Trace:
      [  189.089719]  [0000000000499360] lockdep_rcu_suspicious+0xe0/0x100
      [  189.147035]  [000000000048a99c] select_task_rq_fair+0x11c/0xb40
      [  189.202253]  [00000000004852d8] try_to_wake_up+0x1d8/0x400
      [  189.252258]  [000000000048554c] default_wake_function+0xc/0x20
      [  189.306435]  [0000000000495554] __wake_up_common+0x34/0x80
      [  189.356448]  [00000000004955b4] __wake_up_locked+0x14/0x40
      [  189.406456]  [0000000000495e08] complete+0x28/0x60
      [  189.448142]  [0000000000636e28] blk_end_sync_rq+0x8/0x20
      [  189.496057]  [0000000000639898] __blk_mq_end_request+0x18/0x60
      [  189.550249]  [00000000006ee014] scsi_end_request+0x94/0x180
      [  189.601286]  [00000000006ee334] scsi_io_completion+0x1d4/0x600
      [  189.655463]  [00000000006e51c4] scsi_finish_command+0xc4/0xe0
      [  189.708598]  [00000000006ed958] scsi_softirq_done+0x118/0x140
      [  189.761735]  [00000000006398ec] __blk_mq_complete_request_remote+0xc/0x20
      [  189.827383]  [00000000004c75d0] generic_smp_call_function_single_interrupt+0x150/0x1c0
      [  189.906581]  [000000000043e514] smp_call_function_single_client+0x14/0x40
      ====================
      
      Based almost entirely upon a patch by Paul E. McKenney.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 01 Nov, 2014 1 commit
    • sparc64: Fix crashes in schizo_pcierr_intr_other(). · 7da89a2a
      David S. Miller committed
      Meelis Roos reports crashes during bootup on a V480 that look like
      this:
      
      ====================
      [   61.300577] PCI: Scanning PBM /pci@9,600000
      [   61.304867] schizo f009b070: PCI host bridge to bus 0003:00
      [   61.310385] pci_bus 0003:00: root bus resource [io  0x7ffe9000000-0x7ffe9ffffff] (bus address [0x0000-0xffffff])
      [   61.320515] pci_bus 0003:00: root bus resource [mem 0x7fb00000000-0x7fbffffffff] (bus address [0x00000000-0xffffffff])
      [   61.331173] pci_bus 0003:00: root bus resource [bus 00]
      [   61.385344] Unable to handle kernel NULL pointer dereference
      [   61.390970] tsk->{mm,active_mm}->context = 0000000000000000
      [   61.396515] tsk->{mm,active_mm}->pgd = fff000b000002000
      [   61.401716]               \|/ ____ \|/
      [   61.401716]               "@'/ .. \`@"
      [   61.401716]               /_| \__/ |_\
      [   61.401716]                  \__U_/
      [   61.416362] swapper/0(0): Oops [#1]
      [   61.419837] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc1-00422-g2cc91884-dirty #24
      [   61.427975] task: fff000b0fd8e9c40 ti: fff000b0fd928000 task.ti: fff000b0fd928000
      [   61.435426] TSTATE: 0000004480e01602 TPC: 00000000004455e4 TNPC: 00000000004455e8 Y: 00000000    Not tainted
      [   61.445230] TPC: <schizo_pcierr_intr+0x104/0x560>
      [   61.449897] g0: 0000000000000000 g1: 0000000000000000 g2: 0000000000a10f78 g3: 000000000000000a
      [   61.458563] g4: fff000b0fd8e9c40 g5: fff000b0fdd82000 g6: fff000b0fd928000 g7: 000000000000000a
      [   61.467229] o0: 000000000000003d o1: 0000000000000000 o2: 0000000000000006 o3: fff000b0ffa5fc7e
      [   61.475894] o4: 0000000000060000 o5: c000000000000000 sp: fff000b0ffa5f3c1 ret_pc: 00000000004455cc
      [   61.484909] RPC: <schizo_pcierr_intr+0xec/0x560>
      [   61.489500] l0: fff000b0fd8e9c40 l1: 0000000000a20800 l2: 0000000000000000 l3: 000000000119a430
      [   61.498164] l4: 0000000001742400 l5: 00000000011cfbe0 l6: 00000000011319c0 l7: fff000b0fd8ea348
      [   61.506830] i0: 0000000000000000 i1: fff000b0fdb34000 i2: 0000000320000000 i3: 0000000000000000
      [   61.515497] i4: 00060002010b003f i5: 0000040004e02000 i6: fff000b0ffa5f481 i7: 00000000004a9920
      [   61.524175] I7: <handle_irq_event_percpu+0x40/0x140>
      [   61.529099] Call Trace:
      [   61.531531]  [00000000004a9920] handle_irq_event_percpu+0x40/0x140
      [   61.537681]  [00000000004a9a58] handle_irq_event+0x38/0x80
      [   61.543145]  [00000000004ac77c] handle_fasteoi_irq+0xbc/0x200
      [   61.548860]  [00000000004a9084] generic_handle_irq+0x24/0x40
      [   61.554500]  [000000000042be0c] handler_irq+0xac/0x100
      ====================
      
      The problem is that pbm->pci_bus->self is NULL.
      
      This code is trying to go through the standard PCI config space
      interfaces to read the PCI controller's PCI_STATUS register.
      
      This doesn't work, because we more often than not do not enumerate
      the PCI controller as a bona fide PCI device during the OF device
      node scan.  Therefore bus->self remains NULL.
      
      Existing common code for PSYCHO and PSYCHO-like PCI controllers
      handles this properly, by doing the config space access directly.
      
      Do the same here, using pbm->pci_ops->{read,write}().
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 29 Oct, 2014 1 commit
  13. 25 Oct, 2014 2 commits
    • sparc64: Implement __get_user_pages_fast(). · 06090e8e
      David S. Miller committed
      It is not sufficient to implement only get_user_pages_fast(); you
      must also implement the atomic version, __get_user_pages_fast().
      Otherwise you end up using the weak symbol fallback implementation,
      which simply returns zero.
      
      This is dangerous, because it causes the futex code to loop forever
      if transparent hugepages are supported (see get_futex_key()).
      Signed-off-by: David S. Miller <davem@davemloft.net>
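The failure mode in the commit above can be modelled with a weak fallback and a caller that retries until exactly one page is pinned. This is only a sketch under stated assumptions: both function names are hypothetical, the GCC/Clang `weak` attribute stands in for the kernel's weak symbol, and the retry bound exists solely so the example terminates (the commit describes a loop without one).

```c
#include <assert.h>

/* Generic fallback, used only when no architecture supplies a strong
 * definition of the same symbol. It pins nothing and reports 0 pages. */
__attribute__((weak)) int pin_pages_fast_atomic(unsigned long start,
                                                int nr_pages)
{
    (void)start;
    (void)nr_pages;
    return 0;
}

/* Caller modelled on a "retry until one page is pinned" loop.
 * Returns the attempt number that succeeded, or -1 when max_tries
 * attempts all failed. */
int pin_one_page(unsigned long addr, int max_tries)
{
    assert(max_tries > 0);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        if (pin_pages_fast_atomic(addr, 1) == 1)
            return attempt;
    }
    return -1;
}
```

With only the fallback linked in, `pin_one_page()` can never succeed, so an unbounded version of the same loop would spin forever. Linking a strong `pin_pages_fast_atomic()` from another object file replaces the fallback without any change to the caller.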
    • sparc64: Fix register corruption in top-most kernel stack frame during boot. · ef3e035c
      David S. Miller committed
      Meelis Roos reported that kernels built with gcc-4.9 do not boot. We
      eventually narrowed this down to only impacting machines using
      UltraSPARC-III and derivative cpus.
      
      The crash happens right when the first user process is spawned:
      
      [   54.451346] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      [   54.451346]
      [   54.571516] CPU: 1 PID: 1 Comm: init Not tainted 3.16.0-rc2-00211-gd7933ab7 #96
      [   54.666431] Call Trace:
      [   54.698453]  [0000000000762f8c] panic+0xb0/0x224
      [   54.759071]  [000000000045cf68] do_exit+0x948/0x960
      [   54.823123]  [000000000042cbc0] fault_in_user_windows+0xe0/0x100
      [   54.902036]  [0000000000404ad0] __handle_user_windows+0x0/0x10
      [   54.978662] Press Stop-A (L1-A) to return to the boot prom
      [   55.050713] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004
      
      Further investigation showed that compiling only per_cpu_patch() with
      an older compiler fixes the boot.
      
      Detailed analysis showed that the function is not being miscompiled by
      gcc-4.9, but it is using a different register allocation ordering.
      
      With the gcc-4.9 compiled function, something during the code patching
      causes some of the %i* input registers to get corrupted.  Perhaps
      we have a TLB miss path into the firmware that is deep enough to
      cause a register window spill and subsequent restore when we get
      back from the TLB miss trap.
      
      Let's plug this up by doing two things:
      
      1) Stop using the firmware stack for client interface calls into
         the firmware.  Just use the kernel's stack.
      
      2) As soon as we can, call into a new function "start_early_boot()"
         to put a one-register-window buffer between the firmware's
         deepest stack frame and the top-most initial kernel one.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 21 Oct, 2014 1 commit
  15. 19 Oct, 2014 2 commits
    • sparc64: Do not define thread fpregs save area as zero-length array. · e2653143
      David S. Miller committed
      This breaks the stack end corruption detection facility.
      
      What that facility does is write a magic value to "end_of_stack()"
      and check to see if it gets overwritten.
      
      "end_of_stack()" is "task_thread_info(p) + 1", which for sparc64 is
      the beginning of the FPU register save area.
      
      So once the user uses the FPU, the magic value is overwritten and the
      debug checks trigger.
      
      Fix this by making the size explicit.
      
      Due to the size we use for the fpsaved[], gsr[], and xfsr[] arrays we
      are limited to 7 levels of FPU state saves.  Since each FPU register
      set is 256 bytes, allocate 256 * 7 bytes for the fpregs area.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
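The layout problem described above comes down to where "structure + 1" points. The sketch below compares an open-ended save area with an explicitly sized one. The struct and function names are hypothetical, the single `flags` member stands in for the rest of the real thread_info, and a standard C flexible array member is used in place of a zero-length array.

```c
#include <assert.h>
#include <stddef.h>

#define FPREG_SET_BYTES 256   /* one FPU register set, per the commit */
#define FPU_SAVE_LEVELS 7     /* nesting limit, per the commit */

/* Open-ended save area: contributes nothing to sizeof. */
struct ti_open_ended {
    unsigned long flags;
    unsigned char fpregs[];
};

/* Explicitly sized save area: sizeof now covers the FPU state. */
struct ti_sized {
    unsigned long flags;
    unsigned char fpregs[FPREG_SET_BYTES * FPU_SAVE_LEVELS];
};

static_assert(sizeof(((struct ti_sized *)0)->fpregs) == 1792,
              "7 levels of 256 bytes each");

/* Nonzero when "ti + 1" lands on the first byte of the save area,
 * so a marker written there is clobbered by the first FPU save. */
int marker_overlaps_open_ended(void)
{
    return sizeof(struct ti_open_ended) ==
           offsetof(struct ti_open_ended, fpregs);
}

/* Nonzero when "ti + 1" falls inside the sized save area. */
int marker_overlaps_sized(void)
{
    return sizeof(struct ti_sized) <
           offsetof(struct ti_sized, fpregs) +
           sizeof(((struct ti_sized *)0)->fpregs);
}
```

With the sized array, the end-of-stack marker sits past the whole save area, so FPU saves no longer touch it.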
    • sparc64: Fix corrupted thread fault code. · 84bd6d8b
      David S. Miller committed
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet, resulting in the
         TLB_TAG_ACCESS register address value being used instead.
      
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not set up properly.
      
      Errors of this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
      
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 15 Oct, 2014 1 commit
    • sparc64: Fix FPU register corruption with AES crypto offload. · f4da3628
      David S. Miller committed
      The AES loops in arch/sparc/crypto/aes_glue.c use a scheme where the
      key material is preloaded into the FPU registers, and then we loop
      over and over doing the crypt operation, reusing those pre-cooked key
      registers.
      
      There are intervening blkcipher*() calls between the crypt operation
      calls.  And those might perform memcpy() and thus also try to use the
      FPU.
      
      The sparc64 kernel FPU usage mechanism is designed to allow such
      recursive uses, but with a catch.
      
      There has to be a trap between the two FPU using threads of control.
      
      The mechanism works by, when the FPU is already in use by the kernel,
      allocating a slot for FPU saving at trap time.  Then if, within the
      trap handler, we try to use the FPU registers, the pre-trap FPU
      register state is saved into the slot.  Then at trap return time we
      notice this and restore the pre-trap FPU state.
      
      Over the long term there are various more involved ways we can make
      this work, but for a quick fix let's take advantage of the fact that
      the situation where this happens is very limited.
      
      All sparc64 chips that support the crypto instructions also use
      the Niagara4 memcpy routine, and that routine only uses the FPU for
      large copies where we can't get the source aligned properly to a
      multiple of 8 bytes.
      
      We look to see if the FPU is already in use in this context, and if so
      we use the non-large copy path which only uses integer registers.
      
      Furthermore, we also limit this special logic to when we are doing
      kernel copy, rather than a user copy.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 11 Oct, 2014 1 commit
    • sparc64: Fix lockdep warnings on reboot on Ultra-5 · bdcf81b6
      David S. Miller committed
      Inconsistently, the raw_* IRQ routines do not interact with and update
      the irqflags tracing and lockdep state, whereas the raw_* spinlock
      interfaces do.
      
      This causes problems in p1275_cmd_direct() because we disable hardirqs
      by hand using raw_local_irq_restore() and then do a raw_spin_lock()
      which triggers a lockdep trace because the CPU's hw IRQ state doesn't
      match IRQ tracing's internal software copy of that state.
      
      The CPU's irqs are disabled, yet current->hardirqs_enabled is true.
      
      ====================
      reboot: Restarting system
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3536 check_flags+0x7c/0x240()
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      Modules linked in: openpromfs
      CPU: 0 PID: 1 Comm: systemd-shutdow Tainted: G        W      3.17.0-dirty #145
      Call Trace:
       [000000000045919c] warn_slowpath_common+0x5c/0xa0
       [0000000000459210] warn_slowpath_fmt+0x30/0x40
       [000000000048f41c] check_flags+0x7c/0x240
       [0000000000493280] lock_acquire+0x20/0x1c0
       [0000000000832b70] _raw_spin_lock+0x30/0x60
       [000000000068f2fc] p1275_cmd_direct+0x1c/0x60
       [000000000068ed28] prom_reboot+0x28/0x40
       [000000000043610c] machine_restart+0x4c/0x80
       [000000000047d2d4] kernel_restart+0x54/0x80
       [000000000047d618] SyS_reboot+0x138/0x200
       [00000000004060b4] linux_sparc_syscall32+0x34/0x60
      ---[ end trace 5c439fe81c05a100 ]---
      possible reason: unannotated irqs-off.
      irq event stamp: 2010267
      hardirqs last  enabled at (2010267): [<000000000049a358>] vprintk_emit+0x4b8/0x580
      hardirqs last disabled at (2010266): [<0000000000499f08>] vprintk_emit+0x68/0x580
      softirqs last  enabled at (2010046): [<000000000045d278>] __do_softirq+0x378/0x4a0
      softirqs last disabled at (2010039): [<000000000042bf08>] do_softirq_own_stack+0x28/0x40
      Resetting ...
      ====================
      
      Use local_* variants of the hw IRQ interfaces so that IRQ tracing sees
      all of our changes.
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
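The mismatch behind the warning above can be shown with a toy model that keeps a hardware IRQ flag next to the software copy maintained by IRQ tracing. Nothing here is kernel code: the struct and all function names are hypothetical, and the model only captures which interface updates which flag.

```c
#include <assert.h>
#include <stdbool.h>

struct irq_state {
    bool hw_enabled;      /* what the CPU actually does */
    bool traced_enabled;  /* what IRQ tracing believes */
};

/* Models a raw_* IRQ routine: changes the hardware state only. */
void raw_irq_disable(struct irq_state *s)
{
    assert(s);
    s->hw_enabled = false;
}

/* Models a local_* IRQ routine: changes hardware and tracing state. */
void traced_irq_disable(struct irq_state *s)
{
    assert(s);
    s->hw_enabled = false;
    s->traced_enabled = false;
}

/* Models the consistency check a traced lock acquisition performs. */
bool irq_flags_consistent(const struct irq_state *s)
{
    return s->hw_enabled == s->traced_enabled;
}
```

Disabling interrupts through the raw variant and then taking a traced lock leaves the two flags disagreeing, which corresponds to the "irqs are disabled, yet current->hardirqs_enabled is true" state in the report.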
  18. 10 Oct, 2014 1 commit
  19. 08 Oct, 2014 1 commit
  20. 06 Oct, 2014 9 commits
    • sparc64: Kill unnecessary tables and increase MAX_BANKS. · d195b71b
      David S. Miller committed
      swapper_low_pmd_dir and swapper_pud_dir are actually completely
      useless and unnecessary.
      
      We just need swapper_pg_dir[].  Naturally the other page table chunks
      will be allocated on an as-needed basis.  Since the kernel actually
      accesses these tables in the PAGE_OFFSET view, there is not even a TLB
      locality advantage of placing them in the kernel image.
      
      Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
      naturally page aligned.
      
      Increase MAX_BANKS to 1024 in order to handle heavily fragmented
      virtual guests.
      
      Even with this MAX_BANKS increase, the kernel is 20K+ smaller.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: sparse irq · ee6a9333
      bob picco committed
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kill off !SPARSE_IRQ code, 3) allocate
      ivector_table at boot time and 4) default to the cookie only VIRQ
      mechanism for supported firmware. The first firmware with cookie only
      support for me appears on T5. You can optionally force the HV firmware
      out of cookie only mode, which falls back to the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negotiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQ calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalapeño.
      Signed-off-by: Bob Picco <bob.picco@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sparc64: Adjust vmalloc region size based upon available virtual address bits. · bb4e6e85
      David S. Miller committed
      In order to accommodate embedded per-cpu allocation with large numbers
      of cpus and numa nodes, we have to use as much virtual address space
      as possible for the vmalloc region.  Otherwise we can get things like:
      
      PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
      
      So, once we select a value for PAGE_OFFSET, derive the size of the
      vmalloc region based upon that.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. · 7c0fa0f2
      David S. Miller committed
      Make sure, at compile time, that the kernel can properly support
      whatever MAX_PHYS_ADDRESS_BITS is defined to.
      
      On M7 chips, use a max_phys_bits value of 49.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
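A compile-time check of the kind mentioned above can be sketched with a static assertion. The constants below are assumptions chosen to reproduce the 43-bit (3-level) and 53-bit (4-level) figures quoted elsewhere in this log: 8K pages (a shift of 13) and 10 index bits per page table level. The macro and function names other than MAX_PHYS_ADDRESS_BITS are hypothetical, and the kernel's actual check may be written differently.

```c
#include <assert.h>

#define PAGE_SHIFT             13  /* assumed: 8K base pages */
#define BITS_PER_LEVEL         10  /* assumed: 1024 entries per table */
#define PAGE_TABLE_LEVELS       4
#define MAX_PHYS_ADDRESS_BITS  53

#define PAGE_TABLE_COVERAGE_BITS \
    (PAGE_SHIFT + PAGE_TABLE_LEVELS * BITS_PER_LEVEL)

/* Fails the build if the page tables cannot span the physical range. */
static_assert(MAX_PHYS_ADDRESS_BITS <= PAGE_TABLE_COVERAGE_BITS,
              "MAX_PHYS_ADDRESS_BITS exceeds what the page tables cover");

/* Same arithmetic as a function, for trying other configurations. */
unsigned int page_table_coverage_bits(unsigned int page_shift,
                                      unsigned int bits_per_level,
                                      unsigned int levels)
{
    return page_shift + bits_per_level * levels;
}
```

Under these assumptions three levels cover 13 + 3 * 10 = 43 bits and four levels cover 53, so setting PAGE_TABLE_LEVELS to 3 in this sketch would stop the build.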
    • sparc64: Use kernel page tables for vmemmap. · c06240c7
      David S. Miller committed
      For sparse memory configurations, the vmemmap array behaves terribly
      and it takes up an inordinate amount of space in the BSS section of
      the kernel image unconditionally.
      
      Just build huge PMDs and look them up just like we do for TLB misses
      in the vmalloc area.
      
      Kernel BSS shrinks by about 2MB.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Fix physical memory management regressions with large max_phys_bits. · 0dd5b7b0
      David S. Miller committed
      If max_phys_bits needs to be > 43 (e.g. for T4 chips), things like
      DEBUG_PAGEALLOC stop working because the 3-level page tables can
      only cover up to 43 bits.
      
      Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
      47, several statically allocated tables became enormous.
      
      Compounding this is that we will need to support up to 49 bits of
      physical addressing for M7 chips.
      
      The two tables in question are sparc64_valid_addr_bitmap and
      kpte_linear_bitmap.
      
      The first holds a bitmap, with 1 bit for each 4MB chunk of physical
      memory, indicating whether that chunk actually exists in the machine
      and is valid.
      
      The second table is a set of 2-bit values which tell how large of a
      mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
      chunk of ram in the system.
      
      These tables are huge and take up an enormous amount of the BSS
      section of the sparc64 kernel image.  Specifically, the
      sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
      
      So let's solve the space wastage and the DEBUG_PAGEALLOC problem
      at the same time, by using the kernel page tables (as designed) to
      manage this information.
      
      We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
      and we do this by encoding huge PMDs and PUDs.
      
      On a T4-2 with 256GB of ram the kernel page table takes up 16K with
      DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
      memory is dynamically allocated at run time rather than coded
      statically into the kernel image.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
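The table sizes quoted in the commit above follow directly from the chunk sizes and the 47-bit address limit. The helper below, with a hypothetical name, redoes that arithmetic: one entry per chunk, a fixed number of bits per entry, eight bits per byte.

```c
#include <assert.h>
#include <stdint.h>

/* Size in bytes of a table holding bits_per_chunk bits for every
 * 2^chunk_shift byte chunk of a 2^phys_bits byte physical space. */
uint64_t bitmap_bytes(unsigned int phys_bits, unsigned int chunk_shift,
                      unsigned int bits_per_chunk)
{
    assert(phys_bits >= chunk_shift && phys_bits - chunk_shift < 61);
    uint64_t chunks = UINT64_C(1) << (phys_bits - chunk_shift);
    return chunks * bits_per_chunk / 8;
}
```

For 47 physical address bits, `bitmap_bytes(47, 22, 1)` gives 4MB for the 1-bit-per-4MB table and `bitmap_bytes(47, 28, 2)` gives 128K for the 2-bit-per-256MB table, matching the commit. At 49 bits the same formula yields 16MB and 512K, which shows why static tables stop being practical.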
    • sparc64: Adjust KTSB assembler to support larger physical addresses. · 8c82dc0e
      David S. Miller committed
      As currently coded the KTSB accesses in the kernel only support up to
      47 bits of physical addressing.
      
      Adjust the instruction and patching sequence in order to support
      arbitrary 64-bit addresses.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Define VA hole at run time, rather than at compile time. · 4397bed0
      David S. Miller committed
      Now that we use 4-level page tables, we can provide up to 53-bits of
      virtual address space to the user.
      
      Adjust the VA hole based upon the capabilities of the cpu type probed.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
    • sparc64: Switch to 4-level page tables. · ac55c768
      David S. Miller committed
      This has become necessary with chips that support more than 43-bits
      of physical addressing.
      
      Based almost entirely upon a patch by Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Bob Picco <bob.picco@oracle.com>
  21. 05 Oct, 2014 1 commit
    • sparc64: Fix reversed start/end in flush_tlb_kernel_range() · 473ad7f4
      David S. Miller committed
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: David S. Miller <davem@davemloft.net>
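Splitting a flush request around a protected range, as the commit above describes, can be sketched in plain C. This is an illustration rather than the kernel function: `split_around_hole` and `struct range` are hypothetical names, the hole bounds are passed in instead of being fixed firmware addresses, and the pieces are returned to the caller instead of being flushed.

```c
#include <assert.h>

struct range {
    unsigned long start;
    unsigned long end;
};

/* Splits [start, end) so that no piece overlaps [hole_lo, hole_hi).
 * Writes up to two pieces to out and returns how many were written. */
int split_around_hole(unsigned long start, unsigned long end,
                      unsigned long hole_lo, unsigned long hole_hi,
                      struct range out[2])
{
    int n = 0;

    assert(start <= end && hole_lo <= hole_hi);
    if (start < hole_hi && end > hole_lo) {
        if (start < hole_lo)
            out[n++] = (struct range){ start, hole_lo };
        if (end > hole_hi)
            /* The second piece runs from the end of the hole to the
             * end of the request, in (start, end) order. */
            out[n++] = (struct range){ hole_hi, end };
    } else {
        out[n++] = (struct range){ start, end };
    }
    return n;
}
```

Swapping the two fields of the second piece produces a range whose start lies above its end, and a loop that walks from start until it reaches end would then try to cover almost the entire address space, which matches the lockup shown in the commit.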
  22. 03 Oct, 2014 1 commit
  23. 01 Oct, 2014 3 commits
    • sparc64: Add vio_set_intr() to enable/disable Rx interrupts · ca605b7d
      Sowmini Varadhan committed
      The vio_set_intr() API should be used by VIO consumers to enable/disable
      Rx interrupts to facilitate deferred processing in softirq/bottom-half
      context.
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vio: fix reuse of vio_dring slot · d0aedcd4
      Dwight Engen committed
      vio_dring_avail() will allow use of every dring entry, but when the last
      entry is allocated then dr->prod == dr->cons which is indistinguishable from
      the ring empty condition. This causes the next allocation to reuse an entry.
      When this happens in sunvdc, the server side vds driver begins nack'ing the
      messages and ends up resetting the ldc channel. This problem does not affect
      sunvnet since it checks for < 2.
      
      The fix here is to just never allocate the very last dring slot so that full
      and empty are not the same condition. The request start path was changed to
      check for the ring being full a bit earlier, and to stop the blk_queue if
      there is no space left. The blk_queue will be restarted once the ring is
      only half full again. The number of ring entries was increased to 512 which
      matches the sunvnet and Solaris vdc drivers, and greatly reduces the
      frequency of hitting the ring full condition and the associated blk_queue
      stop/starting. The checks in sunvnet were adjusted to account for
      vio_dring_avail() returning 1 less.
      
      Orabug: 19441666
      OraBZ: 14983
      Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
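The full-versus-empty ambiguity fixed above is a general property of rings tracked by producer and consumer indexes. A minimal sketch of the "never hand out the last slot" rule follows. The struct and function names are hypothetical, the ring size is assumed to be a power of two, and this is not the vio_dring code itself.

```c
#include <assert.h>
#include <stdint.h>

struct ring {
    uint32_t prod;  /* next slot to allocate */
    uint32_t cons;  /* next slot to complete */
    uint32_t size;  /* number of slots, a power of two */
};

/* Free slots, always keeping one in reserve so that prod == cons
 * can only mean "empty". */
uint32_t ring_avail(const struct ring *r)
{
    assert(r->size && (r->size & (r->size - 1)) == 0);
    return r->size - ((r->prod - r->cons) & (r->size - 1)) - 1;
}

/* Allocates the next slot; returns its index, or -1 when full. */
int ring_alloc(struct ring *r)
{
    if (ring_avail(r) == 0)
        return -1;
    uint32_t slot = r->prod;
    r->prod = (r->prod + 1) & (r->size - 1);
    return (int)slot;
}
```

With 8 slots this ring accepts 7 outstanding entries. Without the reserved slot an eighth allocation would wrap `prod` back onto `cons`, the ring would look empty, and the next request would reuse a slot that is still in flight.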
    • sunvdc: add cdrom and v1.1 protocol support · 9bce2182
      Allen Pais committed
      Interpret the media type from v1.1 protocol to support CDROM/DVD.
      
      For v1.0 protocol, a disk's size continues to be calculated from the
      geometry returned by the vdisk server. The geometry returned by the server
      can be less than the actual number of sectors available in the backing
      image/device due to the rounding in the division used to compute the
      geometry in the vdisk server.
      
      In v1.1 protocol a disk's actual size in sectors is returned during the
      handshake. Use this size when v1.1 protocol is negotiated. Since this size
      will always be larger than the former geometry computed size, disks created
      under v1.0 will be forwards compatible to v1.1, but not vice versa.
      Signed-off-by: Dwight Engen <dwight.engen@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
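The size loss from geometry rounding mentioned above can be illustrated with integer division. How the vdisk server actually derives its geometry is not stated in the commit, so the sketch assumes the simplest case: a fixed head and sectors-per-track count with the cylinder count truncated. The function name and the sample numbers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Disk size in sectors as reconstructed from a cylinders/heads/sectors
 * geometry, assuming the cylinder count was obtained by truncating
 * division of the actual sector count. */
uint64_t geometry_sectors(uint64_t actual_sectors, uint32_t heads,
                          uint32_t sectors_per_track)
{
    uint64_t per_cylinder = (uint64_t)heads * sectors_per_track;

    assert(per_cylinder > 0);
    uint64_t cylinders = actual_sectors / per_cylinder;
    return cylinders * per_cylinder;
}
```

For a backing device of 1000003 sectors with 16 heads and 63 sectors per track, the reconstructed size is 999936 sectors, 67 fewer than are really available. A size reported directly, as in the v1.1 handshake, avoids that loss.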