1. 20 Sep 2016, 2 commits
    • powerpc/mm: Update FORCE_MAX_ZONEORDER range to allow hugetlb w/4K · d5a1e42c
      Committed by Aneesh Kumar K.V
      For hugetlb to work with 4K page size, we need MAX_ORDER to be 13 or
      more. When switching from a 64K page size to 4K linux page size using
      make oldconfig, we end up with a CONFIG_FORCE_MAX_ZONEORDER value of 9.
      This results in a 16M hugepage being considered as a gigantic huge page
      which in turn results in failure to setup hugepages if gigantic hugepage
      support is not enabled.
      
      This also results in a kernel crash with the 4K radix configuration. We
      hit the below BUG_ON on radix:
      
        kernel BUG at mm/huge_memory.c:364!
        Oops: Exception in kernel mode, sig: 5 [#1]
        SMP NR_CPUS=2048 NUMA PowerNV
        CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.8.0-rc1-00006-gbae9cc6 #1
        task: c0000000f1af8000 task.stack: c0000000f1aec000
        NIP: c000000000c5fa0c LR: c000000000c5f9d8 CTR: c000000000c5f9a4
        REGS: c0000000f1aef920 TRAP: 0700   Not tainted (4.8.0-rc1-00006-gbae9cc6)
        MSR: 9000000102029033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE,TM[E]>  CR: 24000844  XER: 00000000
        CFAR: c000000000c5f9e0 SOFTE: 1
        ....
        NIP [c000000000c5fa0c] hugepage_init+0x68/0x238
        LR [c000000000c5f9d8] hugepage_init+0x34/0x238
      
      Fixes: a7ee5395 ("powerpc/Kconfig: Update config option based on page size")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: Santhosh <santhog4@linux.vnet.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Balbir Singh <bsingharora@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
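      The MAX_ORDER arithmetic can be checked with a small C sketch. It assumes the rule used by the kernel's hstate_is_gigantic(): a huge page whose order is >= MAX_ORDER cannot come from the buddy allocator and is treated as gigantic. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: order = log2(hugepage_size / page_size),
 * assuming both sizes are powers of two. */
static unsigned int hugepage_order(uint64_t hugepage_size, uint64_t page_size)
{
	unsigned int order = 0;

	assert(page_size != 0 && hugepage_size >= page_size);
	while ((page_size << order) < hugepage_size)
		order++;
	return order;
}

/* Same rule as hstate_is_gigantic(): the buddy allocator's largest
 * block has order MAX_ORDER - 1, so anything bigger is "gigantic". */
static bool is_gigantic(unsigned int order, unsigned int max_order)
{
	return order >= max_order;
}
```

      With 4K pages a 16M huge page has order 12, so MAX_ORDER = 9 marks it gigantic while MAX_ORDER = 13 does not. With 64K pages the same 16M page has order 8, which is why 9 was enough before the page size was switched.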
    • powerpc/64: Replay hypervisor maintenance interrupt first · e0e0d6b7
      Committed by Nicholas Piggin
      The HMI (Hypervisor Maintenance Interrupt) is defined by the
      architecture to be higher priority than other maskable interrupts, so
      replay it first, as a best-effort to replay according to hardware
      priorities.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
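      A userspace sketch of priority-ordered replay: pending interrupts are kept in a bitmask and replayed one at a time, highest priority first. Only "HMI first" comes from the commit; the flag names, bit values, and the order of the remaining entries are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending-interrupt flags; the bit values are illustrative. */
enum pending_irq {
	IRQ_HMI   = 1u << 0,	/* hypervisor maintenance interrupt */
	IRQ_DEC   = 1u << 1,	/* decrementer */
	IRQ_EE    = 1u << 2,	/* external interrupt */
	IRQ_DBELL = 1u << 3,	/* doorbell */
};

/* HMI first, as the commit describes; the rest of the order is assumed. */
static const unsigned int replay_order[] = { IRQ_HMI, IRQ_DEC, IRQ_EE, IRQ_DBELL };

/* Return the highest-priority pending interrupt and clear it, or 0. */
static unsigned int next_replay(unsigned int *pending)
{
	assert(pending != NULL);
	for (size_t i = 0; i < sizeof(replay_order) / sizeof(replay_order[0]); i++) {
		if (*pending & replay_order[i]) {
			*pending &= ~replay_order[i];
			return replay_order[i];
		}
	}
	return 0;
}
```

      Calling next_replay() in a loop until it returns 0 drains the mask in priority order, so an HMI that arrived after an external interrupt is still replayed first.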
  2. 19 Sep 2016, 4 commits
    • powerpc: Ensure .mem(init|exit).text are within _stext/_etext · 7de3b27b
      Committed by Michael Ellerman
      In our linker script we open code the list of text sections, because we
      need to include the __ftr_alt sections, which are arch-specific.
      
      This means we can't use TEXT_TEXT as defined in vmlinux.lds.h, and so we
      don't have the MEM_KEEP() logic for memory hotplug sections.
      
      If we build the kernel with the gold linker, and with CONFIG_MEMORY_HOTPLUG=y,
      we see that functions marked __meminit can end up outside of the
      _stext/_etext range, and also outside of _sinittext/_einittext, eg:
      
          c000000000000000 T _stext
          c0000000009e0000 A _etext
          c0000000009e3f18 T hash__vmemmap_create_mapping
          c000000000ca0000 T _sinittext
          c000000000d00844 T _einittext
      
      This causes them to not be recognised as text by is_kernel_text(), and
      prevents them being patched by jump_label (and presumably ftrace/kprobes
      etc.).
      
      Fix it by adding MEM_KEEP() directives, mirroring what TEXT_TEXT does.
      
      This isn't a problem when CONFIG_MEMORY_HOTPLUG=n, because we use the
      standard INIT_TEXT_SECTION() and EXIT_TEXT macros from vmlinux.lds.h.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
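      A simplified model of the range check, using the symbol addresses from the listing above: an address counts as text only if it falls in [_stext, _etext) or [_sinittext, _einittext). The real is_kernel_text() differs in detail (for example it also consults an arch hook), and the half-open ranges and helper names here are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Symbol addresses from the System.map excerpt above. */
#define STEXT     0xc000000000000000ULL
#define ETEXT     0xc0000000009e0000ULL
#define SINITTEXT 0xc000000000ca0000ULL
#define EINITTEXT 0xc000000000d00844ULL

static bool in_range(uint64_t addr, uint64_t start, uint64_t end)
{
	assert(start <= end);
	return addr >= start && addr < end;
}

/* Hypothetical check: core text or init text. */
static bool looks_like_text(uint64_t addr)
{
	return in_range(addr, STEXT, ETEXT) || in_range(addr, SINITTEXT, EINITTEXT);
}
```

      hash__vmemmap_create_mapping at 0xc0000000009e3f18 lies just past _etext and before _sinittext, so both range checks fail for it.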
    • powerpc: Don't change the section in _GLOBAL() · bea2dccc
      Committed by Michael Ellerman
      Currently the _GLOBAL() macro unilaterally sets the assembler section to
      ".text" at the start of the macro. This is rude as the caller may be
      using a different section.
      
      So let the caller decide which section to emit the code into. On big
      endian we do need to switch to the ".opd" section to emit the OPD, but
      do that with pushsection/popsection, thereby leaving the original
      section intact.
      
      I verified that the order of all entries in System.map is unchanged
      after this patch. The actual addresses shift around slightly so you
      can't just diff the System.map.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/kernel: Use kprobe blacklist for asm functions · 6f698df1
      Committed by Nicholas Piggin
      Rather than forcing the whole function into the ".kprobes.text" section,
      just add the symbol's address to the kprobe blacklist.
      
      This also lets us drop the three versions of the _KPROBE macro, in
      exchange for just one version of _ASM_NOKPROBE_SYMBOL - which is a good
      cleanup.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc: Use kprobe blacklist for exception handlers · 03465f89
      Committed by Nicholas Piggin
      Currently we mark the C implementations of some exception handlers as
      __kprobes. This has the effect of putting them in the ".kprobes.text"
      section, which separates them from the rest of the text.
      
      Instead we can use the blacklist macros to add the symbols to a
      blacklist which kprobes will check. This allows the linker to move
      exception handler functions close to callers and avoids trampolines in
      larger kernels.
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      [mpe: Reword change log a bit]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
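      A userspace model of the blacklist idea: instead of relocating functions into ".kprobes.text", record each protected symbol's address range in a table and refuse probes that land inside any entry. In the kernel this is done with the NOKPROBE_SYMBOL() macro for C (and _ASM_NOKPROBE_SYMBOL for assembly); the table, its entries, and the function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical blacklist entry covering [start, end) of one symbol. */
struct blacklist_entry {
	const char *name;
	uintptr_t start;
	uintptr_t end;
};

/* Illustrative names and addresses only. */
static const struct blacklist_entry kprobe_blacklist[] = {
	{ "example_handler_a", 0x1000, 0x1200 },
	{ "example_handler_b", 0x2000, 0x2180 },
};

/* Return true if a probe at addr must be refused. */
static bool within_blacklist(uintptr_t addr)
{
	for (size_t i = 0; i < sizeof(kprobe_blacklist) / sizeof(kprobe_blacklist[0]); i++) {
		assert(kprobe_blacklist[i].start < kprobe_blacklist[i].end);
		if (addr >= kprobe_blacklist[i].start && addr < kprobe_blacklist[i].end)
			return true;
	}
	return false;
}
```

      Because the check depends only on addresses, the linker is free to place the handlers anywhere, which is what lets them sit next to their callers.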
  3. 13 Sep 2016, 23 commits
  4. 10 Sep 2016, 1 commit
  5. 03 Sep 2016, 3 commits
    • x86/AMD: Apply erratum 665 on machines without a BIOS fix · d1992996
      Committed by Emanuel Czirai
      AMD F12h machines have an erratum which can cause DIV/IDIV to behave
      unpredictably. The workaround is to set MSRC001_1029[31] but sometimes
      there is no BIOS update containing that workaround so let's do it
      ourselves unconditionally. It is simple enough.
      
      [ Borislav: Wrote commit message. ]
      Signed-off-by: Emanuel Czirai <icanrealizeum@gmail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Yaowu Xu <yaowu@google.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160902053550.18097-1-bp@alien8.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
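      The workaround is a read-modify-write of a single MSR bit. The sketch below models only the bit arithmetic in userspace; in the kernel such a write would go through an MSR helper such as msr_set_bit(), and the macro and function names here are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define AMD_ERRATUM_665_MSR 0xc0011029u	/* MSRC001_1029 */
#define AMD_ERRATUM_665_BIT 31

static_assert(AMD_ERRATUM_665_BIT < 64, "bit must fit in a 64-bit MSR");

/* True if the bit still has to be set (no BIOS fix applied). */
static bool erratum_665_needs_fix(uint64_t msr_val)
{
	return !(msr_val & (1ULL << AMD_ERRATUM_665_BIT));
}

/* Value to write back: the old contents with bit 31 set. */
static uint64_t erratum_665_apply(uint64_t msr_val)
{
	return msr_val | (1ULL << AMD_ERRATUM_665_BIT);
}
```

      Setting an already-set bit leaves the value unchanged, which is why applying the fix unconditionally is harmless on machines whose BIOS already did it.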
    • x86/paravirt: Do not trace _paravirt_ident_*() functions · 15301a57
      Committed by Steven Rostedt
      Łukasz Daniluk reported that on a RHEL kernel his machine would lock up
      after enabling the function tracer. I asked him to bisect the functions within
      available_filter_functions, which he did and it came down to three:
      
        _paravirt_nop(), _paravirt_ident_32() and _paravirt_ident_64()
      
      It was found that this is only an issue when noreplace-paravirt is added
      to the kernel command line.
      
      This means that those functions are most likely called within critical
      sections of the function tracer, and must not be traced.
      
      In newer kernels _paravirt_nop() is defined within gcc asm(), and is no
      longer an issue.  But both _paravirt_ident_{32,64}() cause the
      following splat when they are traced:
      
       mm/pgtable-generic.c:33: bad pmd ffff8800d2435150(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff8800d3624190(0000000001d00070)
       mm/pgtable-generic.c:33: bad pmd ffff8800d36a5110(0000000001d00054)
       mm/pgtable-generic.c:33: bad pmd ffff880118eb1450(0000000001d00054)
       NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [systemd-journal:469]
       Modules linked in: e1000e
       CPU: 2 PID: 469 Comm: systemd-journal Not tainted 4.6.0-rc4-test+ #513
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
       task: ffff880118f740c0 ti: ffff8800d4aec000 task.ti: ffff8800d4aec000
       RIP: 0010:[<ffffffff81134148>]  [<ffffffff81134148>] queued_spin_lock_slowpath+0x118/0x1a0
       RSP: 0018:ffff8800d4aefb90  EFLAGS: 00000246
       RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff88011eb16d40
       RDX: ffffffff82485760 RSI: 000000001f288820 RDI: ffffea0000008030
       RBP: ffff8800d4aefb90 R08: 00000000000c0000 R09: 0000000000000000
       R10: ffffffff821c8e0e R11: 0000000000000000 R12: ffff880000200fb8
       R13: 00007f7a4e3f7000 R14: ffffea000303f600 R15: ffff8800d4b562e0
       FS:  00007f7a4e3d7840(0000) GS:ffff88011eb00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007f7a4e3f7000 CR3: 00000000d3e71000 CR4: 00000000001406e0
       Call Trace:
         _raw_spin_lock+0x27/0x30
         handle_pte_fault+0x13db/0x16b0
         handle_mm_fault+0x312/0x670
         __do_page_fault+0x1b1/0x4e0
         do_page_fault+0x22/0x30
         page_fault+0x28/0x30
         __vfs_read+0x28/0xe0
         vfs_read+0x86/0x130
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1e/0xa8
       Code: 12 48 c1 ea 0c 83 e8 01 83 e2 30 48 98 48 81 c2 40 6d 01 00 48 03 14 c5 80 6a 5d 82 48 89 0a 8b 41 08 85 c0 75 09 f3 90 8b 41 08 <85> c0 74 f7 4c 8b 09 4d 85 c9 74 08 41 0f 18 09 eb 02 f3 90 8b
      Reported-by: Łukasz Daniluk <lukasz.daniluk@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
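      Excluding a function from the function tracer is normally done in kernel C with the notrace annotation. As a userspace sketch, assuming GCC or Clang: GCC documents that no_instrument_function suppresses the entry/exit hooks generated by -finstrument-functions, -p and -pg. ident_32() and ident_64() are hypothetical stand-ins for _paravirt_ident_{32,64}().

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's notrace annotation. */
#define notrace __attribute__((no_instrument_function))

/* Identity helpers that a tracer must never instrument: they can be
 * called from inside the tracer's own critical sections. */
uint32_t notrace ident_32(uint32_t x) { return x; }
uint64_t notrace ident_64(uint64_t x) { return x; }
```

      Compiled with -finstrument-functions, every other function in the file would call the profiling hooks on entry and exit, but these two would not.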
    • arm64: kernel: Fix unmasked debug exceptions when restoring mdscr_el1 · 744c6c37
      Committed by James Morse
      Changes to make the resume from cpu_suspend() code behave more like
      secondary boot caused debug exceptions to be unmasked early by
      __cpu_setup(). We then go on to restore mdscr_el1 in cpu_do_resume(),
      potentially taking break or watch points based on uninitialised registers.
      
      Mask debug exceptions in cpu_do_resume(), which is specific to resume
      from cpu_suspend(). Debug exceptions will be restored to their original
      state by local_dbg_restore() in cpu_suspend(), which runs after
      hw_breakpoint_restore() has re-initialised the other registers.
      Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Fixes: cabe1c81 ("arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va")
      Cc: <stable@vger.kernel.org> # 4.7+
      Signed-off-by: James Morse <james.morse@arm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
  6. 02 Sep 2016, 1 commit
  7. 31 Aug 2016, 1 commit
    • mm/usercopy: get rid of CONFIG_DEBUG_STRICT_USER_COPY_CHECKS · 0d025d27
      Committed by Josh Poimboeuf
      There are three usercopy warnings which are currently being silenced for
      gcc 4.6 and newer:
      
      1) "copy_from_user() buffer size is too small" compile warning/error
      
         This is a static warning which happens when object size and copy size
         are both const, and copy size > object size.  I didn't see any false
         positives for this one.  So the function warning attribute seems to
         be working fine here.
      
         Note this scenario is always a bug and so I think it should be
         changed to *always* be an error, regardless of
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS.
      
      2) "copy_from_user() buffer size is not provably correct" compile warning
      
         This is another static warning which happens when I enable
         __compiletime_object_size() for new compilers (and
         CONFIG_DEBUG_STRICT_USER_COPY_CHECKS).  It happens when object size
         is const, but copy size is *not*.  In this case there's no way to
         compare the two at build time, so it gives the warning.  (Note the
         warning is a byproduct of the fact that gcc has no way of knowing
         whether the overflow function will be called, so the call isn't dead
         code and the warning attribute is activated.)
      
         So this warning seems to only indicate "this is an unusual pattern,
         maybe you should check it out" rather than "this is a bug".
      
         I get 102(!) of these warnings with allyesconfig and the
         __compiletime_object_size() gcc check removed.  I don't know if there
         are any real bugs hiding in there, but from looking at a small
         sample, I didn't see any.  According to Kees, it does sometimes find
         real bugs.  But the false positive rate seems high.
      
      3) "Buffer overflow detected" runtime warning
      
         This is a runtime warning where object size is const, and copy size >
         object size.
      
      All three warnings (both static and runtime) were completely disabled
      for gcc 4.6 with the following commit:
      
        2fb0815c ("gcc4: disable __compiletime_object_size for GCC 4.6+")
      
      That commit mistakenly assumed that the false positives were caused by a
      gcc bug in __compiletime_object_size().  But in fact,
      __compiletime_object_size() seems to be working fine.  The false
      positives were instead triggered by #2 above.  (Though I don't have an
      explanation for why the warnings supposedly only started showing up in
      gcc 4.6.)
      
      So remove warning #2 to get rid of all the false positives, and re-enable
      warnings #1 and #3 by reverting the above commit.
      
      Furthermore, since #1 is a real bug which is detected at compile time,
      upgrade it to always be an error.
      
      Having done all that, CONFIG_DEBUG_STRICT_USER_COPY_CHECKS is no longer
      needed.
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: "H . Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
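      A decision-table model of the checks that remain after this change. SIZE_MAX stands for "object size unknown", which is what __builtin_object_size(ptr, 0) reports when it cannot tell; the enum and function names are hypothetical. Warning #2 has no counterpart here: a non-constant copy size that fits is simply COPY_OK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum copy_verdict {
	COPY_OK,		/* nothing to report */
	COPY_BUILD_ERROR,	/* warning #1, now always a build error */
	COPY_RUNTIME_OVERFLOW,	/* warning #3, "Buffer overflow detected" */
};

static enum copy_verdict classify_copy(size_t obj_size, size_t copy_size,
				       bool copy_size_is_const)
{
	if (obj_size == SIZE_MAX || copy_size <= obj_size)
		return COPY_OK;			/* size unknown, or the copy fits */
	if (copy_size_is_const)
		return COPY_BUILD_ERROR;	/* both sizes known at build time */
	return COPY_RUNTIME_OVERFLOW;		/* only detectable when it happens */
}
```

      The build-time case is always a real bug, which is why it can be promoted to an error without false positives.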
  8. 29 Aug 2016, 4 commits
    • net: smc91x: fix SMC accesses · 2fb04fdf
      Committed by Russell King
      Commit b70661c7 ("net: smc91x: use run-time configuration on all ARM
      machines") broke some ARM platforms through several mistakes.  Firstly,
      the access size must correspond to the following rule:
      
      (a) at least one of 16-bit or 8-bit access size must be supported
      (b) 32-bit accesses are optional, and may be enabled in addition to
          the above.
      
      Secondly, it provides no emulation of 16-bit accesses, instead blindly
      making 16-bit accesses even when the platform specifies that only 8-bit
      is supported.
      
      Reorganise smc91x.h so we can make use of the existing 16-bit access
      emulation already provided - if 16-bit accesses are supported, use
      16-bit accesses directly, otherwise if 8-bit accesses are supported,
      use the provided 16-bit access emulation.  If neither, BUG().  This
      exactly reflects the driver behaviour prior to the commit being fixed.
      
      Since the conversion incorrectly cut down the available access sizes on
      several platforms, we also need to go through every platform and fix up
      the overly-restrictive access size: Arnd assumed that if a platform can
      perform 32-bit, 16-bit and 8-bit accesses, then only a 32-bit access
      size needed to be specified - not so, all available access sizes must
      be specified.
      
      This likely fixes some performance regressions in doing this: if a
      platform does not support 8-bit accesses, 8-bit accesses have been
      emulated by performing a 16-bit read-modify-write access.
      
      Tested on the Intel Assabet/Neponset platform, which supports only 8-bit
      accesses, which was broken by the original commit.
      
      Fixes: b70661c7 ("net: smc91x: use run-time configuration on all ARM machines")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Tested-by: Robert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
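      The selection rule for 16-bit register accesses can be sketched as follows. The flags are loosely modelled on the driver's SMC91X_USE_* flags; the names, values, and enum are assumptions, and the emulated path presumably builds each 16-bit access from 8-bit ones.

```c
#include <assert.h>

/* Access-size flags, loosely modelled on SMC91X_USE_*. */
#define USE_8BIT  (1u << 0)
#define USE_16BIT (1u << 1)
#define USE_32BIT (1u << 2)

enum access16 {
	ACCESS16_DIRECT,	/* platform does 16-bit accesses natively */
	ACCESS16_EMULATED,	/* use the existing 16-bit emulation over 8-bit */
	ACCESS16_INVALID,	/* neither: the driver BUG()s */
};

/* Rule (a): at least one of 16-bit or 8-bit must be supported;
 * rule (b): 32-bit is optional and never replaces the other two. */
static enum access16 choose_16bit_access(unsigned int flags)
{
	if (flags & USE_16BIT)
		return ACCESS16_DIRECT;
	if (flags & USE_8BIT)
		return ACCESS16_EMULATED;
	return ACCESS16_INVALID;
}
```

      A platform that declares only USE_32BIT lands in ACCESS16_INVALID even if the hardware could do 16-bit accesses, which is why every supported size must be listed.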
    • powerpc: signals: Discard transaction state from signal frames · 78a3e888
      Committed by Cyril Bur
      Userspace can begin and suspend a transaction within the signal
      handler which means they might enter sys_rt_sigreturn() with the
      processor in suspended state.
      
      sys_rt_sigreturn() wants to restore process context (which may have
      been in a transaction before signal delivery). To do this it must
      restore TM SPRs. To achieve this, any transaction initiated within the
      signal frame must be discarded in order to be able to restore TM SPRs
      as TM SPRs can only be manipulated non-transactionally.
      From the PowerPC ISA:
        TM Bad Thing Exception [Category: Transactional Memory]
         An attempt is made to execute a mtspr targeting a TM register in
         other than Non-transactional state.
      
      Not doing so results in a TM Bad Thing:
      [12045.221359] Kernel BUG at c000000000050a40 [verbose debug info unavailable]
      [12045.221470] Unexpected TM Bad Thing exception at c000000000050a40 (msr 0x201033)
      [12045.221540] Oops: Unrecoverable exception, sig: 6 [#1]
      [12045.221586] SMP NR_CPUS=2048 NUMA PowerNV
      [12045.221634] Modules linked in: xt_CHECKSUM iptable_mangle ipt_MASQUERADE
       nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
       xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter
       ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables kvm_hv kvm
       uio_pdrv_genirq ipmi_powernv uio powernv_rng ipmi_msghandler autofs4 ses enclosure
       scsi_transport_sas bnx2x ipr mdio libcrc32c
      [12045.222167] CPU: 68 PID: 6178 Comm: sigreturnpanic Not tainted 4.7.0 #34
      [12045.222224] task: c0000000fce38600 ti: c0000000fceb4000 task.ti: c0000000fceb4000
      [12045.222293] NIP: c000000000050a40 LR: c0000000000163bc CTR: 0000000000000000
      [12045.222361] REGS: c0000000fceb7ac0 TRAP: 0700   Not tainted (4.7.0)
      [12045.222418] MSR: 9000000300201033 <SF,HV,ME,IR,DR,RI,LE,TM[SE]> CR: 28444280  XER: 20000000
      [12045.222625] CFAR: c0000000000163b8 SOFTE: 0 PACATMSCRATCH: 900000014280f033
      GPR00: 01100000b8000001 c0000000fceb7d40 c00000000139c100 c0000000fce390d0
      GPR04: 900000034280f033 0000000000000000 0000000000000000 0000000000000000
      GPR08: 0000000000000000 b000000000001033 0000000000000001 0000000000000000
      GPR12: 0000000000000000 c000000002926400 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 00003ffff98cadd0 00003ffff98cb470 0000000000000000
      GPR28: 900000034280f033 c0000000fceb7ea0 0000000000000001 c0000000fce390d0
      [12045.223535] NIP [c000000000050a40] tm_restore_sprs+0xc/0x1c
      [12045.223584] LR [c0000000000163bc] tm_recheckpoint+0x5c/0xa0
      [12045.223630] Call Trace:
      [12045.223655] [c0000000fceb7d80] [c000000000026e74] sys_rt_sigreturn+0x494/0x6c0
      [12045.223738] [c0000000fceb7e30] [c0000000000092e0] system_call+0x38/0x108
      [12045.223806] Instruction dump:
      [12045.223841] 7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
      [12045.223955] 4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
      [12045.224074] ---[ end trace cb8002ee240bae76 ]---
      
      It isn't clear exactly if there is really a use case for userspace
      returning with a suspended transaction, however, doing so doesn't (on
      its own) constitute a bad frame. As such, this patch simply discards
      the transactional state of the context calling the sigreturn and
      continues.
      Reported-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
      Tested-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Reviewed-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Acked-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
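      A model of the MSR state involved. The bit positions are inferred from the oops above, where MSR 0x9000000300201033 decodes as TM[SE] (bit 32 = TM enabled, bit 33 = suspended); treating bit 34 as "transactional" is an assumption. The real fix reclaims the transaction inside the kernel; this sketch only shows the transactional-state field ending up non-transactional, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_TM      (1ULL << 32)	/* TM facility enabled */
#define MSR_TS_S    (1ULL << 33)	/* transaction suspended */
#define MSR_TS_T    (1ULL << 34)	/* transaction active (assumed) */
#define MSR_TS_MASK (MSR_TS_S | MSR_TS_T)

static bool tm_suspended(uint64_t msr)
{
	return (msr & MSR_TS_MASK) == MSR_TS_S;
}

/* After discarding, TS is non-transactional, so mtspr to TM SPRs
 * no longer raises a TM Bad Thing exception. */
static uint64_t discard_tm_state(uint64_t msr)
{
	return msr & ~MSR_TS_MASK;
}
```

      Applied to the MSR value from the oops, discarding clears the suspended bit but leaves the TM facility itself enabled.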
    • powerpc/powernv : Drop reference added by kset_find_obj() · a9cbf0b2
      Committed by Mukesh Ojha
      In a situation where the Linux kernel gets notified about a duplicate error
      log from OPAL, it has been observed that the kernel fails to remove the sysfs
      entries (/sys/firmware/opal/elog/0xXXXXXXXX) of such error logs. This is
      because we currently search for the error log/dump kobject in the kset list
      via the 'kset_find_obj()' routine, which increments the reference count by
      one when it finds the kobject.
      
      So, unless we decrement the reference count by one after finding the kobject,
      we will not be able to release the kobject properly later.
      
      This patch adds the 'kobject_put()' that was missing earlier.
      Signed-off-by: Mukesh Ojha <mukesh02@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
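      The reference-counting rule can be modelled in userspace: a lookup that behaves like kset_find_obj() hands back an extra reference, and each successful lookup must be balanced by a put. The struct and functions below are hypothetical stand-ins for a kobject, kobject_get() and kobject_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical refcounted object standing in for a kobject. */
struct obj {
	const char *name;
	int refcount;		/* 1 = only the kset's own reference */
	bool released;
};

static void obj_get(struct obj *o) { o->refcount++; }

static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0)
		o->released = true;	/* stands in for the release callback */
}

/* Like kset_find_obj(): a successful lookup returns a new reference. */
static struct obj *find_obj(struct obj *set, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(set[i].name, name) == 0) {
			obj_get(&set[i]);
			return &set[i];
		}
	}
	return NULL;
}
```

      Without the put after the duplicate-log lookup, the count stays at 1 after the sysfs entry is removed, so the object is never released.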
    • powerpc/tm: do not use r13 for tabort_syscall · cc7786d3
      Committed by Nicholas Piggin
      tabort_syscall runs with RI=1, so a nested recoverable machine
      check will load the paca into r13 and overwrite what we loaded
      it with, because exceptions returning to privileged mode do not
      restore r13.
      
      Fixes: b4b56f9e (powerpc/tm: Abort syscalls in active transactions)
      Cc: stable@vger.kernel.org
      Signed-off-by: Nick Piggin <npiggin@gmail.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  9. 27 Aug 2016, 1 commit