1. 06 5月, 2016 3 次提交
    • D
      parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls · f0b22d1b
      Dmitry V. Levin 提交于
      Do not load one entry beyond the end of the syscall table when the
      syscall number of a traced process equals to __NR_Linux_syscalls.
      Similar bug with regular processes was fixed by commit 3bb457af
      ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
      
      This bug was found by strace test suite.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Acked-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      f0b22d1b
    • C
      x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO · 886123fb
      Chen Yu 提交于
      Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
      
      Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
      (35.5), the ratio bits are bit 8-15.
      
      Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
      TSC calibration and the Local APIC timer frequency to be incorrect.
      
      Fix this problem by masking 0xff instead.
      
      [ tglx: Massaged changelog ]
      
      Fixes: 7da7c156 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: stable@vger.kernel.org
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      886123fb
    • A
      mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli 提交于
      After the THP refcounting change, obtaining a compound pages from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can map immediately the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However it's non trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      of being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or balloning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, despite regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast and it can support multiple
      granularity in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kind of KVM data structures
      in it, so it's definitely not a cut-and-paste work, so I couldn't do a
      fix as cleaner as this one for 4.6.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      127393fb
  2. 05 5月, 2016 7 次提交
    • W
      x86/sysfb_efi: Fix valid BAR address range check · c10fcb14
      Wang YanQing 提交于
      The code for checking whether a BAR address range is valid will break
      out of the loop when a start address of 0x0 is encountered.
      
      This behaviour is wrong since by breaking out of the loop we may miss
      the BAR that describes the EFI frame buffer in a later iteration.
      
      Because of this bug I can't use video=efifb: boot parameter to get
      efifb on my new ThinkPad E550 for my old linux system hard disk with
      3.10 kernel. In 3.10, efifb is the only choice due to DRM/I915 not
      supporting the GPU.
      
      This patch also add a trivial optimization to break out after we find
      the frame buffer address range without testing later BARs.
      Signed-off-by: NWang YanQing <udknight@gmail.com>
      [ Rewrote changelog. ]
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NPeter Jones <pjones@redhat.com>
      Cc: <stable@vger.kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: David Herrmann <dh.herrmann@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462454061-21561-2-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c10fcb14
    • V
      ARC: support HIGHMEM even without PAE40 · 26f9d5fd
      Vineet Gupta 提交于
      Initial HIGHMEM support on ARC was introduced for PAE40 where the low
      memory (0x8000_0000 based) and high memory (0x1_0000_0000) were
      physically contiguous. So CONFIG_FLATMEM sufficed (despite a peipheral
      hole in the middle, which wasted a bit of struct page memory, but things
      worked).
      
      However w/o PAE, highmem was not possible and we could only reach
      ~1.75GB of DDR. Now there is a use case to access ~4GB of DDR w/o PAE40
      The idea is to have low memory at canonical 0x8000_0000 and highmem
      at 0 so enire 4GB address space is available for physical addressing
      This needs additional platform/interconnect mapping to convert
      the non contiguous physical addresses into linear bus adresses.
      
      From Linux point of view, non contiguous divide means FLATMEM no
      longer works and DISCONTIGMEM is needed to track the pfns in the 2
      regions.
      
      This scheme would also work for PAE40, only better in that we don't
      waste struct page memory for the peripheral hole.
      
      The DT description will be something like
      
          memory {
              ...
              reg = <0x80000000 0x200000000   /* 512MB: lowmem */
                     0x00000000 0x10000000>;  /* 256MB: highmem */
         }
      Signed-off-by: NNoam Camus <noamc@ezchip.com>
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      26f9d5fd
    • V
      ARC: Fix PAE40 boot failures due to PTE truncation · 2519d753
      Vineet Gupta 提交于
      So a benign looking cleanup which macro'ized PAGE_SHIFT shifts turned
      out to be bad (since it was done non-sensically across the board).
      
      It caused boot failures with PAE40 as forced cast to (unsigned long)
      from newly introduced virt_to_pfn() was causing truncatiion of the
      (long long) pte/paddr values.
      
      It is OK to use this in accessors dealing with kernel virtual address,
      pointers etc, but not for PTE values themelves.
      
      Fixes: cJ2ff5cf2735c ("ARC: mm: Use virt_to_pfn() for addr >> PAGE_SHIFT pattern)
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      2519d753
    • V
      ARC: Add missing io barriers to io{read,write}{16,32}be() · e5bc0478
      Vineet Gupta 提交于
      While reviewing a different change to asm-generic/io.h Arnd spotted that
      ARC ioread32 and ioread32be both of which come from asm-generic versions
      are not symmetrical in terms of calling the io barriers.
      
      generic ioread32   -> ARC readl()                  [ has barriers]
      generic ioread32be -> __be32_to_cpu(__raw_readl()) [ lacks barriers]
      
      While generic ioread32be is being remediated to call readl(), that involves
      a swab32(), causing double swaps on ioread32be() on Big Endian systems.
      
      So provide our versions of big endian IO accessors to ensure io barrier
      calls while also keeping them optimal
      Suggested-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org  [4.2+]
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      e5bc0478
    • P
      perf/x86/amd/iommu: Do not register a task ctx for uncore like PMUs · 8482716b
      Peter Zijlstra 提交于
      The new sanity check introduced by:
      
        26657848 ("perf/core: Verify we have a single perf_hw_context PMU")
      
      ... triggered on the AMD IOMMU driver.
      
      IOMMUs are not per logical CPU, they cannot have per-task counters. Fix it.
      Reported-by: NBorislav Petkov <bp@alien8.de>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: jroedel@suse.de
      Cc: suravee.suthikulpanit@amd.com
      Link: http://lkml.kernel.org/r/20160423224255.GB3430@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8482716b
    • A
      x86/platform/UV: Bring back the call to map_low_mmrs in uv_system_init · 08914f43
      Alex Thorlton 提交于
      A while back the following commit:
      
        d394f2d9 ("x86/platform/UV: Remove EFI memmap quirk for UV2+")
      
      changed uv_system_init() to only call map_low_mmrs() on older UV1 hardware,
      which requires EFI_OLD_MEMMAP to be set in order to boot.
      
      The recent changes to the EFI memory mapping code in:
      
        d2f7cbe7 ("x86/efi: Runtime services virtual mapping")
      
      exposed some issues with the fact that we were relying on the EFI memory
      mapping mechanisms to map in our MMRs for us, after commit d394f2d9.
      
      Rather than revert the entire commit and go back to forcing
      EFI_OLD_MEMMAP on all UVs, we're going to add the call to map_low_mmrs()
      back into uv_system_init(), and then fix up our EFI runtime calls to use
      the appropriate page table.
      
      For now, UV2+ will still need efi=old_map to boot, but there will be
      other changes soon that should eliminate the need for this.
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Link: http://lkml.kernel.org/r/1462401592-120735-1-git-send-email-athorlton@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      08914f43
    • A
      perf/x86: Add model numbers for Kabylake CPUs · cba1b379
      Andi Kleen 提交于
      Everything the same as Skylake, just new model numbers.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1461977748-17616-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cba1b379
  3. 04 5月, 2016 1 次提交
    • J
      x86/efi-bgrt: Switch all pr_err() to pr_notice() for invalid BGRT · 7f9b474c
      Josh Boyer 提交于
      The promise of pretty boot splashes from firmware via BGRT was at
      best only that; a promise.  The kernel diligently checks to make
      sure the BGRT data firmware gives it is valid, and dutifully warns
      the user when it isn't.  However, it does so via the pr_err log
      level which seems unnecessary.  The user cannot do anything about
      this and there really isn't an error on the part of Linux to
      correct.
      
      This lowers the log level by using pr_notice instead.  Users will
      no longer have their boot process uglified by the kernel reminding
      us that firmware can and often is broken when the 'quiet' kernel
      parameter is specified.  Ironic, considering BGRT is supposed to
      make boot pretty to begin with.
      Signed-off-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Môshe van der Sterre <me@moshe.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1462303781-8686-4-git-send-email-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f9b474c
  4. 02 5月, 2016 1 次提交
    • A
      powerpc: Fix bad inline asm constraint in create_zero_mask() · b4c11211
      Anton Blanchard 提交于
      In create_zero_mask() we have:
      
      	addi	%1,%2,-1
      	andc	%1,%1,%2
      	popcntd	%0,%1
      
      using the "r" constraint for %2. r0 is a valid register in the "r" set,
      but addi X,r0,X turns it into an li:
      
      	li	r7,-1
      	andc	r7,r7,r0
      	popcntd	r4,r7
      
      Fix this by using the "b" constraint, for which r0 is not a valid
      register.
      
      This was found with a kernel build using gcc trunk, narrowed down to
      when -frename-registers was enabled at -O2. It is just luck however
      that we aren't seeing this on older toolchains.
      
      Thanks to Segher for working with me to find this issue.
      
      Cc: stable@vger.kernel.org
      Fixes: d0cebfa6 ("powerpc: word-at-a-time optimization for 64-bit Little Endian")
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      b4c11211
  5. 30 4月, 2016 1 次提交
    • A
      ARM: davinci: only use NVMEM when available · 04b9665b
      Arnd Bergmann 提交于
      The davinci platform contains code that calls into the nvmem
      subsystem, but that might be a loadable module, causing a
      link error:
      
      arch/arm/mach-davinci/built-in.o: In function `davinci_get_mac_addr':
      :(.text+0x1088): undefined reference to `nvmem_device_read'
      arch/arm/mach-davinci/built-in.o: In function `read_factory_config':
      :(.text+0x214c): undefined reference to `nvmem_device_read'
      
      Also, when NVMEM is completely disabled, the functions fail with
      nonobvious error messages.
      
      This ensures we only call the API functions when the code is actually
      reachable from the board file, and otherwise prints a unique log
      message.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: bec3c11b ("misc: at24: replace memory_accessor with nvmem_device_read")
      Signed-off-by: NSekhar Nori <nsekhar@ti.com>
      Signed-off-by: NKevin Hilman <khilman@baylibre.com>
      04b9665b
  6. 28 4月, 2016 6 次提交
  7. 27 4月, 2016 10 次提交
  8. 26 4月, 2016 1 次提交
  9. 25 4月, 2016 1 次提交
    • K
      ARM: EXYNOS: Properly skip unitialized parent clock in power domain on · a0a966b8
      Krzysztof Kozlowski 提交于
      We want to skip reparenting a clock on turning on power domain, if we
      do not have the parent yet. The parent is obtained when turning the
      domain off. However due to a typo, the loop is continued on IS_ERR() of
      clock being reparented, not on the IS_ERR() of the parent.
      
      Theoretically this could lead to OOPS on first turn on of a power
      domain, if there was no turn off before. Practically that should never
      happen because all power domains are turned on by default (reset value,
      bootloader does not turn off them usually) so the first action will be
      always turn off.
      
      Fixes: 29e5eea0 ("ARM: EXYNOS: Get current parent clock for power domain on/off")
      Reported-by: NVladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: NKrzysztof Kozlowski <k.kozlowski@samsung.com>
      a0a966b8
  10. 24 4月, 2016 1 次提交
  11. 23 4月, 2016 3 次提交
    • S
      perf/x86/intel/rapl: Add missing Haswell model · e1089602
      Srinivas Pandruvada 提交于
      Added one missing Haswell model.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: bp@alien8.de
      Cc: hpa@zytor.com
      Link: http://lkml.kernel.org/r/1460907809-11897-1-git-send-email-srinivas.pandruvada@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e1089602
    • A
      perf/x86/intel: Add model number for Skylake Server to perf · b89c1737
      Andi Kleen 提交于
      Everything the same as base Skylake, just a new model number.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1460751933-2264-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b89c1737
    • R
      xen/qspinlock: Don't kick CPU if IRQ is not initialized · 707e59ba
      Ross Lagerwall 提交于
      The following commit:
      
        1fb3a8b2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.")
      
      ... moved the initalization of the kicker interrupt until after
      native_cpu_up() is called.
      
      However, when using qspinlocks, a CPU may try to kick another CPU that is
      spinning (because it has not yet initialized its kicker interrupt), resulting
      in the following crash during boot:
      
        kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210!
        invalid opcode: 0000 [#1] SMP
        ...
        RIP: 0010:[<ffffffff814c97c9>]  [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60
        ...
        Call Trace:
         [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10
         [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0
         [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
         [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150
         [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160
         [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0
         [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80
         [<ffffffff8108198c>] _cpu_up+0x13c/0x180
         [<ffffffff81081a4a>] cpu_up+0x7a/0xa0
         [<ffffffff81f80dfc>] smp_init+0x7f/0x81
         [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212
         [<ffffffff81817f30>] ? rest_init+0x80/0x80
         [<ffffffff81817f3e>] kernel_init+0xe/0xe0
         [<ffffffff8182488f>] ret_from_fork+0x3f/0x70
         [<ffffffff81817f30>] ? rest_init+0x80/0x80
      
      To fix this, only send the kick if the target CPU's interrupt has been
      initialized. This check isn't racy, because the target is waiting for
      the spinlock, so it won't have initialized the interrupt in the
      meantime.
      Signed-off-by: NRoss Lagerwall <ross.lagerwall@citrix.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      707e59ba
  12. 22 4月, 2016 5 次提交