1. 24 7月, 2016 1 次提交
    • A
      x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs · 530dd8d4
      Andy Lutomirski 提交于
      Valdis Kletnieks bisected a boot failure back to this recent commit:
      
        360cb4d1 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")
      
      I broke the case where a PUD table got allocated -- populate_pud()
      would wander off a pgd_none entry and get lost.  I'm not sure how
      this survived my testing.
      
      Fix the original issue in a much simpler way.  The problem
      was that, if we allocated a PUD table, failed to populate it, and
      freed it, another CPU could potentially keep using the PGD entry we
      installed (either by copying it via vmalloc_fault or by speculatively
      caching it).  There's a straightforward fix: simply leave the
      top-level entry in place if this happens.  This can't waste any
      significant amount of memory -- there are at most 256 entries like
      this systemwide and, as a practical matter, if we hit this failure
      path repeatedly, we're likely to reuse the same page anyway.
      
      For context, this is a reversion with this hunk added in:
      
      	if (ret < 0) {
      +		/*
      +		 * Leave the PUD page in place in case some other CPU or thread
      +		 * already found it, but remove any useless entries we just
      +		 * added to it.
      +		 */
      -		unmap_pgd_range(cpa->pgd, addr,
      +		unmap_pud_range(pgd_entry, addr,
      			        addr + (cpa->numpages << PAGE_SHIFT));
      		return ret;
      	}
      
      This effectively open-codes what the now-deleted unmap_pgd_range()
      function used to do except that unmap_pgd_range() used to try to
      free the page as well.
      Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      530dd8d4
  2. 15 7月, 2016 14 次提交
  3. 13 7月, 2016 4 次提交
    • D
      x86/mm: Use pte_none() to test for empty PTE · dcb32d99
      Dave Hansen 提交于
      The page table manipulation code seems to have grown a couple of
      sites that are looking for empty PTEs.  Just in case one of these
      entries got a stray bit set, use pte_none() instead of checking
      for a zero pte_val().
      
      The use pte_same() makes me a bit nervous.  If we were doing a
      pte_same() check against two cleared entries and one of them had
      a stray bit set, it might fail the pte_same() check.  But, I
      don't think we ever _do_ pte_same() for cleared entries.  It is
      almost entirely used for checking for races in fault-in paths.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dcb32d99
    • D
      x86/mm: Disallow running with 32-bit PTEs to work around erratum · e4a84be6
      Dave Hansen 提交于
      The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
      Landing) has an erratum where a processor thread setting the Accessed
      or Dirty bits may not do so atomically against its checks for the
      Present bit.  This may cause a thread (which is about to page fault)
      to set A and/or D, even though the Present bit had already been
      atomically cleared.
      
      These bits are truly "stray".  In the case of the Dirty bit, the
      thread associated with the stray set was *not* allowed to write to
      the page.  This means that we do not have to launder the bit(s); we
      can simply ignore them.
      
      If the PTE is used for storing a swap index or a NUMA migration index,
      the A bit could be misinterpreted as part of the swap type.  The stray
      bits being set cause a software-cleared PTE to be interpreted as a
      swap entry.  In some cases (like when the swap index ends up being
      for a non-existent swapfile), the kernel detects the stray value
      and WARN()s about it, but there is no guarantee that the kernel can
      always detect it.
      
      When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
      to move the swap PTE format around to avoid these troublesome bits.
      But, 32-bit non-PAE is tight on bits.  So, disallow it from running
      on this hardware.  I can't imagine anyone wanting to run 32-bit
      non-highmem kernels on this hardware, but disallowing them from
      running entirely is surely the safe thing to do.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e4a84be6
    • D
      x86/mm: Ignore A/D bits in pte/pmd/pud_none() · 97e3c602
      Dave Hansen 提交于
      The erratum we are fixing here can lead to stray setting of the
      A and D bits.  That means that a pte that we cleared might
      suddenly have A/D set.  So, stop considering those bits when
      determining if a pte is pte_none().  The same goes for the
      other pmd_none() and pud_none().  pgd_none() can be skipped
      because it is not affected; we do not use PGD entries for
      anything other than pagetables on affected configurations.
      
      This adds a tiny amount of overhead to all pte_none() checks.
      I doubt we'll be able to measure it anywhere.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      97e3c602
    • D
      x86/mm: Move swap offset/type up in PTE to work around erratum · 00839ee3
      Dave Hansen 提交于
      This erratum can result in Accessed/Dirty getting set by the hardware
      when we do not expect them to be (on !Present PTEs).
      
      Instead of trying to fix them up after this happens, we just
      allow the bits to get set and try to ignore them.  We do this by
      shifting the layout of the bits we use for swap offset/type in
      our 64-bit PTEs.
      
      It looks like this:
      
       bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
       names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
       before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
        after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
      
      Note that D was already a don't care (X) even before.  We just
      move TYPE up and turn its old spot (which could be hit by the
      A bit) into all don't cares.
      
      We take 5 bits away from the offset, but that still leaves us
      with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
      I think that's probably fine for the moment.  We could
      theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
      doesn't gain us anything.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      00839ee3
  4. 10 7月, 2016 2 次提交
  5. 09 7月, 2016 9 次提交
  6. 08 7月, 2016 10 次提交
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 977dcf0c
      Linus Torvalds 提交于
      Pull irq fixes from Ingo Molnar:
       "Two MIPS-GIC irqchip driver fixes to unbreak certain MIPS boards"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/mips-gic: Match IPI IRQ domain by bus token only
        irqchip/mips-gic: Map to VPs using HW VPNum
      977dcf0c
    • L
      Merge tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 18b16676
      Linus Torvalds 提交于
      Pull GPIO fixes from Linus Walleij:
       "I don't like to toss in last minute patches, but these are all for
        things that are broken, and have bitten people for real.  Two of them
        go into stable.  Maybe all of them if the compile test problem is a
        pain in the ass also for stable folks.
      
        Final (hopefully) GPIO fixes for v4.7:
      
         - Fix an oops on the Asus Eee PC 1201
      
         - Revert a patch trying to split GPIO parsing and GPIO configuration
      
         - Revert a too liberal compile testing thing"
      
      * tag 'gpio-v4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
        Revert "gpio: gpiolib-of: Allow compile testing"
        Revert "gpiolib: Split GPIO flags parsing and GPIO configuration"
        gpio: sch: Fix Oops on module load on Asus Eee PC 1201
      18b16676
    • L
      Merge tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux · 1d110cf5
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "One nouveau fix, and a few AMD Polaris fixes and some Allwinner fixes.
      
        I've got some vmware fixes that I might send separate over the
        weekend, they fix some black screens, but I'm still debating them"
      
      * tag 'drm-fixes-for-v4.7-rc7' of git://people.freedesktop.org/~airlied/linux:
        drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation.
        drm/amd/powerplay: fix bug that get wrong polaris evv voltage.
        drm/amd/powerplay: incorrectly use of the function return value
        drm/amd/powerplay: fix incorrect voltage table value for tonga
        drm/amd/powerplay: fix incorrect voltage table value for polaris10
        drm/nouveau/disp/sor/gf119: select correct sor when poking training pattern
        gpu: drm: sun4i_drv: add missing of_node_put after calling of_parse_phandle
        drm/sun4i: Send vblank event when the CRTC is disabled
        drm/sun4i: Report proper vblank
      1d110cf5
    • J
      ecryptfs: don't allow mmap when the lower fs doesn't support it · f0fe970d
      Jeff Mahoney 提交于
      There are legitimate reasons to disallow mmap on certain files, notably
      in sysfs or procfs.  We shouldn't emulate mmap support on file systems
      that don't offer support natively.
      
      CVE-2016-1583
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Cc: stable@vger.kernel.org
      [tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
      Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
      f0fe970d
    • B
      x86/asm/entry: Make thunk's restore a local label · 9a7e7b57
      Borislav Petkov 提交于
      No need to have it appear in objdump output.
      
      No functionality change.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160708141016.GH3808@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9a7e7b57
    • J
      xen/acpi: allow xen-acpi-processor driver to load on Xen 4.7 · 6f2d9d99
      Jan Beulich 提交于
      As of Xen 4.7 PV CPUID doesn't expose either of CPUID[1].ECX[7] and
      CPUID[0x80000007].EDX[7] anymore, causing the driver to fail to load on
      both Intel and AMD systems. Doing any kind of hardware capability
      checks in the driver as a prerequisite was wrong anyway: With the
      hypervisor being in charge, all such checking should be done by it. If
      ACPI data gets uploaded despite some missing capability, the hypervisor
      is free to ignore part or all of that data.
      
      Ditch the entire check_prereq() function, and do the only valid check
      (xen_initial_domain()) in the caller in its place.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      6f2d9d99
    • D
      selftests/x86: Add vDSO mremap() test · f80fd3a5
      Dmitry Safonov 提交于
      Should print this on vDSO remapping success (on new kernels):
      
       [root@localhost ~]# ./test_mremap_vdso_32
      	AT_SYSINFO_EHDR is 0xf773f000
       [NOTE]	Moving vDSO: [f773f000, f7740000] -> [a000000, a001000]
       [OK]
      
      Or print that mremap() for vDSOs is unsupported:
      
       [root@localhost ~]# ./test_mremap_vdso_32
      	AT_SYSINFO_EHDR is 0xf773c000
       [NOTE]	Moving vDSO: [0xf773c000, 0xf773d000] -> [0xf7737000, 0xf7738000]
       [FAIL]	mremap() of the vDSO does not work on this kernel!
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: 0x7f454c46@gmail.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kselftest@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160628113539.13606-3-dsafonov@virtuozzo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f80fd3a5
    • D
      x86/vdso: Add mremap hook to vm_special_mapping · b059a453
      Dmitry Safonov 提交于
      Add possibility for 32-bit user-space applications to move
      the vDSO mapping.
      
      Previously, when a user-space app called mremap() for the vDSO
      address, in the syscall return path it would land on the previous
      address of the vDSOpage, resulting in segmentation violation.
      
      Now it lands fine and returns to userspace with a remapped vDSO.
      
      This will also fix the context.vdso pointer for 64-bit, which does
      not affect the user of vDSO after mremap() currently, but this
      may change in the future.
      
      As suggested by Andy, return -EINVAL for mremap() that would
      split the vDSO image: that operation cannot possibly result in
      a working system so reject it.
      
      Renamed and moved the text_mapping structure declaration inside
      map_vdso(), as it used only there and now it complements the
      vvar_mapping variable.
      
      There is still a problem for remapping the vDSO in glibc
      applications: the linker relocates addresses for syscalls
      on the vDSO page, so you need to relink with the new
      addresses.
      
      Without that the next syscall through glibc may fail:
      
        Program received signal SIGSEGV, Segmentation fault.
        #0  0xf7fd9b80 in __kernel_vsyscall ()
        #1  0xf7ec8238 in _exit () from /usr/lib32/libc.so.6
      Signed-off-by: NDmitry Safonov <dsafonov@virtuozzo.com>
      Acked-by: NAndy Lutomirski <luto@kernel.org>
      Cc: 0x7f454c46@gmail.com
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160628113539.13606-2-dsafonov@virtuozzo.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b059a453
    • J
      xenbus: simplify xenbus_dev_request_and_reply() · e5a79475
      Jan Beulich 提交于
      No need to retain a local copy of the full request message, only the
      type is really needed.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      e5a79475
    • J
      xenbus: don't bail early from xenbus_dev_request_and_reply() · 7469be95
      Jan Beulich 提交于
      xenbus_dev_request_and_reply() needs to track whether a transaction is
      open.  For XS_TRANSACTION_START messages it calls transaction_start()
      and for XS_TRANSACTION_END messages it calls transaction_end().
      
      If sending an XS_TRANSACTION_START message fails or responds with an
      an error, the transaction is not open and transaction_end() must be
      called.
      
      If sending an XS_TRANSACTION_END message fails, the transaction is
      still open, but if an error response is returned the transaction is
      closed.
      
      Commit 027bd7e8 ("xen/xenbus: Avoid synchronous wait on XenBus
      stalling shutdown/restart") introduced a regression where failed
      XS_TRANSACTION_START messages were leaving the transaction open.  This
      can cause problems with suspend (and migration) as all transactions
      must be closed before suspending.
      
      It appears that the problematic change was added accidentally, so just
      remove it.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      7469be95