1. 09 5月, 2016 7 次提交
    • F
      MIPS: BMIPS: Add Whirlwind (BMIPS5200) initialization code · 21b30c00
      Florian Fainelli 提交于
      Import bmips_5xxx_init.S from the stblinux-3.3 tree, and to make sure that this
      would work nicely with a BMIPS multiplatform kernel (with BMIPS330, BMIPS43XX
      and BMIPS5000 enabled), update soft_reset to check for the BMIPS5200 processor
      id (PRID_IMP_BMIPS5200) and execute bmips_5xxx_init for these processors to
      bring them online.
      
      Tested on 7425, 7429 and 7435 with CPU hotplug. 7435 SMP still needs some
      additional changes in the L1 interrupt area to work properly with interrupt
      affinity.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Cc: linux-mips@linux-mips.org
      Cc: john@phrozen.org
      Cc: cernekee@gmail.com
      Cc: jon.fraser@broadcom.com
      Cc: jaedon.shin@gmail.com
      Cc: dragan.stancevic@gmail.com
      Cc: jogo@openwrt.org
      Patchwork: https://patchwork.linux-mips.org/patch/12377/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      21b30c00
    • F
      MIPS: BMIPS: Fix PRID_IMP_BMIPS5000 masking for BMIPS5200 · cbbda6e7
      Florian Fainelli 提交于
      BMIPS5000 have a PrID value of 0x5A00 and BMIPS5200 have a PrID value of
      0x5B00, which, masked with 0x5A00, returns 0x5A00. Update all conditionals on
      the PrID to cover both variants since we are going to need this to enable
      BMIPS5200 SMP. The existing check, masking with 0xFF00 would not cover
      BMIPS5200 at all.
      
      Fixes: 68e6a783 ("MIPS: BMIPS: Add PRId for BMIPS5200 (Whirlwind)")
      Fixes: 6465460c ("MIPS: BMIPS: change compile time checks to runtime checks")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Cc: john@phrozen.org
      Cc: cernekee@gmail.com
      Cc: jogo@openwrt.org
      Cc: jaedon.shin@gmail.com
      Cc: jfraser@broadcom.com
      Cc: pgynther@google.com
      Cc: dragan.stancevic@gmail.com
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/12279/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      cbbda6e7
    • P
      MIPS: Don't BUG_ON when no IPI domain is found · 578bffc8
      Paul Burton 提交于
      Commit fbde2d7d ("MIPS: Add generic SMP IPI support") introduced
      code that BUG_ON's in the case of a kernel that supports IPI domains but
      does not have one at runtime. This case is possible on Malta where for
      IPIs we may use either the GIC (which has an IPI IRQ domain
      implementation) or core-local software interrupts between VPEs (which do
      not currently have an IPI IRQ domain implementation). We can not know
      which will be used until runtime when we know whether a GIC is actually
      present, and if we run on a system with multiple VPEs and no GIC then
      the BUG_ON is hit.
      
      Commit 19fb5818 ("IPS: Fix broken malta qemu") worked around this
      for the single-core single-VPE case typically seen using QEMU, but does
      not catch the multi-VPE case. This patch removes the insufficient CPU
      presence check that was added and works around the bug differently,
      effectively reverting that commit.
      
      A simple way to reproduce this bug is by using QEMU, which partially
      implements the MT ASE but does not implement the GIC as of version 2.5.
      Using "-cpu 34Kf -smp 2" will present a system with 2 VPEs in one core &
      no GIC, hitting the BUG_ON.
      
      Given that we're post-merge-window on the way to v4.6, avoid this by
      just returning from mips_smp_ipi_init when no IPI IRQ domain is found.
      Ideally at some point all IPI implementations would be converted to the
      same IPI IRQ domain interface & we'd be able to restore the check.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Cc: Qais Yousef <qsyousef@gmail.com>
      Fixes: fbde2d7d ("MIPS: Add generic SMP IPI support")
      Fixes: 19fb5818 ("IPS: Fix broken malta qemu")
      Reverts: 19fb5818 ("IPS: Fix broken malta qemu")
      Cc: Qais Yousef <qsyousef@gmail.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/13007/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      578bffc8
    • J
      MIPS: Fix siginfo.h to use strict posix types · 5daebc47
      James Hogan 提交于
      Commit 85efde6f ("make exported headers use strict posix types")
      changed the asm-generic siginfo.h to use the __kernel_* types, and
      commit 3a471cbc ("remove __KERNEL_STRICT_NAMES") make the internal
      types accessible only to the kernel, but the MIPS implementation hasn't
      been updated to match.
      
      Switch to proper types now so that the exported asm/siginfo.h won't
      produce quite so many compiler errors when included alone by a user
      program.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Christopher Ferris <cferris@google.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 2.6.30-
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12477/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      5daebc47
    • C
      MIPS: Fix crash registers on non-crashing CPUs · c80e1b62
      Corey Minyard 提交于
      As part of handling a crash on an SMP system, an IPI is send to
      all other CPUs to save their current registers and stop.  It was
      using task_pt_regs(current) to get the registers, but that will
      only be accurate if the CPU was interrupted running in userland.
      Instead allow the architecture to pass in the registers (all
      pass NULL now, but allow for the future) and then use get_irq_regs()
      which should be accurate as we are in an interrupt.  Fall back to
      task_pt_regs(current) if nothing else is available.
      Signed-off-by: NCorey Minyard <cminyard@mvista.com>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/13050/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      c80e1b62
    • N
      mips: Fix CPC_BASE_ADDR mask to match datasheet · e03ac9f0
      Nikolay Martynov 提交于
      According to 'MIPS32® interAptivTM Multiprocessing
      System Programmer’s Guide' CPC_BASE_ADDR takes bits [31:15].
      
      This change is tested ith mt7621 which wasn't working without it.
      Signed-off-by: NNikolay Martynov <mar.kolya@gmail.com>
      Reviewed-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/11766/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      e03ac9f0
    • L
      Linux 4.6-rc7 · 44549e8f
      Linus Torvalds 提交于
      44549e8f
  2. 08 5月, 2016 3 次提交
  3. 07 5月, 2016 14 次提交
  4. 06 5月, 2016 16 次提交
    • D
      parisc: fix a bug when syscall number of tracee is __NR_Linux_syscalls · f0b22d1b
      Dmitry V. Levin 提交于
      Do not load one entry beyond the end of the syscall table when the
      syscall number of a traced process equals to __NR_Linux_syscalls.
      Similar bug with regular processes was fixed by commit 3bb457af
      ("[PARISC] Fix bug when syscall nr is __NR_Linux_syscalls").
      
      This bug was found by strace test suite.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry V. Levin <ldv@altlinux.org>
      Acked-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      f0b22d1b
    • R
      Merge branches 'pm-opp-fixes', 'pm-cpufreq-fixes' and 'pm-cpuidle-fixes' · 5f2f88e3
      Rafael J. Wysocki 提交于
      * pm-opp-fixes:
        PM / OPP: Remove useless check
      
      * pm-cpufreq-fixes:
        intel_pstate: Fix intel_pstate_get()
        cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
        cpufreq: st: enable selective initialization based on the platform
      
      * pm-cpuidle-fixes:
        ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
      5f2f88e3
    • R
      Merge branches 'acpica-fixes' and 'device-properties-fixes' · 7c21b38c
      Rafael J. Wysocki 提交于
      * acpica-fixes:
        ACPICA: Dispatcher: Update thread ID for recursive method calls
      
      * device-properties-fixes:
        device property: Avoid potential dereferences of invalid pointers
      7c21b38c
    • C
      x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO · 886123fb
      Chen Yu 提交于
      Currently we read the tsc radio: ratio = (MSR_PLATFORM_INFO >> 8) & 0x1f;
      
      Thus we get bit 8-12 of MSR_PLATFORM_INFO, however according to the SDM
      (35.5), the ratio bits are bit 8-15.
      
      Ignoring the upper bits can result in an incorrect tsc ratio, which causes the
      TSC calibration and the Local APIC timer frequency to be incorrect.
      
      Fix this problem by masking 0xff instead.
      
      [ tglx: Massaged changelog ]
      
      Fixes: 7da7c156 "x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs"
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: stable@vger.kernel.org
      Cc: Bin Gao <bin.gao@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      Link: http://lkml.kernel.org/r/1462505619-5516-1-git-send-email-yu.c.chen@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      886123fb
    • L
      Merge branch 'akpm' (patches from Andrew) · 9caa7e78
      Linus Torvalds 提交于
      Merge fixes from Andrew Morton:
       "14 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        byteswap: try to avoid __builtin_constant_p gcc bug
        lib/stackdepot: avoid to return 0 handle
        mm: fix kcompactd hang during memory offlining
        modpost: fix module autoloading for OF devices with generic compatible property
        proc: prevent accessing /proc/<PID>/environ until it's ready
        mm/zswap: provide unique zpool name
        mm: thp: kvm: fix memory corruption in KVM with THP enabled
        MAINTAINERS: fix Rajendra Nayak's address
        mm, cma: prevent nr_isolated_* counters from going negative
        mm: update min_free_kbytes from khugepaged after core initialization
        huge pagecache: mmap_sem is unlocked when truncation splits pmd
        rapidio/mport_cdev: fix uapi type definitions
        mm: memcontrol: let v2 cgroups follow changes in system swappiness
        mm: thp: correct split_huge_pages file permission
      9caa7e78
    • L
      mailmap: add John Paul Adrian Glaubitz · 43a3e837
      Linus Torvalds 提交于
      Apparently patchwork ended up truncating the full name.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43a3e837
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 7270a3f7
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
      
       - a fix for the persistent memory 'struct page' driver.  The
         implementation overlooked the fact that pages are allocated in 2MB
         units leading to -ENOMEM when establishing some configurations.
      
         It's tagged for -stable as the problem was introduced with the
         initial implementation in 4.5.
      
       - The new "error status translation" routine, introduced with the 4.6
         updates to the nfit driver, missed a necessary path in
         acpi_nfit_ctl().
      
         The end result is that we are falsely assuming commands complete
         successfully when the embedded status says otherwise.
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        nfit: fix translation of command status results
        libnvdimm, pfn: fix memmap reservation sizing
      7270a3f7
    • A
      byteswap: try to avoid __builtin_constant_p gcc bug · 7322dd75
      Arnd Bergmann 提交于
      This is another attempt to avoid a regression in wwn_to_u64() after that
      started using get_unaligned_be64(), which in turn ran into a bug on
      gcc-4.9 through 6.1.
      
      The regression got introduced due to the combination of two separate
      workarounds (commits e3bde956: "include/linux/unaligned: force
      inlining of byteswap operations" and ef3fb242: "scsi: fc: use
      get/put_unaligned64 for wwn access") that each try to sidestep distinct
      problems with gcc behavior (code growth and increased stack usage).
      
      Unfortunately after both have been applied, a more serious gcc bug has
      been uncovered, leading to incorrect object code that discards part of a
      function and causes undefined behavior.
      
      As part of this problem is how __builtin_constant_p gets evaluated on an
      argument passed by reference into an inline function, this avoids the
      use of __builtin_constant_p() for all architectures that set
      CONFIG_ARCH_USE_BUILTIN_BSWAP.  Most architectures do not set
      ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
      suffer from the problem in the qla2xxx driver, but they might still run
      into it elsewhere.
      
      Both of the original workarounds were only merged in the 4.6 kernel, and
      the bug that is fixed by this patch should only appear if both are
      there, so we probably don't need to backport the fix.  On the other
      hand, it works by simplifying the code path and should not have any
      negative effects.
      
      [arnd@arndb.de: fix older gcc warnings]
        (http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel)
      Link: https://lkml.org/lkml/headers/2016/4/12/1103
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
      Fixes: e3bde956 ("include/linux/unaligned: force inlining of byteswap operations")
      Fixes: ef3fb242 ("scsi: fc: use get/put_unaligned64 for wwn access")
      Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfelSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3
      Tested-by: NQuinn Tran <quinn.tran@qlogic.com>
      Cc: Martin Jambor <mjambor@suse.cz>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
      Cc: Jan Hubicka <hubicka@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7322dd75
    • J
      lib/stackdepot: avoid to return 0 handle · 7c31190b
      Joonsoo Kim 提交于
      Recently, we allow to save the stacktrace whose hashed value is 0.  It
      causes the problem that stackdepot could return 0 even if in success.
      User of stackdepot cannot distinguish whether it is success or not so we
      need to solve this problem.  In this patch, 1 bit are added to handle
      and make valid handle none 0 by setting this bit.  After that, valid
      handle will not be 0 and 0 handle will represent failure correctly.
      
      Fixes: 33334e25 ("lib/stackdepot.c: allow the stack trace hash to be zero")
      Link: http://lkml.kernel.org/r/1462252403-1106-1-git-send-email-iamjoonsoo.kim@lge.comSigned-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c31190b
    • V
      mm: fix kcompactd hang during memory offlining · 172400c6
      Vlastimil Babka 提交于
      Assume memory47 is the last online block left in node1.  This will hang:
      
        # echo offline > /sys/devices/system/node/node1/memory47/state
      
      After a couple of minutes, the following pops up in dmesg:
      
        INFO: task bash:957 blocked for more than 120 seconds.
               Not tainted 4.6.0-rc6+ #6
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
        bash            D ffff8800b7adbaf8     0   957    951 0x00000000
        Call Trace:
          schedule+0x35/0x80
          schedule_timeout+0x1ac/0x270
          wait_for_completion+0xe1/0x120
          kthread_stop+0x4f/0x110
          kcompactd_stop+0x26/0x40
          __offline_pages.constprop.28+0x7e6/0x840
          offline_pages+0x11/0x20
          memory_block_action+0x73/0x1d0
          memory_subsys_offline+0x47/0x60
          device_offline+0x86/0xb0
          store_mem_state+0xda/0xf0
          dev_attr_store+0x18/0x30
          sysfs_kf_write+0x37/0x40
          kernfs_fop_write+0x11d/0x170
          __vfs_write+0x37/0x120
          vfs_write+0xa9/0x1a0
          SyS_write+0x55/0xc0
          entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      kcompactd is waiting for kcompactd_max_order > 0 when it's woken up to
      actually exit.  Check kthread_should_stop() to break out of the wait.
      
      Fixes: 698b1b30 ("mm, compaction: introduce kcompactd").
      Reported-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Tested-by: NReza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      172400c6
    • P
      modpost: fix module autoloading for OF devices with generic compatible property · acbef7b7
      Philipp Zabel 提交于
      Since the wildcard at the end of OF module aliases is gone, autoloading
      of modules that don't match a device's last (most generic) compatible
      value fails.
      
      For example the CODA960 VPU on i.MX6Q has the SoC specific compatible
      "fsl,imx6q-vpu" and the generic compatible "cnm,coda960".  Since the
      driver currently only works with knowledge about the SoC specific
      integration, it doesn't list "cnm,cod960" in the module device table.
      
      This results in the device compatible
      "of:NvpuT<NULL>Cfsl,imx6q-vpuCcnm,coda960" not matching the module alias
      "of:N*T*Cfsl,imx6q-vpu" anymore, whereas before commit 2f632369
      ("modpost: don't add a trailing wildcard for OF module aliases") it
      matched the module alias "of:N*T*Cfsl,imx6q-vpu*".
      
      This patch adds two module aliases for each compatible, one without the
      wildcard and one with "C*" appended.
      
        $ modinfo coda | grep imx6q
        alias:          of:N*T*Cfsl,imx6q-vpuC*
        alias:          of:N*T*Cfsl,imx6q-vpu
      
      Fixes: 2f632369 ("modpost: don't add a trailing wildcard for OF module aliases")
      Link: http://lkml.kernel.org/r/1462203339-15340-1-git-send-email-p.zabel@pengutronix.deSigned-off-by: NPhilipp Zabel <p.zabel@pengutronix.de>
      Cc: Javier Martinez Canillas <javier@osg.samsung.com>
      Cc: Brian Norris <computersforpeace@gmail.com>
      Cc: Sjoerd Simons <sjoerd.simons@collabora.co.uk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>	[4.5+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      acbef7b7
    • M
      proc: prevent accessing /proc/<PID>/environ until it's ready · 8148a73c
      Mathias Krause 提交于
      If /proc/<PID>/environ gets read before the envp[] array is fully set up
      in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
      read more bytes than are actually written, as env_start will already be
      set but env_end will still be zero, making the range calculation
      underflow, allowing to read beyond the end of what has been written.
      
      Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
      zero.  It is, apparently, intentionally set last in create_*_tables().
      
      This bug was found by the PaX size_overflow plugin that detected the
      arithmetic underflow of 'this_len = env_end - (env_start + src)' when
      env_end is still zero.
      
      The expected consequence is that userland trying to access
      /proc/<PID>/environ of a not yet fully set up process may get
      inconsistent data as we're in the middle of copying in the environment
      variables.
      
      Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461Signed-off-by: NMathias Krause <minipli@googlemail.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Pax Team <pageexec@freemail.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8148a73c
    • D
      mm/zswap: provide unique zpool name · 32a4e169
      Dan Streetman 提交于
      Instead of using "zswap" as the name for all zpools created, add an
      atomic counter and use "zswap%x" with the counter number for each zpool
      created, to provide a unique name for each new zpool.
      
      As zsmalloc, one of the zpool implementations, requires/expects a unique
      name for each pool created, zswap should provide a unique name.  The
      zsmalloc pool creation does not fail if a new pool with a conflicting
      name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
      zsmalloc pool creation fails with -ENOMEM.  Then zswap will be unable to
      change its compressor parameter if its zpool is zsmalloc; it also will
      be unable to change its zpool parameter back to zsmalloc, if it has any
      existing old zpool using zsmalloc with page(s) in it.  Attempts to
      change the parameters will result in failure to create the zpool.  This
      changes zswap to provide a unique name for each zpool creation.
      
      Fixes: f1c54846 ("zswap: dynamic pool creation")
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Dan Streetman <dan.streetman@canonical.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32a4e169
    • A
      mm: thp: kvm: fix memory corruption in KVM with THP enabled · 127393fb
      Andrea Arcangeli 提交于
      After the THP refcounting change, obtaining a compound pages from
      get_user_pages() no longer allows us to assume the entire compound page
      is immediately mappable from a secondary MMU.
      
      A secondary MMU doesn't want to call get_user_pages() more than once for
      each compound page, in order to know if it can map the whole compound
      page.  So a secondary MMU needs to know from a single get_user_pages()
      invocation when it can map immediately the entire compound page to avoid
      a flood of unnecessary secondary MMU faults and spurious
      atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
      users).
      
      Ideally instead of the page->_mapcount < 1 check, get_user_pages()
      should return the granularity of the "page" mapping in the "mm" passed
      to get_user_pages().  However it's non trivial change to pass the "pmd"
      status belonging to the "mm" walked by get_user_pages up the stack (up
      to the caller of get_user_pages).  So the fix just checks if there is
      not a single pte mapping on the page returned by get_user_pages, and in
      turn if the caller can assume that the whole compound page is mapped in
      the current "mm" (in a pmd_trans_huge()).  In such case the entire
      compound page is safe to map into the secondary MMU without additional
      get_user_pages() calls on the surrounding tail/head pages.  In addition
      of being faster, not having to run other get_user_pages() calls also
      reduces the memory footprint of the secondary MMU fault in case the pmd
      split happened as result of memory pressure.
      
      Without this fix after a MADV_DONTNEED (like invoked by QEMU during
      postcopy live migration or balloning) or after generic swapping (with a
      failure in split_huge_page() that would only result in pmd splitting and
      not a physical page split), KVM would map the whole compound page into
      the shadow pagetables, despite regular faults or userfaults (like
      UFFDIO_COPY) may map regular pages into the primary MMU as result of the
      pte faults, leading to the guest mode and userland mode going out of
      sync and not working on the same memory at all times.
      
      Any other secondary MMU notifier manager (KVM is just one of the many
      MMU notifier users) will need the same information if it doesn't want to
      run a flood of get_user_pages_fast and it can support multiple
      granularity in the secondary MMU mappings, so I think it is justified to
      be exposed not just to KVM.
      
      The other option would be to move transparent_hugepage_adjust to
      mm/huge_memory.c but that currently has all kind of KVM data structures
      in it, so it's definitely not a cut-and-paste work, so I couldn't do a
      fix as cleaner as this one for 4.6.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: "Li, Liang Z" <liang.z.li@intel.com>
      Cc: Amit Shah <amit.shah@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      127393fb
    • E
      MAINTAINERS: fix Rajendra Nayak's address · ff2de822
      Eric Engestrom 提交于
      Signed-off-by: NEric Engestrom <eric.engestrom@imgtec.com>
      Cc: Rajendra Nayak <rnayak@codeaurora.org>
      Cc: Afzal Mohammed <afzal.mohd.ma@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ff2de822
    • H
      mm, cma: prevent nr_isolated_* counters from going negative · 14af4a5e
      Hugh Dickins 提交于
      /proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
      increasingly negative under compaction: which would add delay when
      should be none, or no delay when should delay.  The bug in compaction
      was due to a recent mmotm patch, but much older instance of the bug was
      also noticed in isolate_migratepages_range() which is used for CMA and
      gigantic hugepage allocations.
      
      The bug is caused by putback_movable_pages() in an error path
      decrementing the isolated counters without them being previously
      incremented by acct_isolated().  Fix isolate_migratepages_range() by
      removing the error-path putback, thus reaching acct_isolated() with
      migratepages still isolated, and leaving putback to caller like most
      other places do.
      
      Fixes: edc2ca61 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
      [vbabka@suse.cz: expanded the changelog]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      14af4a5e