1. 14 4月, 2016 2 次提交
  2. 07 3月, 2016 8 次提交
    • T
      powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel · 8c50b72a
      Torsten Duwe 提交于
      Firstly we add logic to Kconfig to allow a user to choose if they want
      mprofile-kernel. This has to be user-selectable because only some
      current toolchains support it. If we enabled it unconditionally we would
      prevent some users from building the kernel entirely.
      
      Arguably it would be nice if we could detect if mprofile-kernel was
      available, and use it then. However that would violate the principle of
      least surprise because a user having choosen options such as live
      patching, would then see them quietly disabled at build time.
      
      We also make the user selectable option negative, ie. it disables when
      selected, so that allyesconfig continues to build on old toolchains.
      
      Once we've decided we do want to use mprofile-kernel, we then add a
      script which checks it actually works. That is because there are
      versions of gcc that accept the flag but don't generate correct code.
      
      Due to the way kconfig works, we can't error out when we detect a
      non-working toolchain. If we did a user would never be able to modify
      their config and run oldconfig - because the check would block oldconfig
      from running. Instead we emit a warning and add a bogus flag to CFLAGS
      so that the build will fail.
      Signed-off-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      8c50b72a
    • T
      powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI · 15308664
      Torsten Duwe 提交于
      The gcc switch -mprofile-kernel defines a new ABI for calling _mcount()
      very early in the function with minimal overhead.
      
      Although mprofile-kernel has been available since GCC 3.4, there were
      bugs which were only fixed recently. Currently it is known to work in
      GCC 4.9, 5 and 6.
      
      Additionally there are two possible code sequences generated by the
      flag, the first uses mflr/std/bl and the second is optimised to omit the
      std. Currently only gcc 6 has the optimised sequence. This patch
      supports both sequences.
      
      Initial work started by Vojtech Pavlik, used with permission.
      
      Key changes:
       - rework _mcount() to work for both the old and new ABIs.
       - implement new versions of ftrace_caller() and ftrace_graph_caller()
         which deal with the new ABI.
       - updates to __ftrace_make_nop() to recognise the new mcount calling
         sequence.
       - updates to __ftrace_make_call() to recognise the nop'ed sequence.
       - implement ftrace_modify_call().
       - updates to the module loader to surpress the toc save in the module
         stub when calling mcount with the new ABI.
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      15308664
    • T
      powerpc/ftrace: Use $(CC_FLAGS_FTRACE) when disabling ftrace · 9a7841ae
      Torsten Duwe 提交于
      Rather than open-coding -pg whereever we want to disable ftrace, use the
      existing $(CC_FLAGS_FTRACE) variable.
      
      This has the advantage that it will work in future when we use a
      different set of flags to enable ftrace.
      Signed-off-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      9a7841ae
    • T
      powerpc/ftrace: Use generic ftrace_modify_all_code() · c96f8385
      Torsten Duwe 提交于
      Convert powerpc's arch_ftrace_update_code() from its own version to use
      the generic default functionality (without stop_machine -- our
      instructions are properly aligned and the replacements atomic).
      
      With this we gain error checking and the much-needed function_trace_op
      handling.
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Signed-off-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c96f8385
    • M
      powerpc/module: Create a special stub for ftrace_caller() · 336a7b5d
      Michael Ellerman 提交于
      In order to support the new -mprofile-kernel ABI, we need to be able to
      call from the module back to ftrace_caller() (in the kernel) without
      using the module's r2. That is because the function in this module which
      is calling ftrace_caller() may not have setup r2, if it doesn't
      otherwise need it (ie. it accesses no globals).
      
      To make that work we add a new stub which is used for calling
      ftrace_caller(), which uses the kernel toc instead of the module toc.
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      336a7b5d
    • M
      powerpc/module: Mark module stubs with a magic value · f17c4e01
      Michael Ellerman 提交于
      When a module is loaded, calls out to the kernel go via a stub which is
      generated at runtime. One of these stubs is used to call _mcount(),
      which is the default target of tracing calls generated by the compiler
      with -pg.
      
      If dynamic ftrace is enabled (which it typically is), another stub is
      used to call ftrace_caller(), which is the target of tracing calls when
      ftrace is actually active.
      
      ftrace then wants to disable the calls to _mcount() at module startup,
      and enable/disable the calls to ftrace_caller() when enabling/disabling
      tracing - all of these it does by patching the code.
      
      As part of that code patching, the ftrace code wants to confirm that the
      branch it is about to modify, is in fact a call to a module stub which
      calls _mcount() or ftrace_caller().
      
      Currently it does that by inspecting the instructions and confirming
      they are what it expects. Although that works, the code to do it is
      pretty intricate because it requires lots of knowledge about the exact
      format of the stub.
      
      We can make that process easier by marking the generated stubs with a
      magic value, and then looking for that magic value. Altough this is not
      as rigorous as the current method, I believe it is sufficient in
      practice.
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f17c4e01
    • M
      powerpc/module: Only try to generate the ftrace_caller() stub once · 136cd345
      Michael Ellerman 提交于
      Currently we generate the module stub for ftrace_caller() at the bottom
      of apply_relocate_add(). However apply_relocate_add() is potentially
      called more than once per module, which means we will try to generate
      the ftrace_caller() stub multiple times.
      
      Although the current code deals with that correctly, ie. it only
      generates a stub the first time, it would be clearer to only try to
      generate the stub once.
      
      Note also on first reading it may appear that we generate a different
      stub for each section that requires relocation, but that is not the
      case. The code in stub_for_addr() that searches for an existing stub
      uses sechdrs[me->arch.stubs_section], ie. the single stub section for
      this module.
      
      A cleaner approach is to only generate the ftrace_caller() stub once,
      from module_finalize(). Although the original code didn't check to see
      if the stub was actually generated correctly, it seems prudent to add a
      check, so do that. And an additional benefit is we can clean the ifdefs
      up a little.
      
      Finally we must propagate the const'ness of some of the pointers passed
      to module_finalize(), but that is also an improvement.
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      136cd345
    • M
      powerpc: Create a helper for getting the kernel toc value · a5cab83c
      Michael Ellerman 提交于
      Move the logic to work out the kernel toc pointer into a header. This is
      a good cleanup, and also means we can use it elsewhere in future.
      Reviewed-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Reviewed-by: NTorsten Duwe <duwe@suse.de>
      Reviewed-by: NBalbir Singh <bsingharora@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Tested-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      a5cab83c
  3. 08 2月, 2016 3 次提交
    • L
      Linux 4.5-rc3 · 388f7b1d
      Linus Torvalds 提交于
      388f7b1d
    • L
      Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · c17dfb01
      Linus Torvalds 提交于
      Pull ARM SoC fixes from Olof Johansson:
       "The first real batch of fixes for this release cycle, so there are a
        few more than usual.
      
        Most of these are fixes and tweaks to board support (DT bugfixes,
        etc).  I've also picked up a couple of small cleanups that seemed
        innocent enough that there was little reason to wait (const/
        __initconst and Kconfig deps).
      
        Quite a bit of the changes on OMAP were due to fixes to no longer
        write to rodata from assembly when ARM_KERNMEM_PERMS was enabled, but
        there were also other fixes.
      
        Kirkwood had a bunch of gpio fixes for some boards.  OMAP had RTC
        fixes on OMAP5, and Nomadik had changes to MMC parameters in DT.
      
        All in all, mostly the usual mix of various fixes"
      
      * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (46 commits)
        ARM: multi_v7_defconfig: enable DW_WATCHDOG
        ARM: nomadik: fix up SD/MMC DT settings
        ARM64: tegra: Add chosen node for tegra132 norrin
        ARM: realview: use "depends on" instead of "if" after prompt
        ARM: tango: use "depends on" instead of "if" after prompt
        ARM: tango: use const and __initconst for smp_operations
        ARM: realview: use const and __initconst for smp_operations
        bus: uniphier-system-bus: revive tristate prompt
        arm64: dts: Add missing DMA Abort interrupt to Juno
        bus: vexpress-config: Add missing of_node_put
        ARM: dts: am57xx: sbc-am57x: correct Eth PHY settings
        ARM: dts: am57xx: cl-som-am57x: fix CPSW EMAC pinmux
        ARM: dts: am57xx: sbc-am57x: fix UART3 pinmux
        ARM: dts: am57xx: cl-som-am57x: update SPI Flash frequency
        ARM: dts: am57xx: cl-som-am57x: set HOST mode for USB2
        ARM: dts: am57xx: sbc-am57x: fix SB-SOM EEPROM I2C address
        ARM: dts: LogicPD Torpedo: Revert Duplicative Entries
        ARM: dts: am437x: pixcir_tangoc: use correct flags for irq types
        ARM: dts: am4372: fix irq type for arm twd and global timer
        ARM: dts: at91: sama5d4 xplained: fix phy0 IRQ type
        ...
      c17dfb01
    • L
      Merge branch 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 63fee123
      Linus Torvalds 提交于
      Pull mailbox fixes from Jassi Brar:
      
       - fix getting element from the pcc-channels array by simply indexing
         into it
      
       - prevent building mailbox-test driver for archs that don't have IOMEM
      
      * 'mailbox-devel' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        mailbox: Fix dependencies for !HAS_IOMEM archs
        mailbox: pcc: fix channel calculation in get_pcc_channel()
      63fee123
  4. 07 2月, 2016 2 次提交
    • L
      Merge tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 46df55ce
      Linus Torvalds 提交于
      Pull USB fixes from Greg KH:
       "Here are some USB fixes for 4.5-rc3.
      
        The usual, xhci fixes for reported issues, combined with some small
        gadget driver fixes, and a MAINTAINERS file update.  All have been in
        linux-next with no reported issues"
      
      * tag 'usb-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        xhci: harden xhci_find_next_ext_cap against device removal
        xhci: Fix list corruption in urb dequeue at host removal
        usb: host: xhci-plat: fix NULL pointer in probe for device tree case
        usb: xhci-mtk: fix AHB bus hang up caused by roothubs polling
        usb: xhci-mtk: fix bpkts value of LS/HS periodic eps not behind TT
        usb: xhci: apply XHCI_PME_STUCK_QUIRK to Intel Broxton-M platforms
        usb: xhci: set SSIC port unused only if xhci_suspend succeeds
        usb: xhci: add a quirk bit for ssic port unused
        usb: xhci: handle both SSIC ports in PME stuck quirk
        usb: dwc3: gadget: set the OTG flag in dwc3 gadget driver.
        Revert "xhci: don't finish a TD if we get a short-transfer event mid TD"
        MAINTAINERS: fix my email address
        usb: dwc2: Fix probe problem on bcm2835
        Revert "usb: dwc2: Move reset into dwc2_get_hwparams()"
        usb: musb: ux500: Fix NULL pointer dereference at system PM
        usb: phy: mxs: declare variable with initialized value
        usb: phy: msm: fix error handling in probe.
      46df55ce
    • L
      Merge tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · dacd53c8
      Linus Torvalds 提交于
      Pull staging and IIO driver fixes from Greg KH:
       "Here are some IIO and staging driver fixes for 4.5-rc3.
      
        All of them, except one, are for IIO drivers, and one is for a speakup
        driver fix caused by some earlier patches, to resolve a reported build
        failure"
      
      * tag 'staging-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        Staging: speakup: Fix allyesconfig build on mn10300
        iio: dht11: Use boottime
        iio: ade7753: avoid uninitialized data
        iio: pressure: mpl115: fix temperature offset sign
        iio: imu: Fix dependencies for !HAS_IOMEM archs
        staging: iio: Fix dependencies for !HAS_IOMEM archs
        iio: adc: Fix dependencies for !HAS_IOMEM archs
        iio: inkern: fix a NULL dereference on error
        iio:adc:ti_am335x_adc Fix buffered mode by identifying as software buffer.
        iio: light: acpi-als: Report data as processed
        iio: dac: mcp4725: set iio name property in sysfs
        iio: add HAS_IOMEM dependency to VF610_ADC
        iio: add IIO_TRIGGER dependency to STK8BA50
        iio: proximity: lidar: correct return value
        iio-light: Use a signed return type for ltr501_match_samp_freq()
      dacd53c8
  5. 06 2月, 2016 25 次提交
    • L
      Merge branch 'akpm' (patches from Andrew) · 5af9c2e1
      Linus Torvalds 提交于
      Merge fixes from Andrew Morton:
       "22 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
        epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT
        radix-tree: fix oops after radix_tree_iter_retry
        MAINTAINERS: trim the file triggers for ABI/API
        dax: dirty inode only if required
        thp: make deferred_split_scan() work again
        mm: replace vma_lock_anon_vma with anon_vma_lock_read/write
        ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup
        um: asm/page.h: remove the pte_high member from struct pte_t
        mm, hugetlb: don't require CMA for runtime gigantic pages
        mm/hugetlb: fix gigantic page initialization/allocation
        mm: downgrade VM_BUG in isolate_lru_page() to warning
        mempolicy: do not try to queue pages from !vma_migratable()
        mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
        vmstat: make vmstat_update deferrable
        mm, vmstat: make quiet_vmstat lighter
        mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT
        memblock: don't mark memblock_phys_mem_size() as __init
        dump_stack: avoid potential deadlocks
        mm: validate_mm browse_rb SMP race condition
        m32r: fix build failure due to SMP and MMU
        ...
      5af9c2e1
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 5d6a6a75
      Linus Torvalds 提交于
      Pull Ceph fixes from Sage Weil:
       "We have a few wire protocol compatibility fixes, ports of a few recent
        CRUSH mapping changes, and a couple error path fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: MOSDOpReply v7 encoding
        libceph: advertise support for TUNABLES5
        crush: decode and initialize chooseleaf_stable
        crush: add chooseleaf_stable tunable
        crush: ensure take bucket value is valid
        crush: ensure bucket id is valid before indexing buckets array
        ceph: fix snap context leak in error path
        ceph: checking for IS_ERR instead of NULL
      5d6a6a75
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 9b108828
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Fixes all over the place:
      
         - amdkfd: two static checker fixes
         - mst: a bunch of static checker and spec/hw interaction fixes
         - amdgpu: fix Iceland hw properly, and some fiji bugs, along with
           some write-combining fixes.
         - exynos: some regression fixes
         - adv7511: fix some EDID reading issues"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (38 commits)
        drm/dp/mst: deallocate payload on port destruction
        drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
        drm/dp/mst: move GUID storage from mgr, port to only mst branch
        drm/dp/mst: change MST detection scheme
        drm/dp/mst: Calculate MST PBN with 31.32 fixed point
        drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
        drm/mst: Add range check for max_payloads during init
        drm/mst: Don't ignore the MST PBN self-test result
        drm: fix missing reference counting decrease
        drm/amdgpu: disable uvd and vce clockgating on Fiji
        drm/amdgpu: remove exp hardware support from iceland
        drm/amdgpu: load MEC ucode manually on iceland
        drm/amdgpu: don't load MEC2 on topaz
        drm/amdgpu: drop topaz support from gmc8 module
        drm/amdgpu: pull topaz gmc bits into gmc_v7
        drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
        drm/amdgpu: iceland use CI based MC IP
        drm/amdgpu: move gmc7 support out of CIK dependency
        drm/amdgpu/gfx7: enable cp inst/reg error interrupts
        drm/amdgpu/gfx8: enable cp inst/reg error interrupts
        ...
      9b108828
    • L
      Merge tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 22f60701
      Linus Torvalds 提交于
      Pull power management and ACPI fixes from Rafael Wysocki:
       "These are: a fix for a recently introduced false-positive warnings
        about PM domain pointers being changed inappropriately (harmless but
        annoying), an MCH size workaround quirk for one more platform, a
        compiler warning fix (generic power domains framework), an ACPI LPSS
        (Intel SoCs) driver fixup and a cleanup of the ACPI CPPC core code.
      
        Specifics:
      
         - PM core fix to avoid false-positive warnings generated when the
           pm_domain field is cleared for a device that appears to be bound to
           a driver (Rafael Wysocki).
      
         - New MCH size workaround quirk for Intel Haswell-ULT (Josh Boyer).
      
         - Fix for an "unused function" compiler warning in the generic power
           domains framework (Ulf Hansson).
      
         - Fixup for the ACPI driver for Intel SoCs (acpi-lpss) to set the PM
           domain pointer of a device properly in one place that was
           overlooked by a recent PM core update (Andy Shevchenko).
      
         - Removal of a redundant function declaration in the ACPI CPPC core
           code (Timur Tabi)"
      
      * tag 'pm+acpi-4.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM: Avoid false-positive warnings in dev_pm_domain_set()
        PM / Domains: Silence compiler warning for an unused function
        ACPI / CPPC: remove redundant mbox_send_message() declaration
        ACPI / LPSS: set PM domain via helper setter
        PNP: Add Haswell-ULT to Intel MCH size workaround
      22f60701
    • J
      epoll: restrict EPOLLEXCLUSIVE to POLLIN and POLLOUT · b6a515c8
      Jason Baron 提交于
      In the current implementation of the EPOLLEXCLUSIVE flag (added for
      4.5-rc1), if epoll waiters create different POLL* sets and register them
      as exclusive against the same target fd, the current implementation will
      stop waking any further waiters once it finds the first idle waiter.
      This means that waiters could miss wakeups in certain cases.
      
      For example, when we wake up a pipe for reading we do:
      wake_up_interruptible_sync_poll(&pipe->wait, POLLIN | POLLRDNORM); So if
      one epoll set or epfd is added to pipe p with POLLIN and a second set
      epfd2 is added to pipe p with POLLRDNORM, only epfd may receive the
      wakeup since the current implementation will stop after it finds any
      intersection of events with a waiter that is blocked in epoll_wait().
      
      We could potentially address this by requiring all epoll waiters that
      are added to p be required to pass the same set of POLL* events.  IE the
      first EPOLL_CTL_ADD that passes EPOLLEXCLUSIVE establishes the set POLL*
      flags to be used by any other epfds that are added as EPOLLEXCLUSIVE.
      However, I think it might be somewhat confusing interface as we would
      have to reference count the number of users for that set, and so
      userspace would have to keep track of that count, or we would need a
      more involved interface.  It also adds some shared state that we'd have
      store somewhere.  I don't think anybody will want to bloat
      __wait_queue_head for this.
      
      I think what we could do instead, is to simply restrict EPOLLEXCLUSIVE
      such that it can only be specified with EPOLLIN and/or EPOLLOUT.  So
      that way if the wakeup includes 'POLLIN' and not 'POLLOUT', we can stop
      once we hit the first idle waiter that specifies the EPOLLIN bit, since
      any remaining waiters that only have 'POLLOUT' set wouldn't need to be
      woken.  Likewise, we can do the same thing if 'POLLOUT' is in the wakeup
      bit set and not 'POLLIN'.  If both 'POLLOUT' and 'POLLIN' are set in the
      wake bit set (there is at least one example of this I saw in fs/pipe.c),
      then we just wake the entire exclusive list.  Having both 'POLLOUT' and
      'POLLIN' both set should not be on any performance critical path, so I
      think that's ok (in fs/pipe.c its in pipe_release()).  We also continue
      to include EPOLLERR and EPOLLHUP by default in any exclusive set.  Thus,
      the user can specify EPOLLERR and/or EPOLLHUP but is not required to do
      so.
      
      Since epoll waiters may be interested in other events as well besides
      EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP, these can still be added by
      doing a 'dup' call on the target fd and adding that as one normally
      would with EPOLL_CTL_ADD.  Since I think that the POLLIN and POLLOUT
      events are what we are interest in balancing, I think that the 'dup'
      thing could perhaps be added to only one of the waiter threads.
      However, I think that EPOLLIN, EPOLLOUT, EPOLLERR and EPOLLHUP should be
      sufficient for the majority of use-cases.
      
      Since EPOLLEXCLUSIVE is intended to be used with a target fd shared
      among multiple epfds, where between 1 and n of the epfds may receive an
      event, it does not satisfy the semantics of EPOLLONESHOT where only 1
      epfd would get an event.  Thus, it is not allowed to be specified in
      conjunction with EPOLLEXCLUSIVE.
      
      EPOLL_CTL_MOD is also not allowed if the fd was previously added as
      EPOLLEXCLUSIVE.  It seems with the limited number of flags to not be as
      interesting, but this could be relaxed at some further point.
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Tested-by: NMadars Vitolins <m@silodev.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Eric Wong <normalperson@yhbt.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hagen Paul Pfeifer <hagen@jauu.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6a515c8
    • K
      radix-tree: fix oops after radix_tree_iter_retry · 73204282
      Konstantin Khlebnikov 提交于
      Helper radix_tree_iter_retry() resets next_index to the current index.
      In following radix_tree_next_slot current chunk size becomes zero.  This
      isn't checked and it tries to dereference null pointer in slot.
      
      Tagged iterator is fine because retry happens only at slot 0 where tag
      bitmask in iter->tags is filled with single bit.
      
      Fixes: 46437f9a ("radix-tree: fix race in gang lookup")
      Signed-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ohad Ben-Cohen <ohad@wizery.com>
      Cc: Jeremiah Mahler <jmmahler@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73204282
    • M
      MAINTAINERS: trim the file triggers for ABI/API · b14fd334
      Michael Kerrisk (man-pages) 提交于
      Commit ea8f8fc8 ("MAINTAINERS: add linux-api for review of API/ABI
      changes") added file triggers for various paths that likely indicated
      API/ABI changes.  However, catching all changes in Documentation/ABI/
      and include/uapi/ produces a large volume of mail to linux-api, rather
      than only API/ABI changes.  Drop those two entries, but leave
      include/linux/syscalls.h and kernel/sys_ni.c to catch syscall-related
      changes.
      
      [josh@joshtriplett.org: redid changelog]
      Signed-off-by: NMichael Kerrisk <mtk.man-pages@gmail.com>
      Acked-by: NShuah khan <shuahkh@osg.samsung.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b14fd334
    • D
      dax: dirty inode only if required · d2b2a28e
      Dmitry Monakhov 提交于
      Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d2b2a28e
    • K
      thp: make deferred_split_scan() work again · ae026204
      Kirill A. Shutemov 提交于
      We need to iterate over split_queue, not local empty list to get
      anything split from the shrinker.
      
      Fixes: e3ae1953 ("thp: limit number of object to scan on deferred_split_scan()")
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae026204
    • K
      mm: replace vma_lock_anon_vma with anon_vma_lock_read/write · 12352d3c
      Konstantin Khlebnikov 提交于
      Sequence vma_lock_anon_vma() - vma_unlock_anon_vma() isn't safe if
      anon_vma appeared between lock and unlock.  We have to check anon_vma
      first or call anon_vma_prepare() to be sure that it's here.  There are
      only few users of these legacy helpers.  Let's get rid of them.
      
      This patch fixes anon_vma lock imbalance in validate_mm().  Write lock
      isn't required here, read lock is enough.
      
      And reorders expand_downwards/expand_upwards: security_mmap_addr() and
      wrapping-around check don't have to be under anon vma lock.
      
      Link: https://lkml.kernel.org/r/CACT4Y+Y908EjM2z=706dv4rV6dWtxTLK9nFg9_7DhRMLppBo2g@mail.gmail.comSigned-off-by: NKonstantin Khlebnikov <koct9i@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12352d3c
    • X
      ocfs2/dlm: clear refmap bit of recovery lock while doing local recovery cleanup · c95a5180
      xuejiufei 提交于
      When recovery master down, dlm_do_local_recovery_cleanup() only remove
      the $RECOVERY lock owned by dead node, but do not clear the refmap bit.
      Which will make umount thread falling in dead loop migrating $RECOVERY
      to the dead node.
      Signed-off-by: Nxuejiufei <xuejiufei@huawei.com>
      Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c95a5180
    • N
      um: asm/page.h: remove the pte_high member from struct pte_t · 012a4163
      Nicolai Stange 提交于
      Commit 16da3068 ("um: kill pfn_t") introduced a compile warning for
      defconfig (SUBARCH=i386):
      
        arch/um/kernel/skas/mmu.c:38:206:
            warning: right shift count >= width of type [-Wshift-count-overflow]
      
      Aforementioned patch changes the definition of the phys_to_pfn() macro
      from
      
        ((pfn_t) ((p) >> PAGE_SHIFT))
      
      to
      
        ((p) >> PAGE_SHIFT)
      
      This effectively changes the phys_to_pfn() expansion's type from
      unsigned long long to unsigned long.
      
      Through the callchain init_stub_pte() => mk_pte(), the expansion of
      phys_to_pfn() is (indirectly) fed into the 'phys' argument of the
      pte_set_val(pte, phys, prot) macro, eventually leading to
      
        (pte).pte_high = (phys) >> 32;
      
      This results in the warning from above.
      
      Since UML only deals with 32 bit addresses, the upper 32 bits from
      'phys' used to be always zero anyway.  Also, all page protection flags
      defined by UML don't use any bits beyond bit 9.  Since the contents of a
      PTE are defined within architecture scope only, the ->pte_high member
      can be safely removed.
      
      Remove the ->pte_high member from struct pte_t.
      Rename ->pte_low to ->pte.
      Adapt the pte helper macros in arch/um/include/asm/page.h.
      
      Noteworthy is the pte_copy() macro where a smp_wmb() gets dropped.  This
      write barrier doesn't seem to be paired with any read barrier though and
      thus, was useless anyway.
      
      Fixes: 16da3068 ("um: kill pfn_t")
      Signed-off-by: NNicolai Stange <nicstange@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Nicolai Stange <nicstange@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      012a4163
    • V
      mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206
      Vlastimil Babka 提交于
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the runtime gigantic page allocation via
      alloc_contig_range(), making this support available only when CONFIG_CMA
      is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
      associated infrastructure, it is possible with few simple adjustments to
      require only CONFIG_MEMORY_ISOLATION instead of full CONFIG_CMA.
      
      After this patch, alloc_contig_range() and related functions are
      available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
      enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
      supporting runtime gigantic pages without the CMA-specific checks in
      page allocator fastpaths.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      080fe206
    • M
      mm/hugetlb: fix gigantic page initialization/allocation · b4330afb
      Mike Kravetz 提交于
      Attempting to preallocate 1G gigantic huge pages at boot time with
      "hugepagesz=1G hugepages=1" on the kernel command line will prevent
      booting with the following:
      
        kernel BUG at mm/hugetlb.c:1218!
      
      When mapcount accounting was reworked, the setting of
      compound_mapcount_ptr in prep_compound_gigantic_page was overlooked.  As
      a result, the validation of mapcount in free_huge_page fails.
      
      The "BUG_ON" checks in free_huge_page were also changed to
      "VM_BUG_ON_PAGE" to assist with debugging.
      
      Fixes: 53f9263b ("mm: rework mapcount accounting to enable 4k mapping of THPs")
      Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Tested-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4330afb
    • K
      mm: downgrade VM_BUG in isolate_lru_page() to warning · cf2a82ee
      Kirill A. Shutemov 提交于
      Calling isolate_lru_page() is wrong and shouldn't happen, but it not
      nessesary fatal: the page just will not be isolated if it's not on LRU.
      
      Let's downgrade the VM_BUG_ON_PAGE() to WARN_RATELIMIT().
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf2a82ee
    • K
      mempolicy: do not try to queue pages from !vma_migratable() · 77bf45e7
      Kirill A. Shutemov 提交于
      Maybe I miss some point, but I don't see a reason why we try to queue
      pages from non migratable VMAs.
      
      This testcase steps on VM_BUG_ON_PAGE() in isolate_lru_page():
      
          #include <fcntl.h>
          #include <unistd.h>
          #include <stdio.h>
          #include <sys/mman.h>
          #include <numaif.h>
      
          #define SIZE 0x2000
      
          int foo;
      
          int main()
          {
              int fd;
              char *p;
              unsigned long mask = 2;
      
              fd = open("/dev/sg0", O_RDWR);
              p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
              /* Faultin pages */
              foo = p[0] + p[0x1000];
              mbind(p, SIZE, MPOL_BIND, &mask, 4, MPOL_MF_MOVE | MPOL_MF_STRICT);
              return 0;
          }
      
      The only case when we can queue pages from such VMA is MPOL_MF_STRICT
      plus MPOL_MF_MOVE or MPOL_MF_MOVE_ALL for VMA which has pages on LRU,
      but gfp mask is not sutable for migaration (see mapping_gfp_mask() check
      in vma_migratable()).  That's looks like a bug to me.
      
      Let's filter out non-migratable vma at start of queue_pages_test_walk()
      and go to queue_pages_pte_range() only if MPOL_MF_MOVE or
      MPOL_MF_MOVE_ALL flag is set.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77bf45e7
    • T
      mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress · 564e81a5
      Tetsuo Handa 提交于
      Jan Stancek has reported that system occasionally hanging after "oom01"
      testcase from LTP triggers OOM.  Guessing from a result that there is a
      kworker thread doing memory allocation and the values between "Node 0
      Normal free:" and "Node 0 Normal:" differs when hanging, vmstat is not
      up-to-date for some reason.
      
      According to commit 373ccbe5 ("mm, vmstat: allow WQ concurrency to
      discover memory reclaim doesn't make any progress"), it meant to force
      the kworker thread to take a short sleep, but it by error used
      schedule_timeout(1).  We missed that schedule_timeout() in state
      TASK_RUNNING doesn't do anything.
      
      Fix it by using schedule_timeout_uninterruptible(1) which forces the
      kworker thread to take a short sleep in order to make sure that vmstat
      is up-to-date.
      
      Fixes: 373ccbe5 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Cristopher Lameter <clameter@sgi.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      564e81a5
    • M
      vmstat: make vmstat_update deferrable · ccde8bd4
      Michal Hocko 提交于
      Commit 0eb77e98 ("vmstat: make vmstat_updater deferrable again and
      shut down on idle") made vmstat_shepherd deferrable.  vmstat_update
      itself is still useing standard timer which might interrupt idle task.
      This is possible because "mm, vmstat: make quiet_vmstat lighter" removed
      cancel_delayed_work from the quiet_vmstat.
      
      Change vmstat_work to use DEFERRABLE_WORK to prevent from pointless
      wakeups from the idle context.
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ccde8bd4
    • M
      mm, vmstat: make quiet_vmstat lighter · f01f17d3
      Michal Hocko 提交于
      Mike has reported a considerable overhead of refresh_cpu_vm_stats from
      the idle entry during pipe test:
      
          12.89%  [kernel]       [k] refresh_cpu_vm_stats.isra.12
           4.75%  [kernel]       [k] __schedule
           4.70%  [kernel]       [k] mutex_unlock
           3.14%  [kernel]       [k] __switch_to
      
      This is caused by commit 0eb77e98 ("vmstat: make vmstat_updater
      deferrable again and shut down on idle") which has placed quiet_vmstat
      into cpu_idle_loop.  The main reason here seems to be that the idle
      entry has to get over all zones and perform atomic operations for each
      vmstat entry even though there might be no per cpu diffs.  This is a
      pointless overhead for _each_ idle entry.
      
      Make sure that quiet_vmstat is as light as possible.
      
      First of all it doesn't make any sense to do any local sync if the
      current cpu is already set in oncpu_stat_off because vmstat_update puts
      itself there only if there is nothing to do.
      
      Then we can check need_update which should be a cheap way to check for
      potential per-cpu diffs and only then do refresh_cpu_vm_stats.
      
      The original patch also did cancel_delayed_work which we are not doing
      here.  There are two reasons for that.  Firstly cancel_delayed_work from
      idle context will blow up on RT kernels (reported by Mike):
      
        CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rt3 #7
        Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
        Call Trace:
          dump_stack+0x49/0x67
          ___might_sleep+0xf5/0x180
          rt_spin_lock+0x20/0x50
          try_to_grab_pending+0x69/0x240
          cancel_delayed_work+0x26/0xe0
          quiet_vmstat+0x75/0xa0
          cpu_idle_loop+0x38/0x3e0
          cpu_startup_entry+0x13/0x20
          start_secondary+0x114/0x140
      
      And secondly, even on !RT kernels it might add some non trivial overhead
      which is not necessary.  Even if the vmstat worker wakes up and preempts
      idle then it will be most likely a single shot noop because the stats
      were already synced and so it would end up on the oncpu_stat_off anyway.
      We just need to teach both vmstat_shepherd and vmstat_update to stop
      scheduling the worker if there is nothing to do.
      
      [mgalbraith@suse.de: cancel pending work of the cpu_stat_off CPU]
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Signed-off-by: NMike Galbraith <mgalbraith@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f01f17d3
    • V
      mm/Kconfig: correct description of DEFERRED_STRUCT_PAGE_INIT · 1ce22103
      Vlastimil Babka 提交于
      The description mentions kswapd threads, while the deferred struct page
      initialization is actually done by one-off "pgdatinitX" threads.
      
      Fix the description so that potentially users are not confused about
      pgdatinit threads using CPU after boot instead of kswapd.
      Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NMel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ce22103
    • D
      memblock: don't mark memblock_phys_mem_size() as __init · 1f1ffb8a
      David Gibson 提交于
      At the moment memblock_phys_mem_size() is marked as __init, and so is
      discarded after boot.  This is different from most of the memblock
      functions which are marked __init_memblock, and are only discarded after
      boot if memory hotplug is not configured.
      
      To allow for upcoming code which will need memblock_phys_mem_size() in
      the hotplug path, change it from __init to __init_memblock.
      Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f1ffb8a
    • E
      dump_stack: avoid potential deadlocks · d7ce3692
      Eric Dumazet 提交于
      Some servers experienced fatal deadlocks because of a combination of
      bugs, leading to multiple cpus calling dump_stack().
      
      The checksumming bug was fixed in commit 34ae6a1a ("ipv6: update
      skb->csum when CE mark is propagated").
      
      The second problem is a faulty locking in dump_stack()
      
      CPU1 runs in process context and calls dump_stack(), grabs dump_lock.
      
         CPU2 receives a TCP packet under softirq, grabs socket spinlock, and
         call dump_stack() from netdev_rx_csum_fault().
      
         dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
         dump_lock is owned by CPU1
      
      While dumping its stack, CPU1 is interrupted by a softirq, and happens
      to process a packet for the TCP socket locked by CPU2.
      
      CPU1 spins forever in spin_lock() : deadlock
      
      Stack trace on CPU1 looked like :
      
          NMI backtrace for cpu 1
          RIP: _raw_spin_lock+0x25/0x30
          ...
          Call Trace:
            <IRQ>
            tcp_v6_rcv+0x243/0x620
            ip6_input_finish+0x11f/0x330
            ip6_input+0x38/0x40
            ip6_rcv_finish+0x3c/0x90
            ipv6_rcv+0x2a9/0x500
            process_backlog+0x461/0xaa0
            net_rx_action+0x147/0x430
            __do_softirq+0x167/0x2d0
            call_softirq+0x1c/0x30
            do_softirq+0x3f/0x80
            irq_exit+0x6e/0xc0
            smp_call_function_single_interrupt+0x35/0x40
            call_function_single_interrupt+0x6a/0x70
            <EOI>
            printk+0x4d/0x4f
            printk_address+0x31/0x33
            print_trace_address+0x33/0x3c
            print_context_stack+0x7f/0x119
            dump_trace+0x26b/0x28e
            show_trace_log_lvl+0x4f/0x5c
            show_stack_log_lvl+0x104/0x113
            show_stack+0x42/0x44
            dump_stack+0x46/0x58
            netdev_rx_csum_fault+0x38/0x3c
            __skb_checksum_complete_head+0x6e/0x80
            __skb_checksum_complete+0x11/0x20
            tcp_rcv_established+0x2bd5/0x2fd0
            tcp_v6_do_rcv+0x13c/0x620
            sk_backlog_rcv+0x15/0x30
            release_sock+0xd2/0x150
            tcp_recvmsg+0x1c1/0xfc0
            inet_recvmsg+0x7d/0x90
            sock_recvmsg+0xaf/0xe0
            ___sys_recvmsg+0x111/0x3b0
            SyS_recvmsg+0x5c/0xb0
            system_call_fastpath+0x16/0x1b
      
      Fixes: b58d9774 ("dump_stack: serialize the output from dump_stack()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7ce3692
    • A
      mm: validate_mm browse_rb SMP race condition · acf128d0
      Andrea Arcangeli 提交于
      The mmap_sem for reading in validate_mm called from expand_stack is not
      enough to prevent the argumented rbtree rb_subtree_gap information to
      change from under us because expand_stack may be running from other
      threads concurrently which will hold the mmap_sem for reading too.
      
      The argumented rbtree is updated with vma_gap_update under the
      page_table_lock so use it in browse_rb() too to avoid false positives.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      acf128d0
    • S
      m32r: fix build failure due to SMP and MMU · af1ddcb5
      Sudip Mukherjee 提交于
      One of the randconfig build failed with the error:
      
        arch/m32r/kernel/smp.c: In function 'smp_flush_tlb_mm':
        arch/m32r/kernel/smp.c:283:20: error: subscripted value is neither array nor pointer nor vector
          mmc = &mm->context[cpu_id];
                            ^
        arch/m32r/kernel/smp.c: In function 'smp_flush_tlb_page':
        arch/m32r/kernel/smp.c:353:20: error: subscripted value is neither array nor pointer nor vector
          mmc = &mm->context[cpu_id];
                            ^
        arch/m32r/kernel/smp.c: In function 'smp_invalidate_interrupt':
        arch/m32r/kernel/smp.c:479:41: error: subscripted value is neither array nor pointer nor vector
          unsigned long *mmc = &flush_mm->context[cpu_id];
      
      It turned out that CONFIG_SMP was defined but CONFIG_MMU was not
      defined.  But arch/m32r/include/asm/mmu.h only defines mm_context_t as
      an array when both CONFIG_SMP and CONFIG_MMU are defined.  And
      arch/m32r/kernel/smp.c is always using context as an array.  So without
      MMU SMP can not work.
      Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af1ddcb5
    • R
      block: fix pfn_mkwrite() DAX fault handler · 9c5a05bc
      Ross Zwisler 提交于
      Previously the pfn_mkwrite() fault handler for raw block devices called
      bldev_dax_fault() -> __dax_fault() to do a full DAX page fault.
      
      Really what the pfn_mkwrite() fault handler needs to do is call
      dax_pfn_mkwrite() to make sure that the radix tree entry for the given
      PTE is marked as dirty so that a follow-up fsync or msync call will
      flush it durably to media.
      
      Fixes: 5a023cdb ("block: enable dax for raw block devices")
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c5a05bc