1. 13 5月, 2017 3 次提交
    • R
      dax: prevent invalidation of mapped DAX entries · 4636e70b
      Ross Zwisler 提交于
      Patch series "mm,dax: Fix data corruption due to mmap inconsistency",
      v4.
      
      This series fixes data corruption that can happen for DAX mounts when
      page faults race with write(2) and as a result page tables get out of
      sync with block mappings in the filesystem and thus data seen through
      mmap is different from data seen through read(2).
      
      The series passes testing with t_mmap_stale test program from Ross and
      also other mmap related tests on DAX filesystem.
      
      This patch (of 4):
      
      dax_invalidate_mapping_entry() currently removes DAX exceptional entries
      only if they are clean and unlocked.  This is done via:
      
        invalidate_mapping_pages()
          invalidate_exceptional_entry()
            dax_invalidate_mapping_entry()
      
      However, for page cache pages removed in invalidate_mapping_pages()
      there is an additional criteria which is that the page must not be
      mapped.  This is noted in the comments above invalidate_mapping_pages()
      and is checked in invalidate_inode_page().
      
      For DAX entries this means that we can can end up in a situation where a
      DAX exceptional entry, either a huge zero page or a regular DAX entry,
      could end up mapped but without an associated radix tree entry.  This is
      inconsistent with the rest of the DAX code and with what happens in the
      page cache case.
      
      We aren't able to unmap the DAX exceptional entry because according to
      its comments invalidate_mapping_pages() isn't allowed to block, and
      unmap_mapping_range() takes a write lock on the mapping->i_mmap_rwsem.
      
      Since we essentially never have unmapped DAX entries to evict from the
      radix tree, just remove dax_invalidate_mapping_entry().
      
      Fixes: c6dcf52c ("mm: Invalidate DAX radix tree entries only if appropriate")
      Link: http://lkml.kernel.org/r/20170510085419.27601-2-jack@suse.czSigned-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reported-by: NJan Kara <jack@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>    [4.10+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4636e70b
    • M
      mm, vmalloc: fix vmalloc users tracking properly · 8594a21c
      Michal Hocko 提交于
      Commit 1f5307b1 ("mm, vmalloc: properly track vmalloc users") has
      pulled asm/pgtable.h include dependency to linux/vmalloc.h and that
      turned out to be a bad idea for some architectures.  E.g.  m68k fails
      with
      
         In file included from arch/m68k/include/asm/pgtable_mm.h:145:0,
                          from arch/m68k/include/asm/pgtable.h:4,
                          from include/linux/vmalloc.h:9,
                          from arch/m68k/kernel/module.c:9:
         arch/m68k/include/asm/mcf_pgtable.h: In function 'nocache_page':
      >> arch/m68k/include/asm/mcf_pgtable.h:339:43: error: 'init_mm' undeclared (first use in this function)
          #define pgd_offset_k(address) pgd_offset(&init_mm, address)
      
      as spotted by kernel build bot. nios2 fails for other reason
      
        In file included from include/asm-generic/io.h:767:0,
                         from arch/nios2/include/asm/io.h:61,
                         from include/linux/io.h:25,
                         from arch/nios2/include/asm/pgtable.h:18,
                         from include/linux/mm.h:70,
                         from include/linux/pid_namespace.h:6,
                         from include/linux/ptrace.h:9,
                         from arch/nios2/include/uapi/asm/elf.h:23,
                         from arch/nios2/include/asm/elf.h:22,
                         from include/linux/elf.h:4,
                         from include/linux/module.h:15,
                         from init/main.c:16:
        include/linux/vmalloc.h: In function '__vmalloc_node_flags':
        include/linux/vmalloc.h:99:40: error: 'PAGE_KERNEL' undeclared (first use in this function); did you mean 'GFP_KERNEL'?
      
      which is due to the newly added #include <asm/pgtable.h>, which on nios2
      includes <linux/io.h> and thus <asm/io.h> and <asm-generic/io.h> which
      again includes <linux/vmalloc.h>.
      
      Tweaking that around just turns out a bigger headache than necessary.
      This patch reverts 1f5307b1 and reimplements the original fix in a
      different way.  __vmalloc_node_flags can stay static inline which will
      cover vmalloc* functions.  We only have one external user
      (kvmalloc_node) and we can export __vmalloc_node_flags_caller and
      provide the caller directly.  This is much simpler and it doesn't really
      need any games with header files.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [mhocko@kernel.org: revert old comment]
        Link: http://lkml.kernel.org/r/20170509211054.GB16325@dhcp22.suse.cz
      Fixes: 1f5307b1 ("mm, vmalloc: properly track vmalloc users")
      Link: http://lkml.kernel.org/r/20170509153702.GR6481@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8594a21c
    • D
      time: delete current_fs_time() · 572e0ca9
      Deepa Dinamani 提交于
      All uses of the current_fs_time() function have been replaced by other
      time interfaces.
      
      And, its use cases can be fulfilled by current_time() or ktime_get_*
      variants.
      
      Link: http://lkml.kernel.org/r/1491613030-11599-13-git-send-email-deepa.kernel@gmail.comSigned-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      572e0ca9
  2. 11 5月, 2017 1 次提交
    • V
      libnvdimm: add an atomic vs process context flag to rw_bytes · 3ae3d67b
      Vishal Verma 提交于
      nsio_rw_bytes can clear media errors, but this cannot be done while we
      are in an atomic context due to locking within ACPI. From the BTT,
      ->rw_bytes may be called either from atomic or process context depending
      on whether the calls happen during initialization or during IO.
      
      During init, we want to ensure error clearing happens, and the flag
      marking process context allows nsio_rw_bytes to do that. When called
      during IO, we're in atomic context, and error clearing can be skipped.
      
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      3ae3d67b
  3. 09 5月, 2017 26 次提交
  4. 08 5月, 2017 4 次提交
  5. 06 5月, 2017 2 次提交
    • R
      ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle · eed4d47e
      Rafael J. Wysocki 提交于
      The ACPI SCI (System Control Interrupt) is set up as a wakeup IRQ
      during suspend-to-idle transitions and, consequently, any events
      signaled through it wake up the system from that state.  However,
      on some systems some of the events signaled via the ACPI SCI while
      suspended to idle should not cause the system to wake up.  In fact,
      quite often they should just be discarded.
      
      Arguably, systems should not resume entirely on such events, but in
      order to decide which events really should cause the system to resume
      and which are spurious, it is necessary to resume up to the point
      when ACPI SCIs are actually handled and processed, which is after
      executing dpm_resume_noirq() in the system resume path.
      
      For this reasons, add a loop around freeze_enter() in which the
      platforms can process events signaled via multiplexed IRQ lines
      like the ACPI SCI and add suspend-to-idle hooks that can be
      used for this purpose to struct platform_freeze_ops.
      
      In the ACPI case, the ->wake hook is used for checking if the SCI
      has triggered while suspended and deferring the interrupt-induced
      system wakeup until the events signaled through it are actually
      processed sufficiently to decide whether or not the system should
      resume.  In turn, the ->sync hook allows all of the relevant event
      queues to be flushed so as to prevent events from being missed due
      to race conditions.
      
      In addition to that, some ACPI code processing wakeup events needs
      to be modified to use the "hard" version of wakeup triggers, so that
      it will cause a system resume to happen on device-induced wakeup
      events even if the "soft" mechanism to prevent the system from
      suspending is not enabled (that also helps to catch device-induced
      wakeup events occurring during suspend transitions in progress).
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      eed4d47e
    • R
      PM / wakeup: Integrate mechanism to abort transitions in progress · 8a537ece
      Rafael J. Wysocki 提交于
      The system wakeup framework is not very consistent with respect to
      the way it handles suspend-to-idle and generally wakeup events
      occurring during transitions to system low-power states.
      
      First off, system transitions in progress are aborted by the event
      reporting helpers like pm_wakeup_event() only if the wakeup_count
      sysfs attribute is in use (as documented), but there are cases in
      which system-wide transitions should be aborted even if that is
      not the case.  For example, a wakeup signal from a designated
      wakeup device during system-wide PM transition, it should cause
      the transition to be aborted right away.
      
      Moreover, there is a freeze_wake() call in wakeup_source_activate(),
      but that really is only effective after suspend_freeze_state has
      been set to FREEZE_STATE_ENTER by freeze_enter().  However, it
      is very unlikely that wakeup_source_activate() will ever be called
      at that time, as it could only be triggered by a IRQF_NO_SUSPEND
      interrupt handler, so wakeups from suspend-to-idle don't really
      occur in wakeup_source_activate().
      
      At the same time there is a way to abort a system suspend in
      progress (or wake up the system from suspend-to-idle), which is by
      calling pm_system_wakeup(), but in turn that doesn't cause any
      wakeup source objects to be activated, so it will not be covered
      by wakeup source statistics and will not prevent the system from
      suspending again immediately (in case autosleep is used, for
      example).  Consequently, if anyone wants to abort system transitions
      in progress and allow the wakeup_count mechanism to work, they need
      to use both pm_system_wakeup() and pm_wakeup_event(), say, at the
      same time which is awkward.
      
      For the above reasons, make it possible to trigger
      pm_system_wakeup() from within wakeup_source_activate() and
      provide a new pm_wakeup_hard_event() helper to do so within the
      wakeup framework.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8a537ece
  6. 05 5月, 2017 4 次提交