1. 19 2月, 2016 1 次提交
    • D
      mm/core: Do not enforce PKEY permissions on remote mm access · 1b2ee126
      Dave Hansen 提交于
      We try to enforce protection keys in software the same way that we
      do in hardware.  (See long example below).
      
      But, we only want to do this when accessing our *own* process's
      memory.  If GDB set PKRU[6].AD=1 (disable access to PKEY 6), then
      tried to PTRACE_POKE a target process which just happened to have
      some mprotect_pkey(pkey=6) memory, we do *not* want to deny the
      debugger access to that memory.  PKRU is fundamentally a
      thread-local structure and we do not want to enforce it on access
      to _another_ thread's data.
      
      This gets especially tricky when we have workqueues or other
      delayed-work mechanisms that might run in a random process's context.
      We can check that we only enforce pkeys when operating on our *own* mm,
      but delayed work gets performed when a random user context is active.
      We might end up with a situation where a delayed-work gup fails when
      running randomly under its "own" task but succeeds when running under
      another process.  We want to avoid that.
      
      To avoid that, we use the new GUP flag: FOLL_REMOTE and add a
      fault flag: FAULT_FLAG_REMOTE.  They indicate that we are
      walking an mm which is not guranteed to be the same as
      current->mm and should not be subject to protection key
      enforcement.
      
      Thanks to Jerome Glisse for pointing out this scenario.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Dominik Vogt <vogt@linux.vnet.ibm.com>
      Cc: Eric B Munson <emunson@akamai.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jason Low <jason.low2@hp.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Shachar Raindel <raindel@mellanox.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: iommu@lists.linux-foundation.org
      Cc: linux-arch@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-s390@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      1b2ee126
  2. 18 2月, 2016 2 次提交
    • D
      x86/mm/pkeys: Add arch-specific VMA protection bits · 8f62c883
      Dave Hansen 提交于
      Lots of things seem to do:
      
              vma->vm_page_prot = vm_get_page_prot(flags);
      
      and the ptes get created right from things we pull out
      of ->vm_page_prot.  So it is very convenient if we can
      store the protection key in flags and vm_page_prot, just
      like the existing permission bits (_PAGE_RW/PRESENT).  It
      greatly reduces the amount of plumbing and arch-specific
      hacking we have to do in generic code.
      
      This also takes the new PROT_PKEY{0,1,2,3} flags and
      turns *those* in to VM_ flags for vma->vm_flags.
      
      The protection key values are stored in 4 places:
      	1. "prot" argument to system calls
      	2. vma->vm_flags, filled from the mmap "prot"
      	3. vma->vm_page prot, filled from vma->vm_flags
      	4. the PTE itself.
      
      The pseudocode for these for steps are as follows:
      
      	mmap(PROT_PKEY*)
      	vma->vm_flags 	  = ... | arch_calc_vm_prot_bits(mmap_prot);
      	vma->vm_page_prot = ... | arch_vm_get_page_prot(vma->vm_flags);
      	pte = pfn | vma->vm_page_prot
      
      Note that this provides a new definitions for x86:
      
      	arch_vm_get_page_prot()
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210210.FE483A42@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8f62c883
    • D
      mm/core, x86/mm/pkeys: Store protection bits in high VMA flags · 63c17fb8
      Dave Hansen 提交于
      vma->vm_flags is an 'unsigned long', so has space for 32 flags
      on 32-bit architectures.  The high 32 bits are unused on 64-bit
      platforms.  We've steered away from using the unused high VMA
      bits for things because we would have difficulty supporting it
      on 32-bit.
      
      Protection Keys are not available in 32-bit mode, so there is
      no concern about supporting this feature in 32-bit mode or on
      32-bit CPUs.
      
      This patch carves out 4 bits from the high half of
      vma->vm_flags and allows architectures to set config option
      to make them available.
      
      Sparse complains about these constants unless we explicitly
      call them "UL".
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210208.81AF00D5@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      63c17fb8
  3. 16 2月, 2016 2 次提交
    • D
      mm/gup: Overload get_user_pages() functions · cde70140
      Dave Hansen 提交于
      The concept here was a suggestion from Ingo.  The implementation
      horrors are all mine.
      
      This allows get_user_pages(), get_user_pages_unlocked(), and
      get_user_pages_locked() to be called with or without the
      leading tsk/mm arguments.  We will give a compile-time warning
      about the old style being __deprecated and we will also
      WARN_ON() if the non-remote version is used for a remote-style
      access.
      
      Doing this, folks will get nice warnings and will not break the
      build.  This should be nice for -next and will hopefully let
      developers fix up their own code instead of maintainers needing
      to do it at merge time.
      
      The way we do this is hideous.  It uses the __VA_ARGS__ macro
      functionality to call different functions based on the number
      of arguments passed to the macro.
      
      There's an additional hack to ensure that our EXPORT_SYMBOL()
      of the deprecated symbols doesn't trigger a warning.
      
      We should be able to remove this mess as soon as -rc1 hits in
      the release after this is merged.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Geliang Tang <geliangtang@163.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Leon Romanovsky <leon@leon.nu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Xie XiuQi <xiexiuqi@huawei.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210155.73222EE1@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cde70140
    • D
      mm/gup: Introduce get_user_pages_remote() · 1e987790
      Dave Hansen 提交于
      For protection keys, we need to understand whether protections
      should be enforced in software or not.  In general, we enforce
      protections when working on our own task, but not when on others.
      We call these "current" and "remote" operations.
      
      This patch introduces a new get_user_pages() variant:
      
              get_user_pages_remote()
      
      Which is a replacement for when get_user_pages() is called on
      non-current tsk/mm.
      
      We also introduce a new gup flag: FOLL_REMOTE which can be used
      for the "__" gup variants to get this new behavior.
      
      The uprobes is_trap_at_addr() location holds mmap_sem and
      calls get_user_pages(current->mm) on an instruction address.  This
      makes it a pretty unique gup caller.  Being an instruction access
      and also really originating from the kernel (vs. the app), I opted
      to consider this a 'remote' access where protection keys will not
      be enforced.
      
      Without protection keys, this patch should not change any behavior.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e987790
  4. 12 2月, 2016 2 次提交
  5. 09 2月, 2016 1 次提交
  6. 07 2月, 2016 1 次提交
    • H
      pty: make sure super_block is still valid in final /dev/tty close · 1f55c718
      Herton R. Krzesinski 提交于
      Considering current pty code and multiple devpts instances, it's possible
      to umount a devpts file system while a program still has /dev/tty opened
      pointing to a previosuly closed pty pair in that instance. In the case all
      ptmx and pts/N files are closed, umount can be done. If the program closes
      /dev/tty after umount is done, devpts_kill_index will use now an invalid
      super_block, which was already destroyed in the umount operation after
      running ->kill_sb. This is another "use after free" type of issue, but now
      related to the allocated super_block instance.
      
      To avoid the problem (warning at ida_remove and potential crashes) for
      this specific case, I added two functions in devpts which grabs additional
      references to the super_block, which pty code now uses so it makes sure
      the super block structure is still valid until pty shutdown is done.
      I also moved the additional inode references to the same functions, which
      also covered similar case with inode being freed before /dev/tty final
      close/shutdown.
      Signed-off-by: NHerton R. Krzesinski <herton@redhat.com>
      Cc: stable@vger.kernel.org # 2.6.29+
      Reviewed-by: NPeter Hurley <peter@hurleysoftware.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f55c718
  7. 06 2月, 2016 3 次提交
  8. 05 2月, 2016 4 次提交
  9. 04 2月, 2016 5 次提交
  10. 03 2月, 2016 1 次提交
    • R
      modules: fix longstanding /proc/kallsyms vs module insertion race. · 8244062e
      Rusty Russell 提交于
      For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
      There's one full copy, marked SHF_ALLOC and laid out at the end of the
      module's init section.  There's also a cut-down version that only
      contains core symbols and strings, and lives in the module's core
      section.
      
      After module init (and before we free the module memory), we switch
      the mod->symtab, mod->num_symtab and mod->strtab to point to the core
      versions.  We do this under the module_mutex.
      
      However, kallsyms doesn't take the module_mutex: it uses
      preempt_disable() and rcu tricks to walk through the modules, because
      it's used in the oops path.  It's also used in /proc/kallsyms.
      There's nothing atomic about the change of these variables, so we can
      get the old (larger!) num_symtab and the new symtab pointer; in fact
      this is what I saw when trying to reproduce.
      
      By grouping these variables together, we can use a
      carefully-dereferenced pointer to ensure we always get one or the
      other (the free of the module init section is already done in an RCU
      callback, so that's safe).  We allocate the init one at the end of the
      module init section, and keep the core one inside the struct module
      itself (it could also have been allocated at the end of the module
      core, but that's probably overkill).
      Reported-by: NWeilong Chen <chenweilong@huawei.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
      Cc: stable@kernel.org
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      8244062e
  11. 01 2月, 2016 1 次提交
  12. 31 1月, 2016 3 次提交
    • D
      block: use DAX for partition table reads · d1a5f2b4
      Dan Williams 提交于
      Avoid populating pagecache when the block device is in DAX mode.
      Otherwise these page cache entries collide with the fsync/msync
      implementation and break data durability guarantees.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      d1a5f2b4
    • D
      block: revert runtime dax control of the raw block device · 9f4736fe
      Dan Williams 提交于
      Dynamically enabling DAX requires that the page cache first be flushed
      and invalidated.  This must occur atomically with the change of DAX mode
      otherwise we confuse the fsync/msync tracking and violate data
      durability guarantees.  Eliminate the possibilty of DAX-disabled to
      DAX-enabled transitions for now and revisit this for the next cycle.
      
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      9f4736fe
    • D
      fs, block: force direct-I/O for dax-enabled block devices · 65f87ee7
      Dan Williams 提交于
      Similar to the file I/O path, re-direct all I/O to the DAX path for I/O
      to a block-device special file.  Both regular files and device special
      files can use the common filp->f_mapping->host lookup to determing is
      DAX is enabled.
      
      Otherwise, we confuse the DAX code that does not expect to find live
      data in the page cache:
      
          ------------[ cut here ]------------
          WARNING: CPU: 0 PID: 7676 at mm/filemap.c:217
          __delete_from_page_cache+0x9f6/0xb60()
          Modules linked in:
          CPU: 0 PID: 7676 Comm: a.out Not tainted 4.4.0+ #276
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
           00000000ffffffff ffff88006d3f7738 ffffffff82999e2d 0000000000000000
           ffff8800620a0000 ffffffff86473d20 ffff88006d3f7778 ffffffff81352089
           ffffffff81658d36 ffffffff86473d20 00000000000000d9 ffffea0000009d60
          Call Trace:
           [<     inline     >] __dump_stack lib/dump_stack.c:15
           [<ffffffff82999e2d>] dump_stack+0x6f/0xa2 lib/dump_stack.c:50
           [<ffffffff81352089>] warn_slowpath_common+0xd9/0x140 kernel/panic.c:482
           [<ffffffff813522b9>] warn_slowpath_null+0x29/0x30 kernel/panic.c:515
           [<ffffffff81658d36>] __delete_from_page_cache+0x9f6/0xb60 mm/filemap.c:217
           [<ffffffff81658fb2>] delete_from_page_cache+0x112/0x200 mm/filemap.c:244
           [<ffffffff818af369>] __dax_fault+0x859/0x1800 fs/dax.c:487
           [<ffffffff8186f4f6>] blkdev_dax_fault+0x26/0x30 fs/block_dev.c:1730
           [<     inline     >] wp_pfn_shared mm/memory.c:2208
           [<ffffffff816e9145>] do_wp_page+0xc85/0x14f0 mm/memory.c:2307
           [<     inline     >] handle_pte_fault mm/memory.c:3323
           [<     inline     >] __handle_mm_fault mm/memory.c:3417
           [<ffffffff816ecec3>] handle_mm_fault+0x2483/0x4640 mm/memory.c:3446
           [<ffffffff8127eff6>] __do_page_fault+0x376/0x960 arch/x86/mm/fault.c:1238
           [<ffffffff8127f738>] trace_do_page_fault+0xe8/0x420 arch/x86/mm/fault.c:1331
           [<ffffffff812705c4>] do_async_page_fault+0x14/0xd0 arch/x86/kernel/kvm.c:264
           [<ffffffff86338f78>] async_page_fault+0x28/0x30 arch/x86/entry/entry_64.S:986
           [<ffffffff86336c36>] entry_SYSCALL_64_fastpath+0x16/0x7a
          arch/x86/entry/entry_64.S:185
          ---[ end trace dae21e0f85f1f98c ]---
      
      Fixes: 5a023cdb ("block: enable dax for raw block devices")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NKirill A. Shutemov <kirill@shutemov.name>
      Suggested-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Suggested-by: NMatthew Wilcox <willy@linux.intel.com>
      Tested-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      65f87ee7
  13. 30 1月, 2016 1 次提交
    • T
      workqueue: skip flush dependency checks for legacy workqueues · 23d11a58
      Tejun Heo 提交于
      fca839c0 ("workqueue: warn if memory reclaim tries to flush
      !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
      triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
      flush a !WQ_MEM_RECLAIM workquee.
      
      This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory
      reclaim path and making it depend on something which may need more
      memory to make forward progress can lead to deadlocks.  Unfortunately,
      workqueues created with the legacy create*_workqueue() interface
      always have WQ_MEM_RECLAIM regardless of whether they are depended
      upon memory reclaim or not.  These spurious WQ_MEM_RECLAIM markings
      cause spurious triggering of the flush dependency checks.
      
        WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
        workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
        ...
        Workqueue: deferwq deferred_probe_work_func
        [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14)
        [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4)
        [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0)
        [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40)
        [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144)
        [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c)
        [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180)
        [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10)
        [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338)
        [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac)
        [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8)
        [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278)
        [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c)
        [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec)
        [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc)
        [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8)
        [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148)
        [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114)
        [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8)
        [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8)
        [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac)
        [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0)
        [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94)
        [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114)
        [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c)
        [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98)
        [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8)
        [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c)
        [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4)
        [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c)
      
      Fix it by marking workqueues created via create*_workqueue() with
      __WQ_LEGACY and disabling flush dependency checks on them.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NThierry Reding <thierry.reding@gmail.com>
      Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
      Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
      23d11a58
  14. 29 1月, 2016 3 次提交
  15. 27 1月, 2016 3 次提交
  16. 26 1月, 2016 2 次提交
    • M
      irqdomain: Allow domain lookup with DOMAIN_BUS_WIRED token · 530cbe10
      Marc Zyngier 提交于
      Let's take the (outlandish) example of an interrupt controller
      capable of handling both wired interrupts and PCI MSIs.
      
      With the current code, the PCI MSI domain is going to be tagged
      with DOMAIN_BUS_PCI_MSI, and the wired domain with DOMAIN_BUS_ANY.
      
      Things get hairy when we start looking up the domain for a wired
      interrupt (typically when creating it based on some firmware
      information - DT or ACPI).
      
      In irq_create_fwspec_mapping(), we perform the lookup using
      DOMAIN_BUS_ANY, which is actually used as a wildcard. This gives
      us one chance out of two to end up with the wrong domain, and
      we try to configure a wired interrupt with the MSI domain.
      Everything grinds to a halt pretty quickly.
      
      What we really need to do is to start looking for a domain that
      would uniquely identify a wired interrupt domain, and only use
      DOMAIN_BUS_ANY as a fallback.
      
      In order to solve this, let's introduce a new DOMAIN_BUS_WIRED
      token, which is going to be used exactly as described above.
      Of course, this depends on the irqchip to setup the domain
      bus_token, and nobody had to implement this so far.
      
      Only so far.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Link: http://lkml.kernel.org/r/1453816347-32720-2-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      530cbe10
    • D
      drivers: ata: wake port before DMA stop for ALPM · fb329633
      Danesh Petigara 提交于
      The AHCI driver code stops and starts port DMA engines at will
      without considering the power state of the particular port. The
      AHCI specification isn't very clear on how to handle this scenario,
      leaving implementation open to interpretation.
      
      Broadcom's STB SATA host controller is unable to handle port DMA
      controller restarts when the port in question is in low power mode.
      When a port enters partial or slumber mode, its PHY is powered down.
      When a controller restart is requested, the controller's internal
      state machine expects the PHY to be brought back up by software which
      never happens in this case, resulting in failures.
      
      To avoid this situation, logic is added to manually wake up the port
      just before its DMA engine is stopped, if the port happens to be in
      a low power state. HBA initiated power management ensures that the port
      eventually returns to its configured low power state, when the link is
      idle (as per the conditions listed in the spec). A new host flag is also
      added to ensure this logic is only exercised for hosts with the above
      limitation.
      
      tj: Formatting changes.
      Signed-off-by: NDanesh Petigara <dpetigara@broadcom.com>
      Reviewed-by: NMarkus Mayer <mmayer@broadcom.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      fb329633
  17. 25 1月, 2016 2 次提交
    • M
      of: drop symbols declared by _OF_DECLARE() from modules · 71f50c6d
      Masahiro Yamada 提交于
      The users of this macro (OF_EARLYCON_DECLARE, CLK_OF_DECLARE,
      IRQCHIP_DECLARE, etc.) are only parsed in the early boot stage.
      Such symbols contained in modules are never used.
      
      This commit fixes the link error introduced by commit b8d20e06
      ("serial: 8250_uniphier: add earlycon support"); the combination
      of CONFIG_SERIAL_8250_UNIPHIER=m and CONFIG_SERIAL_8250_CONSOLE=y
      fails to link:
      
      ERROR: "early_serial8250_setup" [drivers/tty/serial/8250/8250_uniphier.ko] undefined!
      
      Fixes: b8d20e06 ("serial: 8250_uniphier: add earlycon support")
      Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRob Herring <robh@kernel.org>
      71f50c6d
    • A
      net: simplify napi_synchronize() to avoid warnings · facc432f
      Arnd Bergmann 提交于
      The napi_synchronize() function is defined twice: The definition
      for SMP builds waits for other CPUs to be done, while the uniprocessor
      variant just contains a barrier and ignores its argument.
      
      In the mvneta driver, this leads to a warning about an unused variable
      when we lookup the NAPI struct of another CPU and then don't use it:
      
      ethernet/marvell/mvneta.c: In function 'mvneta_percpu_notifier':
      ethernet/marvell/mvneta.c:2910:30: error: unused variable 'other_port' [-Werror=unused-variable]
      
      There are no other CPUs on a UP build, so that code never runs, but
      gcc does not know this.
      
      The nicest solution seems to be to turn the napi_synchronize() helper
      into an inline function for the UP case as well, as that leads gcc to
      not complain about the argument being unused. Once we do that, we can
      also combine the two cases into a single function definition and use
      if(IS_ENABLED()) rather than #ifdef to make it look a bit nicer.
      
      The warning first came up in linux-4.4, but I failed to catch it
      earlier.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Fixes: f8642885 ("net: mvneta: Statically assign queues to CPUs")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      facc432f
  18. 24 1月, 2016 3 次提交