1. 04 Nov, 2018 16 commits
    • L
      Merge branch 'akpm' (patches from Andrew) · cddfa11a
      Linus Torvalds committed
      Merge more updates from Andrew Morton:
      
       - more ocfs2 work
      
       - various leftovers
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        memory_hotplug: cond_resched in __remove_pages
        bfs: add sanity check at bfs_fill_super()
        kernel/sysctl.c: remove duplicated include
        kernel/kexec_file.c: remove some duplicated includes
        mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask
        ocfs2: fix clusters leak in ocfs2_defrag_extent()
        ocfs2: dlmglue: clean up timestamp handling
        ocfs2: don't put and assigning null to bh allocated outside
        ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry
        ocfs2: don't use iocb when EIOCBQUEUED returns
        ocfs2: without quota support, avoid calling quota recovery
        ocfs2: remove ocfs2_is_o2cb_active()
        mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings
        include/linux/notifier.h: SRCU: fix ctags
        mm: handle no memcg case in memcg_kmem_charge() properly
      cddfa11a
    • M
      memory_hotplug: cond_resched in __remove_pages · dd33ad7b
      Michal Hocko committed
      We have received a bug report that unbinding a large pmem (>1TB) can
      result in a soft lockup:
      
        NMI watchdog: BUG: soft lockup - CPU#9 stuck for 23s! [ndctl:4365]
        [...]
        Supported: Yes
        CPU: 9 PID: 4365 Comm: ndctl Not tainted 4.12.14-94.40-default #1 SLE12-SP4
        Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.01.00.0833.051120182255 05/11/2018
        task: ffff9cce7d4410c0 task.stack: ffffbe9eb1bc4000
        RIP: 0010:__put_page+0x62/0x80
        Call Trace:
         devm_memremap_pages_release+0x152/0x260
         release_nodes+0x18d/0x1d0
         device_release_driver_internal+0x160/0x210
         unbind_store+0xb3/0xe0
         kernfs_fop_write+0x102/0x180
         __vfs_write+0x26/0x150
         vfs_write+0xad/0x1a0
         SyS_write+0x42/0x90
         do_syscall_64+0x74/0x150
         entry_SYSCALL_64_after_hwframe+0x3d/0xa2
        RIP: 0033:0x7fd13166b3d0
      
      It has been reported on an older (4.12) kernel but the current upstream
      code doesn't cond_resched in the hot remove code at all and the given
      range to remove might be really large.  Fix the issue by calling
      cond_resched once per memory section.
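
A rough user-space sketch of the idea, not the kernel patch itself: the loop walks the range one section at a time and offers to yield once per section. The section size and every `demo_` name are assumptions made for illustration, and `sched_yield()` only stands in for `cond_resched()`.

```c
#include <assert.h>
#include <sched.h>

/* Hypothetical section size in pages; the kernel's value is arch-dependent. */
#define DEMO_PAGES_PER_SECTION 32768UL

/* Counts how often the loop offered to yield, so callers can inspect it. */
static unsigned long demo_yield_count;

/* User-space stand-in for cond_resched(): let other runnable tasks run. */
static void demo_cond_resched(void)
{
    demo_yield_count++;
    sched_yield();
}

/* Stand-in for the per-section teardown work. */
static void demo_remove_section(unsigned long start_pfn)
{
    (void)start_pfn;
}

/* Remove nr_pages pages starting at start_pfn, yielding once per section.
 * Returns the number of sections processed. */
static unsigned long demo_remove_pages(unsigned long start_pfn,
                                       unsigned long nr_pages)
{
    unsigned long sections = 0;
    unsigned long pfn;

    /* This sketch only handles section-aligned ranges. */
    assert(start_pfn % DEMO_PAGES_PER_SECTION == 0);
    assert(nr_pages % DEMO_PAGES_PER_SECTION == 0);

    for (pfn = start_pfn; pfn < start_pfn + nr_pages;
         pfn += DEMO_PAGES_PER_SECTION) {
        demo_cond_resched();
        demo_remove_section(pfn);
        sections++;
    }
    return sections;
}
```

Yielding per section rather than per page keeps the overhead negligible while still bounding how long a very large range can hold the CPU.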
      
      Link: http://lkml.kernel.org/r/20181031125840.23982-1-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Dan Williams <dan.j.williams@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      dd33ad7b
    • T
      bfs: add sanity check at bfs_fill_super() · 9f2df09a
      Tetsuo Handa committed
      syzbot is reporting a too-large memory allocation at bfs_fill_super() [1].
      Since the file system image is corrupted such that bfs_sb->s_start == 0,
      bfs_fill_super() tries to allocate 8MB of contiguous memory. Fix
      this by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and
      printf().
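
The arithmetic behind the 8MB figure can be sketched as follows. This is a simplified model, not the bfs code: the 512-byte block size, 64-byte on-disk inode, root inode number 2, the exact bounds checked, and all `demo_` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed on-disk constants for this sketch. */
#define DEMO_BFS_BSIZE      512u
#define DEMO_BFS_INODE_SIZE 64u
#define DEMO_BFS_ROOT_INO   2u

/* Hypothetical, simplified view of the two superblock fields involved. */
struct demo_bfs_sb {
    uint32_t s_start; /* byte offset where data starts */
    uint32_t s_end;   /* byte offset where data ends */
};

/* Size in bytes of an inode bitmap derived from s_start.
 * With s_start == 0 the unsigned subtraction wraps around. */
static uint32_t demo_bfs_imap_len(uint32_t s_start)
{
    uint32_t last_ino = (uint32_t)(s_start - DEMO_BFS_BSIZE)
                        / DEMO_BFS_INODE_SIZE + DEMO_BFS_ROOT_INO - 1;

    return last_ino / 8 + 1;
}

/* Reject superblocks whose s_start cannot be valid before sizing anything. */
static bool demo_bfs_sb_is_sane(const struct demo_bfs_sb *sb)
{
    if (sb->s_start < DEMO_BFS_BSIZE)
        return false;
    if (sb->s_start > sb->s_end)
        return false;
    return true;
}
```

Under these assumptions, `demo_bfs_imap_len(0)` evaluates to 8388608 bytes, which is the 8MB request in the report, while `demo_bfs_sb_is_sane()` rejects that superblock before any size is computed.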
      
      [1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96
      
      Link: http://lkml.kernel.org/r/1525862104-3407-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: syzbot <syzbot+71c6b5d68e91149fc8a4@syzkaller.appspotmail.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tigran Aivazian <aivazian.tigran@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9f2df09a
    • M
      6f0483d1
    • Z
      kernel/kexec_file.c: remove some duplicated includes · 3383b360
      zhong jiang committed
      We include kexec.h and slab.h twice in kexec_file.c. That is unnecessary,
      so just remove the duplicates.
      
      Link: http://lkml.kernel.org/r/1537498098-19171-1-git-send-email-zhongjiang@huawei.com
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3383b360
    • M
      mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask · 89c83fb5
      Michal Hocko committed
      THP allocation mode is quite complex and depends on the defrag mode.
      Most of this complexity is currently hidden in
      alloc_hugepage_direct_gfpmask. The NUMA special casing (namely
      __GFP_THISNODE), however, is independent and currently lives in
      alloc_pages_vma. This both adds an unnecessary branch to all vma-based
      page allocation requests and makes the code needlessly complex. Not to
      mention that e.g. shmem THP used to do the node reclaiming
      unconditionally, regardless of the defrag mode, until recently. This was
      not only unexpected behavior but also hardly a good default, and I
      strongly suspect it was a side effect of the code sharing rather than
      a deliberate decision, which suggests that such a layering is wrong.
      
      Get rid of the thp special casing from alloc_pages_vma and move the
      logic to alloc_hugepage_direct_gfpmask. __GFP_THISNODE is applied to the
      resulting gfp mask only when the direct reclaim is not requested and
      when there is no explicit numa binding to preserve the current logic.
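
The rule in the paragraph above can be sketched as a small pure function. The flag values and `demo_` names are hypothetical (the kernel's real gfp bits differ); only the condition mirrors the text: add the this-node flag when direct reclaim is not requested and no explicit NUMA binding is in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration; not the kernel's gfp bits. */
#define DEMO_GFP_DIRECT_RECLAIM 0x1u
#define DEMO_GFP_THISNODE       0x2u

/* Given the mask chosen by the defrag mode and whether the task has an
 * explicit NUMA binding, return the final mask for the THP allocation. */
static unsigned int demo_thp_gfpmask(unsigned int defrag_mask,
                                     bool bound_policy)
{
    unsigned int this_node = 0;

    /* The caller is expected to pass only the flags this sketch knows. */
    assert((defrag_mask & ~DEMO_GFP_DIRECT_RECLAIM) == 0);

    if (!bound_policy && !(defrag_mask & DEMO_GFP_DIRECT_RECLAIM))
        this_node = DEMO_GFP_THISNODE;

    return defrag_mask | this_node;
}
```

Keeping the decision in one helper means the generic vma allocation path no longer needs a THP-specific branch.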
      
      Please note that there's also a slight difference wrt MPOL_BIND now. The
      previous code would avoid using __GFP_THISNODE if the local node was
      outside of policy_nodemask(). After this patch __GFP_THISNODE is avoided
      for all MPOL_BIND policies. So there's a difference that if local node
      is actually allowed by the bind policy's nodemask, previously
      __GFP_THISNODE would be added, but now it won't be. From the behavior
      POV this is still correct because the policy nodemask is used.
      
      Link: http://lkml.kernel.org/r/20180925120326.24392-3-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      89c83fb5
    • L
      ocfs2: fix clusters leak in ocfs2_defrag_extent() · 6194ae42
      Larry Chen committed
      ocfs2_defrag_extent() might leak allocated clusters.  When the file
      system has insufficient space, the number of claimed clusters might be
      less than the caller wants.  If that happens, the original code might
      directly commit the transaction without returning clusters.
      
      This patch is based on code in ocfs2_add_clusters_in_btree().
      
      [akpm@linux-foundation.org: include localalloc.h, reduce scope of data_ac]
      Link: http://lkml.kernel.org/r/20180904041621.16874-3-lchen@suse.com
      Signed-off-by: Larry Chen <lchen@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6194ae42
    • A
      ocfs2: dlmglue: clean up timestamp handling · 3a3d1e51
      Arnd Bergmann committed
      The handling of timestamps outside of the 1970..2038 range in the dlm
      glue is rather inconsistent: on 32-bit architectures, this has always
      wrapped around to negative timestamps in the 1902..1969 range, while on
      64-bit kernels all timestamps are interpreted as positive 34 bit numbers
      in the 1970..2514 year range.
      
      Now that the VFS code handles 64-bit timestamps on all architectures, we
      can make the behavior more consistent here, and return the same result
      that we had on 64-bit already, making the file system y2038 safe in the
      process.  Outside of dlmglue, it already uses 64-bit on-disk timestamps
      anway, so that part is fine.
      
      For consistency, I'm changing ocfs2_pack_timespec() to clamp anything
      outside of the supported range to the minimum and maximum values.  This
      avoids a possible ambiguity of values before 1970 in particular, which
      used to be interpreted as times at the end of the 2514 range previously.
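
A minimal sketch of the clamping described above, assuming the packed word stores 34 bits of seconds above 30 bits of nanoseconds. The 30-bit nanosecond field and all `demo_` names are assumptions for illustration, not taken from the commit text.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SEC_BITS  34
#define DEMO_NSEC_BITS 30 /* assumption: the rest of the 64-bit word */
#define DEMO_SEC_MAX   ((INT64_C(1) << DEMO_SEC_BITS) - 1)
#define DEMO_NSEC_MASK ((UINT64_C(1) << DEMO_NSEC_BITS) - 1)

/* Pack seconds and nanoseconds into one 64-bit value, clamping seconds
 * to the representable range [0, 2^34 - 1] instead of letting them wrap. */
static uint64_t demo_pack_timespec(int64_t sec, uint32_t nsec)
{
    /* Nanoseconds are expected to be below one second. */
    assert(nsec < 1000000000u);

    if (sec < 0)
        sec = 0;
    else if (sec > DEMO_SEC_MAX)
        sec = DEMO_SEC_MAX;

    return ((uint64_t)sec << DEMO_NSEC_BITS) | (nsec & DEMO_NSEC_MASK);
}

/* Recover the seconds field from a packed value. */
static int64_t demo_unpack_sec(uint64_t packed)
{
    return (int64_t)(packed >> DEMO_NSEC_BITS);
}

/* Recover the nanoseconds field from a packed value. */
static uint32_t demo_unpack_nsec(uint64_t packed)
{
    return (uint32_t)(packed & DEMO_NSEC_MASK);
}
```

With clamping, a pre-1970 timestamp packs as second 0 rather than as a value that unpacks near the end of the range.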
      
      Link: http://lkml.kernel.org/r/20180619155826.4106487-1-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3a3d1e51
    • C
      ocfs2: don't put and assigning null to bh allocated outside · cf76c785
      Changwei Ge committed
      ocfs2_read_blocks() and ocfs2_read_blocks_sync() are both used to read
      several blocks from disk.  Currently, the input argument *bhs* may or
      may not be NULL, depending on the caller.  If the function fails to
      read blocks from disk, the corresponding bh is put and set to NULL.
      
      Obviously, this is not appropriate for a non-NULL input bh, because
      the caller does not know that its bhs have been put and re-assigned.
      
      If the buffer head is managed by the caller, ocfs2_read_blocks() and
      ocfs2_read_blocks_sync() should not set it to NULL.  Doing so causes
      the caller to access illegal memory and crash.
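
The ownership rule can be illustrated with a user-space sketch: on failure, release and clear only the buffers the function allocated itself. The `demo_` types and functions are hypothetical stand-ins, and the sketch assumes the caller passes an array that is either entirely NULL or entirely pre-allocated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer head. */
struct demo_bh {
    int block;
};

/* Hypothetical reader: fails for negative block numbers. */
static int demo_read_one(struct demo_bh *bh, int block)
{
    if (block < 0)
        return -1;
    bh->block = block;
    return 0;
}

/* Read nr blocks starting at first_block into bhs[]. Buffers are allocated
 * here only when the caller passed NULL slots; on failure, only those
 * buffers are freed and reset, and caller-owned buffers are left alone. */
static int demo_read_blocks(int first_block, int nr, struct demo_bh *bhs[])
{
    bool new_bh = (nr > 0 && bhs[0] == NULL);
    int status = 0;
    int i;

    for (i = 0; i < nr; i++) {
        if (bhs[i] == NULL) {
            bhs[i] = malloc(sizeof(*bhs[i]));
            if (bhs[i] == NULL) {
                status = -1;
                break;
            }
        }
        status = demo_read_one(bhs[i], first_block + i);
        if (status)
            break;
    }

    if (status && new_bh) {
        for (i = 0; i < nr; i++) {
            free(bhs[i]); /* free(NULL) is a no-op for untouched slots */
            bhs[i] = NULL;
        }
    }
    return status;
}
```

A caller that passes its own buffers can keep using its pointers after a failed read, because the function never frees or clears what it did not allocate.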
      
      Link: http://lkml.kernel.org/r/HK2PR06MB045285E0F4FBB561F9F2F9B3D5680@HK2PR06MB0452.apcprd06.prod.outlook.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Reviewed-by: Guozhonghua <guozhonghua@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf76c785
    • C
      ocfs2: fix a misuse a of brelse after failing ocfs2_check_dir_entry · 29aa3016
      Changwei Ge committed
      Somehow, file system metadata was corrupted, which causes
      ocfs2_check_dir_entry() to fail in function ocfs2_dir_foreach_blk_el().
      
      According to the original design intention, if the above happens we
      should skip the problematic block and continue to retrieve dir entries.
      But there is an obvious misuse of brelse in the related code.
      
      After ocfs2_check_dir_entry() fails, the current code just moves to the
      next position and uses the problematic buffer head again and again,
      releasing it multiple times in the process.  I suppose this is a
      serious, long-lived issue in ocfs2.  It may drive other file systems
      used on the same host insane.
      
      So we should also consider backporting this patch to linux
      -stable.
      
      Link: http://lkml.kernel.org/r/HK2PR06MB045211675B43EED794E597B6D56E0@HK2PR06MB0452.apcprd06.prod.outlook.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Suggested-by: Changkuo Shi <shi.changkuo@h3c.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29aa3016
    • C
      ocfs2: don't use iocb when EIOCBQUEUED returns · 9e985787
      Changwei Ge committed
      When -EIOCBQUEUED is returned, aio_complete() will be called from
      dio_complete(), which runs asynchronously with respect to write_iter.
      Generally, IO is much slower than executing instructions, but we still
      can't take the risk of accessing a freed iocb.
      
      And we did hit a BUG crash.  The crash tool shows that the iocb had
      already been freed.
      
        crash> struct -x kiocb ffff881a350f5900
        struct kiocb {
          ki_filp = 0xffff881a350f5a80,
          ki_pos = 0x0,
          ki_complete = 0x0,
          private = 0x0,
          ki_flags = 0x0
        }
      
      And the backtrace shows:
        ocfs2_file_write_iter+0xcaa/0xd00 [ocfs2]
        aio_run_iocb+0x229/0x2f0
        do_io_submit+0x291/0x540
        SyS_io_submit+0x10/0x20
        system_call_fastpath+0x16/0x75
      
      Link: http://lkml.kernel.org/r/1523361653-14439-1-git-send-email-ge.changwei@h3c.com
      Signed-off-by: Changwei Ge <ge.changwei@h3c.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9e985787
    • G
      ocfs2: without quota support, avoid calling quota recovery · 21158ca8
      Guozhonghua committed
      While another node recovers a dead node, quota recovery work is
      queued.  We should avoid calling quota recovery when quota is not
      supported, so check the quota flags.
      
      Link: http://lkml.kernel.org/r/71604351584F6A4EBAE558C676F37CA401071AC9FB@H3CMLB12-EX.srv.huawei-3com.com
      Signed-off-by: guozhonghua <guozhonghua@h3c.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: Changwei Ge <ge.changwei@h3c.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      21158ca8
    • G
      ocfs2: remove ocfs2_is_o2cb_active() · a6346447
      Gang He committed
      Remove ocfs2_is_o2cb_active().  We have similar functions to identify
      which cluster stack is being used via osb->osb_cluster_stack.
      
      Secondly, the current implementation of ocfs2_is_o2cb_active() is not
      totally safe.  Based on the design of stackglue, we need to get
      ocfs2_stack_lock before using ocfs2_stack related data structures, and
      that active_stack pointer can be NULL in the case of mount failure.
      
      Link: http://lkml.kernel.org/r/1495441079-11708-1-git-send-email-ghe@suse.com
      Signed-off-by: Gang He <ghe@suse.com>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Reviewed-by: Eric Ren <zren@suse.com>
      Acked-by: Changwei Ge <ge.changwei@h3c.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a6346447
    • A
      mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings · ac5b2c18
      Andrea Arcangeli committed
      THP allocation might be really disruptive when allocated on a NUMA
      system with the local node full or hard to reclaim.  Stefan has posted
      an allocation stall report on a 4.12-based SLES kernel which suggests
      the same issue:
      
        kvm: page allocation stalls for 194572ms, order:9, mode:0x4740ca(__GFP_HIGHMEM|__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_THISNODE|__GFP_MOVABLE|__GFP_DIRECT_RECLAIM), nodemask=(null)
        kvm cpuset=/ mems_allowed=0-1
        CPU: 10 PID: 84752 Comm: kvm Tainted: G        W 4.12.0+98-ph 0000001 SLE15 (unreleased)
        Hardware name: Supermicro SYS-1029P-WTRT/X11DDW-NT, BIOS 2.0 12/05/2017
        Call Trace:
         dump_stack+0x5c/0x84
         warn_alloc+0xe0/0x180
         __alloc_pages_slowpath+0x820/0xc90
         __alloc_pages_nodemask+0x1cc/0x210
         alloc_pages_vma+0x1e5/0x280
         do_huge_pmd_wp_page+0x83f/0xf00
         __handle_mm_fault+0x93d/0x1060
         handle_mm_fault+0xc6/0x1b0
         __do_page_fault+0x230/0x430
         do_page_fault+0x2a/0x70
         page_fault+0x7b/0x80
         [...]
        Mem-Info:
        active_anon:126315487 inactive_anon:1612476 isolated_anon:5
         active_file:60183 inactive_file:245285 isolated_file:0
         unevictable:15657 dirty:286 writeback:1 unstable:0
         slab_reclaimable:75543 slab_unreclaimable:2509111
         mapped:81814 shmem:31764 pagetables:370616 bounce:0
         free:32294031 free_pcp:6233 free_cma:0
        Node 0 active_anon:254680388kB inactive_anon:1112760kB active_file:240648kB inactive_file:981168kB unevictable:13368kB isolated(anon):0kB isolated(file):0kB mapped:280240kB dirty:1144kB writeback:0kB shmem:95832kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 81225728kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
        Node 1 active_anon:250583072kB inactive_anon:5337144kB active_file:84kB inactive_file:0kB unevictable:49260kB isolated(anon):20kB isolated(file):0kB mapped:47016kB dirty:0kB writeback:4kB shmem:31224kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 31897600kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
      
      The defrag mode is "madvise" and from the above report it is clear that
      the THP has been allocated for a MADV_HUGEPAGE vma.
      
      Andrea has identified that the main source of the problem is
      __GFP_THISNODE usage:
      
      : The problem is that direct compaction combined with the NUMA
      : __GFP_THISNODE logic in mempolicy.c is telling reclaim to swap very
      : hard the local node, instead of failing the allocation if there's no
      : THP available in the local node.
      :
      : Such logic was ok until __GFP_THISNODE was added to the THP allocation
      : path even with MPOL_DEFAULT.
      :
      : The idea behind the __GFP_THISNODE addition, is that it is better to
      : provide local memory in PAGE_SIZE units than to use remote NUMA THP
      : backed memory. That largely depends on the remote latency though, on
      : threadrippers for example the overhead is relatively low in my
      : experience.
      :
      : The combination of __GFP_THISNODE and __GFP_DIRECT_RECLAIM results in
      : extremely slow qemu startup with vfio, if the VM is larger than the
      : size of one host NUMA node. This is because it will try very hard to
      : unsuccessfully swapout get_user_pages pinned pages as result of the
      : __GFP_THISNODE being set, instead of falling back to PAGE_SIZE
      : allocations and instead of trying to allocate THP on other nodes (it
      : would be even worse without vfio type1 GUP pins of course, except it'd
      : be swapping heavily instead).
      
      Fix this by removing __GFP_THISNODE for THP requests which are
      requesting the direct reclaim.  This effectively reverts 5265047a
      on the grounds that the zone/node reclaim was known to be disruptive due
      to premature reclaim when there was memory free.  While it made sense at
      the time for HPC workloads without NUMA awareness on rare machines, it
      was ultimately harmful in the majority of cases.  The existing behaviour
      is similar, if not as widespread because it applies to a corner case,
      but crucially, it cannot be tuned around like zone_reclaim_mode can.
      The default behaviour should always be to cause the least harm for the
      common case.
      
      If there are specialised use cases out there that want zone_reclaim_mode
      in specific cases, then it can be built on top.  Longterm we should
      consider a memory policy which allows for the node reclaim like behavior
      for the specific memory ranges which would allow a
      
      [1] http://lkml.kernel.org/r/20180820032204.9591-1-aarcange@redhat.com
      
      Mel said:
      
      : Both patches look correct to me but I'm responding to this one because
      : it's the fix.  The change makes sense and moves further away from the
      : severe stalling behaviour we used to see with both THP and zone reclaim
      : mode.
      :
      : I put together a basic experiment with usemem configured to reference a
      : buffer multiple times that is 80% the size of main memory on a 2-socket
      : box with symmetric node sizes and defrag set to "always".  The defrag
      : setting is not the default but it would be functionally similar to
      : accessing a buffer with madvise(MADV_HUGEPAGE).  Usemem is configured to
      : reference the buffer multiple times and while it's not an interesting
      : workload, it would be expected to complete reasonably quickly as it fits
      : within memory.  The results were;
      :
      : usemem
      :                                   vanilla           noreclaim-v1
      : Amean     Elapsd-1       42.78 (   0.00%)       26.87 (  37.18%)
      : Amean     Elapsd-3       27.55 (   0.00%)        7.44 (  73.00%)
      : Amean     Elapsd-4        5.72 (   0.00%)        5.69 (   0.45%)
      :
      : This shows the elapsed time in seconds for 1 thread, 3 threads and 4
      : threads referencing buffers 80% the size of memory.  With the patches
      : applied, it's 37.18% faster for the single thread and 73% faster with three
      : threads.  Note that 4 threads showing little difference does not indicate
      : the problem is related to thread counts.  It's simply the case that 4
      : threads gets spread so their workload mostly fits in one node.
      :
      : The overall view from /proc/vmstat is more startling
      :
      :                          4.19.0-rc1      4.19.0-rc1
      :                             vanilla  noreclaim-v1r1
      : Minor Faults               35593425      708164
      : Major Faults                 484088          36
      : Swap Ins                    3772837           0
      : Swap Outs                   3932295           0
      :
      : Massive amounts of swap in/out without the patch
      :
      : Direct pages scanned        6013214           0
      : Kswapd pages scanned              0           0
      : Kswapd pages reclaimed            0           0
      : Direct pages reclaimed      4033009           0
      :
      : Lots of reclaim activity without the patch
      :
      : Kswapd efficiency              100%        100%
      : Kswapd velocity               0.000       0.000
      : Direct efficiency               67%        100%
      : Direct velocity           11191.956       0.000
      :
      : Mostly from direct reclaim context as you'd expect without the patch.
      :
      : Page writes by reclaim  3932314.000       0.000
      : Page writes file                 19           0
      : Page writes anon            3932295           0
      : Page reclaim immediate        42336           0
      :
      : Writes from reclaim context is never good but the patch eliminates it.
      :
      : We should never have default behaviour to thrash the system for such a
      : basic workload.  If zone reclaim mode behaviour is ever desired but on a
      : single task instead of a global basis then the sensible option is to build
      : a mempolicy that enforces that behaviour.
      
      This was a severe regression compared to previous kernels that made
      important workloads unusable, and it started when __GFP_THISNODE was
      added to THP allocations under MADV_HUGEPAGE.  Going back to the
      behavior from before __GFP_THISNODE was added is not a significant
      risk: it worked like that for years.
      
      This was simply an optimization to some lucky workloads that can fit in
      a single node, but it ended up breaking the VM for others that can't
      possibly fit in a single node, so going back is safe.
      
      [mhocko@suse.com: rewrote the changelog based on the one from Andrea]
      Link: http://lkml.kernel.org/r/20180925120326.24392-2-mhocko@kernel.org
      Fixes: 5265047a ("mm, thp: really limit transparent hugepage allocation to local node")
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Reported-by: Stefan Priebe <s.priebe@profihost.ag>
      Debugged-by: Andrea Arcangeli <aarcange@redhat.com>
      Reported-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Mel Gorman <mgorman@techsingularity.net>
      Tested-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: Zi Yan <zi.yan@cs.rutgers.edu>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: <stable@vger.kernel.org>	[4.1+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac5b2c18
    • S
      include/linux/notifier.h: SRCU: fix ctags · 94e297c5
      Sam Protsenko committed
      ctags indexing ("make tags" command) throws this warning:
      
          ctags: Warning: include/linux/notifier.h:125:
          null expansion of name pattern "\1"
      
      This is the result of the DEFINE_PER_CPU() macro expansion.  Fix that by
      getting rid of the line break.
      
      A similar fix was already done in commit 25528213 ("tags: Fix
      DEFINE_PER_CPU expansions"), but this one probably wasn't noticed.
      
      Link: http://lkml.kernel.org/r/20181030202808.28027-1-semen.protsenko@linaro.org
      Fixes: 9c80172b ("kernel/SRCU: provide a static initializer")
      Signed-off-by: Sam Protsenko <semen.protsenko@linaro.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94e297c5
    • R
      mm: handle no memcg case in memcg_kmem_charge() properly · e68599a3
      Roman Gushchin committed
      Mike Galbraith reported a regression caused by the commit 9b6f7e16
      ("mm: rework memcg kernel stack accounting") on a system with
      "cgroup_disable=memory" boot option: the system panics with the following
      stack trace:
      
        BUG: unable to handle kernel NULL pointer dereference at 00000000000000f8
        PGD 0 P4D 0
        Oops: 0002 [#1] PREEMPT SMP PTI
        CPU: 0 PID: 1 Comm: systemd Not tainted 4.19.0-preempt+ #410
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180531_142017-buildhw-08.phx2.fed4
        RIP: 0010:page_counter_try_charge+0x22/0xc0
        Code: 41 5d c3 c3 0f 1f 40 00 0f 1f 44 00 00 48 85 ff 0f 84 a7 00 00 00 41 56 48 89 f8 49 89 fe 49
        Call Trace:
         try_charge+0xcb/0x780
         memcg_kmem_charge_memcg+0x28/0x80
         memcg_kmem_charge+0x8b/0x1d0
         copy_process.part.41+0x1ca/0x2070
         _do_fork+0xd7/0x3d0
         do_syscall_64+0x5a/0x180
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The problem occurs because get_mem_cgroup_from_current() returns a NULL
      pointer if the memory controller is disabled.  Let's check for this case
      at the beginning of memcg_kmem_charge() and just return 0 if
      mem_cgroup_disabled() returns true.  This is how we handle this case in
      many other places in the memory controller code.
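
The shape of the fix can be sketched in user space as an early return placed before the lookup that may yield NULL. All `demo_` names are hypothetical stand-ins for the kernel helpers named above, and the accounting is reduced to a page counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with a page counter. */
struct demo_memcg {
    long charged_pages;
};

/* Models booting with the memory controller disabled. */
static bool demo_memcg_disabled;
static struct demo_memcg demo_root_memcg;

/* Returns NULL when the controller is disabled, as described above. */
static struct demo_memcg *demo_get_memcg_from_current(void)
{
    return demo_memcg_disabled ? NULL : &demo_root_memcg;
}

/* Charge 2^order pages; bail out first if there is nothing to charge. */
static int demo_kmem_charge(unsigned int order)
{
    struct demo_memcg *memcg;

    if (demo_memcg_disabled)
        return 0; /* nothing to account, report success */

    memcg = demo_get_memcg_from_current();
    assert(memcg != NULL);
    memcg->charged_pages += 1L << order;
    return 0;
}
```

Returning 0 rather than an error keeps callers on their normal path when accounting is switched off.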
      
      Link: http://lkml.kernel.org/r/20181029215123.17830-1-guro@fb.com
      Fixes: 9b6f7e16 ("mm: rework memcg kernel stack accounting")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Reported-by: Mike Galbraith <efault@gmx.de>
      Acked-by: Rik van Riel <riel@surriel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e68599a3
  2. 03 Nov, 2018 9 commits
    • L
      Merge tag 'for-linus-20181102' of git://git.kernel.dk/linux-block · 5f215853
      Linus Torvalds committed
      Pull block layer fixes from Jens Axboe:
       "The biggest part of this pull request is the revert of the blkcg
        cleanup series. It had one fix earlier for a stacked device issue, but
        another one was reported. Rather than play whack-a-mole with this,
        revert the entire series and try again for the next kernel release.
      
        Apart from that, only small fixes/changes.
      
        Summary:
      
         - Indentation fixup for mtip32xx (Colin Ian King)
      
         - The blkcg cleanup series revert (Dennis Zhou)
      
         - Two NVMe fixes. One fixing a regression in the nvme request
           initialization in this merge window, causing nvme-fc to not work.
           The other is a suspend/resume p2p resource issue (James, Keith)
      
         - Fix sg discard merge, allowing us to merge in cases where we didn't
           before (Jianchao Wang)
      
         - Call rq_qos_exit() after the queue is frozen, preventing a hang
           (Ming)
      
         - Fix brd queue setup, fixing an oops if we fail setting up all
           devices (Ming)"
      
      * tag 'for-linus-20181102' of git://git.kernel.dk/linux-block:
        nvme-pci: fix conflicting p2p resource adds
        nvme-fc: fix request private initialization
        blkcg: revert blkcg cleanups series
        block: brd: associate with queue until adding disk
        block: call rq_qos_exit() after queue is frozen
        mtip32xx: clean an indentation issue, remove extraneous tabs
        block: fix the DISCARD request merge
      5f215853
    • L
      Merge tag 'pwm/for-4.20-rc1' of... · fcc37f76
      Linus Torvalds committed
      Merge tag 'pwm/for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "This series contains a number of improvements to existing drivers,
        such as LPSS. Some drivers, such as renesas-tpu and rcar get support
        for more SoC generations. To round things off this fixes an issue with
        the sysfs interface"
      
      * tag 'pwm/for-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: lpss: Only set update bit if we are actually changing the settings
        pwm: lpss: Force runtime-resume on suspend on Cherry Trail
        pwm: Enable TI ECAP driver for ARCH_K3
        dt-bindings: pwm: tiecap: Add TI AM654 SoC specific compatible
        dt-bindings: pwm: rcar: Add r8a774a1 support
        pwm: Send a uevent on the pwmchip device upon channel sysfs (un)export
        Revert "pwm: Set class for exported channels in sysfs"
        dt-bindings: pwm: renesas-tpu: Document r8a7744 support
        dt-bindings: pwm: rcar: Add r8a7744 support
        dt-bindings: pwm: renesas: tpu: Document R8A779{7|8}0 bindings
        dt-bindings: pwm: renesas: pwm-rcar: Document R8A779{7|8}0 bindings
        dt-bindings: pwm: renesas: tpu: Fix "compatible" prop description
        pwm: Use SPDX identifier for Renesas drivers
        pwm: lpss: Add get_state callback
        pwm: lpss: Release runtime-pm reference from the driver's remove callback
        pwm: lpss: Check PWM powerstate after resume on Cherry Trail devices
        pwm: lpss: Move struct pwm_lpss_chip definition to the header file
        pwm: lpss: Add ACPI HID for second PWM controller on Cherry Trail devices
        ACPI / PM: Export acpi_device_get_power() for use by modular build drivers
        pwm: tegra: Remove gratuituous blank line
      fcc37f76
    • L
      Merge tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp · 0b21f21a
      Linus Torvalds committed
      Pull more EDAC updates from Borislav Petkov:
       "The second part of the EDAC pile which contains the ADXL user and a
        build fix which addresses a not-so-sensical .config but fixes
        randconfig builds people do:
      
         - skx_edac: Address translation for NVDIMMs (Tony Luck and Qiuxu Zhuo)
      
         - ACPI_ADXL build fix"
      
      [ I don't think "sensical" is a word, particularly when used in the
        context of actually meaning "nonsensical", but I like it   - Linus ]
      
      * tag 'edac_for_4.20_2' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
        EDAC, skx: Fix randconfig builds
        EDAC, skx_edac: Add address translation for non-volatile DIMMs
      0b21f21a
    • L
      Merge tag 'sound-fix-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 54480aa7
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "A few device-specific fixes: a fix for SPDIF on old Creative PCI
        board, and two additional fixes for the recent changes in FireWire
        audio stack"
      
      * tag 'sound-fix-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: firewire-lib: fix insufficient PCM rule for period/buffer size
        ALSA: ca0106: Disable IZD on SB0570 DAC to fix audio pops
        ALSA: dice: fix to wait for releases of all ALSA character devices
      54480aa7
    • L
      Merge tag 'drm-next-2018-11-02' of git://anongit.freedesktop.org/drm/drm · bc6080ae
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "Pretty much a normal fixes pull pre-rc1, mostly amdgpu fixes, one i915
        link training regression fix, and a couple of minor panel/bridge fixes
        and a panel quirk"
      
      * tag 'drm-next-2018-11-02' of git://anongit.freedesktop.org/drm/drm: (37 commits)
        drm/amdgpu: revert "enable gfxoff in non-sriov and stutter mode by default"
        drm/amd/pp: Print warning if od_sclk/mclk out of range
        drm/amd/pp: Fix pp_sclk/mclk_od not work on Vega10
        drm/amd/pp: Fix pp_sclk/mclk_od not work on smu7
        drm/amd/powerplay: no MGPU fan boost enablement on DPM disabled
        drm/amdgpu: Fix skipping hangged job reset during gpu recover.
        drm/amd/powerplay: revise Vega20 pptable version check
        drm/amd/display: set backlight level limit to 1
        drm/panel: simple: Innolux TV123WAM is actually P120ZDG-BF1
        dt-bindings: drm/panel: simple: Innolux TV123WAM is actually P120ZDG-BF1
        drm/bridge: ti-sn65dsi86: Remove the mystery delay
        drm/panel: simple: Add "no-hpd" delay for Innolux TV123WAM
        drm/panel: simple: Support panels with HPD where HPD isn't connected
        dt-bindings: drm/panel: simple: Add no-hpd property
        drm/edid: Add 6 bpc quirk for BOE panel.
        drm/amdgpu: fix reporting of failed msg sent to SMU (v2)
        drm/amdgpu: Fix compute ring 1.0.0 failure after reset
        drm/amdgpu: fix VM leaf walking
        drm/amdgpu: fix amdgpu_vm_fini
        drm/amd/powerplay: commonize the API for retrieving current clocks
        ...
      bc6080ae
    • L
      Merge tag 'apparmor-pr-2018-11-01' of... · d81f50bd
      Linus Torvalds committed
      Merge tag 'apparmor-pr-2018-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor
      
      Pull apparmor updates from John Johansen:
       "Features/Improvements:
         - replace spin_is_locked() with lockdep
         - add base support for secmark labeling and matching
      
        Cleanups:
         - clean an indentation issue, remove extraneous space
         - remove no-op permission check in policy_unpack
         - fix checkpatch missing spaces error in Parse secmark policy
         - fix network performance issue in aa_label_sk_perm
      
        Bug fixes:
         - add #ifdef checks for secmark filtering
         - fix an error code in __aa_create_ns()
         - don't try to replace stale label in ptrace checks
         - fix failure to audit context info in build_change_hat
         - check buffer bounds when mapping permissions mask
         - fully initialize aa_perms struct when answering userspace query
         - fix uninitialized value in aa_split_fqname"
      
      * tag 'apparmor-pr-2018-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
        apparmor: clean an indentation issue, remove extraneous space
        apparmor: fix checkpatch error in Parse secmark policy
        apparmor: add #ifdef checks for secmark filtering
        apparmor: Fix uninitialized value in aa_split_fqname
        apparmor: don't try to replace stale label in ptraceme check
        apparmor: Replace spin_is_locked() with lockdep
        apparmor: Allow filtering based on secmark policy
        apparmor: Parse secmark policy
        apparmor: Add a wildcard secid
        apparmor: don't try to replace stale label in ptrace access check
        apparmor: Fix network performance issue in aa_label_sk_perm
      d81f50bd
    • L
      Merge tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c2aa1a44
      Linus Torvalds committed
      Pull vfs dedup fixes from Dave Chinner:
       "This reworks the vfs data cloning infrastructure.
      
        We discovered many issues with these interfaces late in the 4.19 cycle
        - the worst of them (data corruption, setuid stripping) were fixed for
        XFS in 4.19-rc8, but a larger rework of the infrastructure fixing all
        the problems was needed. That rework is the contents of this pull
        request.
      
        Rework the vfs_clone_file_range and vfs_dedupe_file_range
        infrastructure to use a common .remap_file_range method and supply
        generic bounds and sanity checking functions that are shared with the
        data write path. The current VFS infrastructure has problems with
        rlimit, LFS file sizes, file time stamps, maximum filesystem file
        sizes, stripping setuid bits, etc and so they are addressed in these
        commits.
      
        We also introduce the ability for the ->remap_file_range methods to
        return short clones so that clones for vfs_copy_file_range() don't get
        rejected if the entire range can't be cloned. It also allows
        filesystems to silently skip deduplication of partial EOF blocks if
        they are not capable of doing so without requiring errors to be thrown
        to userspace.
      
        Existing filesystems are converted to use the new remap_file_range
        method, and both XFS and ocfs2 are modified to make use of the new
        generic checking infrastructure"
      
      * tag 'xfs-4.20-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (28 commits)
        xfs: remove [cm]time update from reflink calls
        xfs: remove xfs_reflink_remap_range
        xfs: remove redundant remap partial EOF block checks
        xfs: support returning partial reflink results
        xfs: clean up xfs_reflink_remap_blocks call site
        xfs: fix pagecache truncation prior to reflink
        ocfs2: remove ocfs2_reflink_remap_range
        ocfs2: support partial clone range and dedupe range
        ocfs2: fix pagecache truncation prior to reflink
        ocfs2: truncate page cache for clone destination file before remapping
        vfs: clean up generic_remap_file_range_prep return value
        vfs: hide file range comparison function
        vfs: enable remap callers that can handle short operations
        vfs: plumb remap flags through the vfs dedupe functions
        vfs: plumb remap flags through the vfs clone functions
        vfs: make remap_file_range functions take and return bytes completed
        vfs: remap helper should update destination inode metadata
        vfs: pass remap flags to generic_remap_checks
        vfs: pass remap flags to generic_remap_file_range_prep
        vfs: combine the clone and dedupe into a single remap_file_range
        ...
      c2aa1a44
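The "return short clones" behaviour described in the pull message above can be pictured with a small userspace sketch. Everything here is hypothetical (the `demo_` names, the flag value, the 4096-byte block size) and only models the idea: a remap routine returns the number of bytes it completed, and a caller that can handle short operations opts in with a flag instead of having the whole request rejected.

```c
/* Toy model only: these names are illustrative, not the kernel's API. */
#define DEMO_BLOCK_SIZE        4096L
#define DEMO_REMAP_CAN_SHORTEN 0x1u   /* hypothetical "short result is fine" flag */

/*
 * Pretend only whole blocks can be shared. Returns the number of bytes
 * remapped (possibly fewer than requested) or -1 on error.
 */
static inline long demo_remap_range(long pos, long len, unsigned int flags)
{
    long whole;

    if (pos < 0 || len < 0 || pos % DEMO_BLOCK_SIZE != 0)
        return -1;

    whole = len - (len % DEMO_BLOCK_SIZE);
    if (whole == len)
        return len;                    /* the full range was remapped */
    if (flags & DEMO_REMAP_CAN_SHORTEN)
        return whole;                  /* short result, caller finishes the tail */
    return -1;                         /* caller insisted on all-or-nothing */
}

/*
 * A copy-style caller: share what can be shared and report how many
 * trailing bytes would still have to be copied by other means.
 */
static inline long demo_copy_range(long pos, long len, long *tail_bytes)
{
    long done = demo_remap_range(pos, len, DEMO_REMAP_CAN_SHORTEN);

    if (done < 0)
        return -1;
    *tail_bytes = len - done;
    return len;
}
```

In this model a 10000-byte request without the flag fails outright, while the same request with the flag remaps 8192 bytes and leaves 1808 for the caller.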
    • L
      Merge tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · b69f9e17
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "Some things that I missed due to travel, or that came in late.
      
        Two fixes also going to stable:
      
         - A revert of a buggy change to the 8xx TLB miss handlers.
      
         - Our flushing of SPE (Signal Processing Engine) registers on fork
           was broken.
      
        Other changes:
      
         - A change to the KVM decrementer emulation to use proper APIs.
      
         - Some cleanups to the way we do code patching in the 8xx code.
      
         - Expose the maximum possible memory for the system in
           /proc/powerpc/lparcfg.
      
         - Merge some updates from Scott: "a couple device tree updates, and a
           fix for a missing prototype warning"
      
        A few other minor fixes and a handful of fixes for our selftests.
      
        Thanks to: Aravinda Prasad, Breno Leitao, Camelia Groza, Christophe
        Leroy, Felipe Rechia, Joel Stanley, Naveen N. Rao, Paul Mackerras,
        Scott Wood, Tyrel Datwyler"
      
      * tag 'powerpc-4.20-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
        selftests/powerpc: Fix compilation issue due to asm label
        selftests/powerpc/cache_shape: Fix out-of-tree build
        selftests/powerpc/switch_endian: Fix out-of-tree build
        selftests/powerpc/pmu: Link ebb tests with -no-pie
        selftests/powerpc/signal: Fix out-of-tree build
        selftests/powerpc/ptrace: Fix out-of-tree build
        powerpc/xmon: Relax frame size for clang
        selftests: powerpc: Fix warning for security subdir
        selftests/powerpc: Relax L1d miss targets for rfi_flush test
        powerpc/process: Fix flush_all_to_thread for SPE
        powerpc/pseries: add missing cpumask.h include file
        selftests/powerpc: Fix ptrace tm failure
        KVM: PPC: Use exported tb_to_ns() function in decrementer emulation
        powerpc/pseries: Export maximum memory value
        powerpc/8xx: Use patch_site for perf counters setup
        powerpc/8xx: Use patch_site for memory setup patching
        powerpc/code-patching: Add a helper to get the address of a patch_site
        Revert "powerpc/8xx: Use L1 entry APG to handle _PAGE_ACCESSED for CONFIG_SWAP"
        powerpc/8xx: add missing header in 8xx_mmu.c
        powerpc/8xx: Add DT node for using the SEC engine of the MPC885
        ...
      b69f9e17
    • L
      Merge tag 'riscv-for-linus-4.20-mw3' of... · 63c6e188
      Linus Torvalds committed
      Merge tag 'riscv-for-linus-4.20-mw3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V defconfig update from Palmer Dabbelt:
       "Sorry for the last minute patches, but it was suggested we try to push
        this in before rc1 to make it easier for people to keep their branch
        rebases sane"
      
      * tag 'riscv-for-linus-4.20-mw3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: refresh defconfig
      63c6e188
  3. 02 Nov, 2018 15 commits
    • K
      nvme-pci: fix conflicting p2p resource adds · 9fe5c59f
      Keith Busch committed
      The nvme pci driver had been adding its CMB resource to the P2P DMA
      subsystem every time on a controller reset. This results in the
      following warning:
      
          ------------[ cut here ]------------
          nvme 0000:00:03.0: Conflicting mapping in same section
          WARNING: CPU: 7 PID: 81 at kernel/memremap.c:155 devm_memremap_pages+0xa6/0x380
          ...
          Call Trace:
           pci_p2pdma_add_resource+0x153/0x370
           nvme_reset_work+0x28c/0x17b1 [nvme]
           ? add_timer+0x107/0x1e0
           ? dequeue_entity+0x81/0x660
           ? dequeue_entity+0x3b0/0x660
           ? pick_next_task_fair+0xaf/0x610
           ? __switch_to+0xbc/0x410
           process_one_work+0x1cf/0x350
           worker_thread+0x215/0x3d0
           ? process_one_work+0x350/0x350
           kthread+0x107/0x120
           ? kthread_park+0x80/0x80
           ret_from_fork+0x1f/0x30
          ---[ end trace f7ea76ac6ee72727 ]---
          nvme nvme0: failed to register the CMB
      
      This patch fixes this by registering the CMB with P2P only once.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9fe5c59f
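The "register only once" fix in the commit above follows a common guard pattern, sketched here in plain userspace C. This is not the driver's code: `demo_ctrl`, `demo_add_resource` and `demo_map_cmb` are made-up stand-ins, and the registrar simply fails on a second call to mimic the "Conflicting mapping" warning.

```c
#include <stdbool.h>

/* Hypothetical controller state; the real driver's fields are not shown here. */
struct demo_ctrl {
    bool cmb_registered;   /* set once the region has been registered */
    int  register_calls;   /* how often the stand-in registrar was invoked */
};

/* Stand-in for a registration call that rejects a repeated region. */
static inline int demo_add_resource(struct demo_ctrl *c)
{
    c->register_calls++;
    return c->register_calls > 1 ? -1 : 0;
}

/* Runs on every simulated reset, but registers the region only the first time. */
static inline int demo_map_cmb(struct demo_ctrl *c)
{
    if (c->cmb_registered)
        return 0;                      /* already registered: nothing to do */
    if (demo_add_resource(c) != 0)
        return -1;
    c->cmb_registered = true;
    return 0;
}
```

Calling `demo_map_cmb()` for several resets in a row succeeds each time, while the registrar itself is reached only once.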
    • J
      nvme-fc: fix request private initialization · d19b8bc8
      James Smart committed
      The patch made to avoid Coverity reporting of out of bounds access
      on aen_op moved the assignment of a pointer, leaving it null when it
      was subsequently used to calculate a private pointer. Thus the private
      pointer was bad.
      
      Move/correct the private pointer initialization to be in sync with the
      patch.
      
      Fixes: 0d2bdf9f ("nvme-fc: rework the request initialization code")
      Signed-off-by: James Smart <jsmart2021@gmail.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d19b8bc8
    • C
      apparmor: clean an indentation issue, remove extraneous space · 566f52ec
      Colin Ian King committed
      Trivial fix to clean up an indentation issue, remove space
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: John Johansen <john.johansen@canonical.com>
      566f52ec
    • J
      apparmor: fix checkpatch error in Parse secmark policy · 76af016e
      John Johansen committed
      Fix missed spacing error reported by checkpatch for
      9caafbe2 ("Parse secmark policy")
      Signed-off-by: John Johansen <john.johansen@canonical.com>
      76af016e
    • D
      Merge tag 'drm-intel-next-fixes-2018-10-25' of... · f9885ef8
      Dave Airlie committed
      Merge tag 'drm-intel-next-fixes-2018-10-25' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
      
      - Fix to avoid link retraining workaround on eDP (the other is a comment change)
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181025131836.GA2296@jlahtine-desk.ger.corp.intel.com
      f9885ef8
    • L
      Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 8adcc599
      Linus Torvalds committed
      Pull misc vfs updates from Al Viro:
       "No common topic, really - a handful of assorted stuff; the least
        trivial bits are Mark's dedupe patches"
      
      * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        fs/exofs: only use true/false for asignment of bool type variable
        fs/exofs: fix potential memory leak in mount option parsing
        Delete invalid assignment statements in do_sendfile
        iomap: remove duplicated include from iomap.c
        vfs: dedupe should return EPERM if permission is not granted
        vfs: allow dedupe of user owned read-only files
        ntfs: don't open-code ERR_CAST
        ext4: don't open-code ERR_CAST
      8adcc599
    • L
      Merge branch 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 9931a07d
      Linus Torvalds committed
      Pull AFS updates from Al Viro:
       "AFS series, with some iov_iter bits included"
      
      * 'work.afs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
        missing bits of "iov_iter: Separate type from direction and use accessor functions"
        afs: Probe multiple fileservers simultaneously
        afs: Fix callback handling
        afs: Eliminate the address pointer from the address list cursor
        afs: Allow dumping of server cursor on operation failure
        afs: Implement YFS support in the fs client
        afs: Expand data structure fields to support YFS
        afs: Get the target vnode in afs_rmdir() and get a callback on it
        afs: Calc callback expiry in op reply delivery
        afs: Fix FS.FetchStatus delivery from updating wrong vnode
        afs: Implement the YFS cache manager service
        afs: Remove callback details from afs_callback_break struct
        afs: Commit the status on a new file/dir/symlink
        afs: Increase to 64-bit volume ID and 96-bit vnode ID for YFS
        afs: Don't invoke the server to read data beyond EOF
        afs: Add a couple of tracepoints to log I/O errors
        afs: Handle EIO from delivery function
        afs: Fix TTL on VL server and address lists
        afs: Implement VL server rotation
        afs: Improve FS server rotation error handling
        ...
      9931a07d
    • D
      Merge branch 'drm-next-4.20' of git://people.freedesktop.org/~agd5f/linux into drm-next · 43e0f873
      Dave Airlie committed
      - Fix flickering at low backlight levels on some systems
      - Fix some overclocking regressions
      - Vega20 updates for
      - GPU recovery fixes
      - Disable gfxoff on RV as some sbios/fw combinations are not stable yet
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      From: Alex Deucher <alexdeucher@gmail.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181101151939.2828-1-alexander.deucher@amd.com
      43e0f873
    • D
      Merge tag 'drm-misc-next-fixes-2018-10-31' of... · 52b50ae1
      Dave Airlie committed
      Merge tag 'drm-misc-next-fixes-2018-10-31' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
      
      - Properly label Innolux TV123WAM as P120ZDG-BF1 (Doug)
      - Add optional delay for panels without hpd hooked up (which solves the
        mystery delay for TI SN65DSI86 bridge) (Doug)
      - Another 6bpc quirk for BOE panel 0x0771 (Shawn)
      
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Lee, Shawn C <shawn.c.lee@intel.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      
      From: Sean Paul <sean@poorly.run>
      Link: https://patchwork.freedesktop.org/patch/msgid/20181031201944.GA262020@art_vandelay
      52b50ae1
    • D
      blkcg: revert blkcg cleanups series · b5f2954d
      Dennis Zhou committed
      This reverts a series committed earlier due to null pointer exception
      bug report in [1]. It seems there are edge case interactions that I did
      not consider and will need some time to understand what causes the
      adverse interactions.
      
      The original series can be found in [2] with a follow up series in [3].
      
      [1] https://www.spinics.net/lists/cgroups/msg20719.html
      [2] https://lore.kernel.org/lkml/20180911184137.35897-1-dennisszhou@gmail.com/
      [3] https://lore.kernel.org/lkml/20181020185612.51587-1-dennis@kernel.org/
      
      This reverts the following commits:
      d459d853, b2c3fa54, 101246ec, b3b9f24f, e2b09899,
      f0fcb3ec, c839e7a0, bdc24917, 74b7c02a, 5bf9a1f3,
      a7b39b4e, 07b05bcc, 49f4c2dc, 27e6fa99
      Signed-off-by: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b5f2954d
    • M
      block: brd: associate with queue until adding disk · 153fcd5f
      Ming Lei committed
      brd_free() may be called in failure path on one brd instance which
      disk isn't added yet, so release handler of gendisk may free the
      associated request_queue early and causes the following use-after-free[1].
      
      This patch fixes this issue by associating gendisk with request_queue
      just before adding disk.
      
      [1] KASAN: use-after-free Read in del_timer_sync
      Non-volatile memory driver v1.3
      Linux agpgart interface v0.103
      [drm] Initialized vgem 1.0.0 20120112 for virtual device on minor 0
      usbcore: registered new interface driver udl
      ==================================================================
      BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
      kernel/locking/lockdep.c:3218
      Read of size 8 at addr ffff8801d1b6b540 by task swapper/0/1
      
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #88
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x244/0x39d lib/dump_stack.c:113
        print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
        lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
        del_timer_sync+0xb7/0x270 kernel/time/timer.c:1283
        blk_cleanup_queue+0x413/0x710 block/blk-core.c:809
        brd_free+0x5d/0x71 drivers/block/brd.c:422
        brd_init+0x2eb/0x393 drivers/block/brd.c:518
        do_one_initcall+0x145/0x957 init/main.c:890
        do_initcall_level init/main.c:958 [inline]
        do_initcalls init/main.c:966 [inline]
        do_basic_setup init/main.c:984 [inline]
        kernel_init_freeable+0x5c6/0x6b9 init/main.c:1148
        kernel_init+0x11/0x1ae init/main.c:1068
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:350
      
      Reported-by: syzbot+3701447012fe951dabb2@syzkaller.appspotmail.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      153fcd5f
    • L
      Merge tag 'compiler-attributes-for-linus-4.20-rc1' of https://github.com/ojeda/linux · e468f5c0
      Linus Torvalds committed
      Pull compiler attribute updates from Miguel Ojeda:
       "This is an effort to disentangle the include/linux/compiler*.h headers
        and bring them up to date.
      
        The main idea behind the series is to use feature checking macros
        (i.e. __has_attribute) instead of compiler version checks (e.g.
        GCC_VERSION), which are compiler-agnostic (so they can be shared,
        reducing the size of compiler-specific headers) and version-agnostic.
      
        Other related improvements have been made in the headers as well,
        which, on top of the use of __has_attribute, have amounted to a
        significant simplification of these headers (e.g. GCC_VERSION is now
        only guarding a few non-attribute macros).
      
        This series should also help the efforts to support compiling the
        kernel with clang and icc. A fair amount of documentation and comments
        have also been added, clarified or removed; and the headers are now
        more readable, which should help kernel developers in general.
      
        The series was triggered due to the move to gcc >= 4.6. In turn, this
        series has also triggered Sparse to gain the ability to recognize
        __has_attribute on its own.
      
        Finally, the __nonstring variable attribute series has been also
        applied on top; plus two related patches from Nick Desaulniers for
        unreachable() that came a bit afterwards"
      
      * tag 'compiler-attributes-for-linus-4.20-rc1' of https://github.com/ojeda/linux:
        compiler-gcc: remove comment about gcc 4.5 from unreachable()
        compiler.h: update definition of unreachable()
        Compiler Attributes: ext4: remove local __nonstring definition
        Compiler Attributes: auxdisplay: panel: use __nonstring
        Compiler Attributes: enable -Wstringop-truncation on W=1 (gcc >= 8)
        Compiler Attributes: add support for __nonstring (gcc >= 8)
        Compiler Attributes: add MAINTAINERS entry
        Compiler Attributes: add Doc/process/programming-language.rst
        Compiler Attributes: remove uses of __attribute__ from compiler.h
        Compiler Attributes: KENTRY used twice the "used" attribute
        Compiler Attributes: use feature checks instead of version checks
        Compiler Attributes: add missing SPDX ID in compiler_types.h
        Compiler Attributes: remove unneeded sparse (__CHECKER__) tests
        Compiler Attributes: homogenize __must_be_array
        Compiler Attributes: remove unneeded tests
        Compiler Attributes: always use the extra-underscores syntax
        Compiler Attributes: remove unused attributes
      e468f5c0
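The feature-check approach described in the pull message above (asking the compiler via `__has_attribute` rather than comparing version numbers) can be illustrated with a short userspace sketch. The macro names `my_nonstring` and `HAVE_NONSTRING` and the `record` type are invented for this example and are not the kernel's definitions.

```c
#include <string.h>

/* Compilers that lack __has_attribute answer "no" to every query. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/*
 * Feature check instead of a version check: the shorthand expands to the
 * attribute only when the compiler reports support for it.
 * my_nonstring and HAVE_NONSTRING are hypothetical names for this sketch.
 */
#if __has_attribute(__nonstring__)
# define my_nonstring __attribute__((__nonstring__))
# define HAVE_NONSTRING 1
#else
# define my_nonstring
# define HAVE_NONSTRING 0
#endif

/* A fixed-width tag that is deliberately not NUL-terminated. */
struct record {
    char tag[4] my_nonstring;
};

/* Copies exactly sizeof(tag) bytes from src; no terminator is written. */
static inline void record_set_tag(struct record *r, const char *src)
{
    memcpy(r->tag, src, sizeof(r->tag));
}
```

The same source compiles whether or not the attribute exists, which is the property the series relies on to share definitions across compilers and versions.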
    • A
      RISC-V: refresh defconfig · ba1f0d95
      Anup Patel committed
      This patch updates defconfig using savedefconfig on Linux-4.19.  It is
      intended to have no functional change.
      Signed-off-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Palmer Dabbelt <palmer@sifive.com>
      Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
      ba1f0d95
    • L
      Merge branch 'next-keys2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · baa888d2
      Linus Torvalds committed
      Pull keys updates from James Morris:
       "Provide five new operations in the key_type struct that can be used to
        provide access to asymmetric key operations. These will be implemented
        for the asymmetric key type in a later patch and may refer to a key
        retained in RAM by the kernel or a key retained in crypto hardware.
      
           int (*asym_query)(const struct kernel_pkey_params *params,
                             struct kernel_pkey_query *info);
           int (*asym_eds_op)(struct kernel_pkey_params *params,
                              const void *in, void *out);
           int (*asym_verify_signature)(struct kernel_pkey_params *params,
                                        const void *in, const void *in2);
      
        Since encrypt, decrypt and sign are identical in their interfaces,
        they're rolled together in the asym_eds_op() operation and there's an
        operation ID in the params argument to distinguish them.
      
        Verify is different in that we supply the data and the signature
        instead and get an error value (or 0) as the only result on the
        expectation that this may well be how a hardware crypto device may
        work"
      
      * 'next-keys2' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (22 commits)
        KEYS: asym_tpm: Add support for the sign operation [ver #2]
        KEYS: asym_tpm: Implement tpm_sign [ver #2]
        KEYS: asym_tpm: Implement signature verification [ver #2]
        KEYS: asym_tpm: Implement the decrypt operation [ver #2]
        KEYS: asym_tpm: Implement tpm_unbind [ver #2]
        KEYS: asym_tpm: Add loadkey2 and flushspecific [ver #2]
        KEYS: Move trusted.h to include/keys [ver #2]
        KEYS: trusted: Expose common functionality [ver #2]
        KEYS: asym_tpm: Implement encryption operation [ver #2]
        KEYS: asym_tpm: Implement pkey_query [ver #2]
        KEYS: Add parser for TPM-based keys [ver #2]
        KEYS: asym_tpm: extract key size & public key [ver #2]
        KEYS: asym_tpm: add skeleton for asym_tpm [ver #2]
        crypto: rsa-pkcs1pad: Allow hash to be optional [ver #2]
        KEYS: Implement PKCS#8 RSA Private Key parser [ver #2]
        KEYS: Implement encrypt, decrypt and sign for software asymmetric key [ver #2]
        KEYS: Allow the public_key struct to hold a private key [ver #2]
        KEYS: Provide software public key query function [ver #2]
        KEYS: Make the X.509 and PKCS7 parsers supply the sig encoding type [ver #2]
        KEYS: Provide missing asymmetric key subops for new key type ops [ver #2]
        ...
      baa888d2
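The design described in the pull message above, where encrypt, decrypt and sign share one entry point selected by an operation ID in the params argument while verify only returns 0 or an error, might look roughly like the following toy model. All `demo_` names and fields are hypothetical, and the XOR arithmetic is a placeholder, not cryptography.

```c
#include <stddef.h>

/* Hypothetical operation IDs; the kernel's actual names are not given above. */
enum demo_pkey_op { DEMO_OP_ENCRYPT, DEMO_OP_DECRYPT, DEMO_OP_SIGN };

/* Hypothetical stand-in for the params argument. */
struct demo_pkey_params {
    enum demo_pkey_op op;   /* selects the operation inside demo_eds_op() */
    size_t in_len;          /* bytes available at 'in' */
    size_t out_len;         /* capacity of 'out' */
    unsigned char key;      /* toy one-byte key */
};

/* One entry point for encrypt, decrypt and sign; returns bytes written or -1. */
static inline int demo_eds_op(const struct demo_pkey_params *params,
                              const void *in, void *out)
{
    const unsigned char *src = in;
    unsigned char *dst = out;
    size_t i;

    switch (params->op) {
    case DEMO_OP_ENCRYPT:
    case DEMO_OP_DECRYPT:              /* XOR is its own inverse */
        if (params->out_len < params->in_len)
            return -1;
        for (i = 0; i < params->in_len; i++)
            dst[i] = src[i] ^ params->key;
        return (int)params->in_len;
    case DEMO_OP_SIGN:                 /* one-byte keyed checksum */
        if (params->out_len < 1)
            return -1;
        dst[0] = params->key;
        for (i = 0; i < params->in_len; i++)
            dst[0] ^= src[i];
        return 1;
    }
    return -1;
}

/* Verify takes the data and the signature and reports only 0 or an error. */
static inline int demo_verify_signature(const struct demo_pkey_params *params,
                                        const void *in, const void *in2)
{
    const unsigned char *src = in;
    const unsigned char *sig = in2;
    unsigned char expect = params->key;
    size_t i;

    for (i = 0; i < params->in_len; i++)
        expect ^= src[i];
    return expect == sig[0] ? 0 : -1;
}
```

Keeping verify separate means a backend never has to hand a computed signature back to the caller, which matches the stated expectation about how a hardware crypto device may work.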
    • A
      missing bits of "iov_iter: Separate type from direction and use accessor functions" · 0e9b4a82
      Al Viro committed
      sunrpc patches from nfs tree conflict with calling conventions change done
      in iov_iter work.  Trivial fixup...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0e9b4a82