1. 09 5月, 2017 11 次提交
  2. 04 5月, 2017 3 次提交
    • M
      mm: introduce memalloc_nofs_{save,restore} API · 7dea19f9
      Michal Hocko 提交于
      GFP_NOFS context is used for the following 5 reasons currently:
      
       - to prevent from deadlocks when the lock held by the allocation
         context would be needed during the memory reclaim
      
       - to prevent from stack overflows during the reclaim because the
         allocation is performed from a deep context already
      
       - to prevent lockups when the allocation context depends on other
         reclaimers to make a forward progress indirectly
      
       - just in case because this would be safe from the fs POV
      
       - silence lockdep false positives
      
      Unfortunately overuse of this allocation context brings some problems to
      the MM.  Memory reclaim is much weaker (especially during heavy FS
      metadata workloads), OOM killer cannot be invoked because the MM layer
      doesn't have enough information about how much memory is freeable by the
      FS layer.
      
      In many cases it is far from clear why the weaker context is even used
      and so it might be used unnecessarily.  We would like to get rid of
      those as much as possible.  One way to do that is to use the flag in
      scopes rather than isolated cases.  Such a scope is declared when really
      necessary, tracked per task and all the allocation requests from within
      the context will simply inherit the GFP_NOFS semantic.
      
      Not only this is easier to understand and maintain because there are
      much less problematic contexts than specific allocation requests, this
      also helps code paths where FS layer interacts with other layers (e.g.
      crypto, security modules, MM etc...) and there is no easy way to convey
      the allocation context between the layers.
      
      Introduce memalloc_nofs_{save,restore} API to control the scope of
      GFP_NOFS allocation context.  This is basically copying
      memalloc_noio_{save,restore} API we have for other restricted allocation
      context GFP_NOIO.  The PF_MEMALLOC_NOFS flag already exists and it is
      just an alias for PF_FSTRANS which has been xfs specific until recently.
      There are no more PF_FSTRANS users anymore so let's just drop it.
      
      PF_MEMALLOC_NOFS is now checked in the MM layer and drops __GFP_FS
      implicitly same as PF_MEMALLOC_NOIO drops __GFP_IO.  memalloc_noio_flags
      is renamed to current_gfp_context because it now cares about both
      PF_MEMALLOC_NOFS and PF_MEMALLOC_NOIO contexts.  Xfs code paths preserve
      their semantic.  kmem_flags_convert() doesn't need to evaluate the flag
      anymore.
      
      This patch shouldn't introduce any functional changes.
      
      Let's hope that filesystems will drop direct GFP_NOFS (resp.  ~__GFP_FS)
      usage as much as possible and only use a properly documented
      memalloc_nofs_{save,restore} checkpoints where they are appropriate.
      
      [akpm@linux-foundation.org: fix comment typo, reflow comment]
      Link: http://lkml.kernel.org/r/20170306131408.9828-5-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7dea19f9
    • M
      lockdep: allow to disable reclaim lockup detection · 7e784422
      Michal Hocko 提交于
      The current implementation of the reclaim lockup detection can lead to
      false positives and those even happen and usually lead to tweak the code
      to silence the lockdep by using GFP_NOFS even though the context can use
      __GFP_FS just fine.
      
      See
      
        http://lkml.kernel.org/r/20160512080321.GA18496@dastard
      
      as an example.
      
        =================================
        [ INFO: inconsistent lock state ]
        4.5.0-rc2+ #4 Tainted: G           O
        ---------------------------------
        inconsistent {RECLAIM_FS-ON-R} -> {IN-RECLAIM_FS-W} usage.
        kswapd0/543 [HC0[0]:SC0[0]:HE1:SE1] takes:
      
        (&xfs_nondir_ilock_class){++++-+}, at: xfs_ilock+0x177/0x200 [xfs]
      
        {RECLAIM_FS-ON-R} state was registered at:
          mark_held_locks+0x79/0xa0
          lockdep_trace_alloc+0xb3/0x100
          kmem_cache_alloc+0x33/0x230
          kmem_zone_alloc+0x81/0x120 [xfs]
          xfs_refcountbt_init_cursor+0x3e/0xa0 [xfs]
          __xfs_refcount_find_shared+0x75/0x580 [xfs]
          xfs_refcount_find_shared+0x84/0xb0 [xfs]
          xfs_getbmap+0x608/0x8c0 [xfs]
          xfs_vn_fiemap+0xab/0xc0 [xfs]
          do_vfs_ioctl+0x498/0x670
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
               CPU0
               ----
          lock(&xfs_nondir_ilock_class);
          <Interrupt>
            lock(&xfs_nondir_ilock_class);
      
         *** DEADLOCK ***
      
        3 locks held by kswapd0/543:
      
        stack backtrace:
        CPU: 0 PID: 543 Comm: kswapd0 Tainted: G           O    4.5.0-rc2+ #4
        Call Trace:
         lock_acquire+0xd8/0x1e0
         down_write_nested+0x5e/0xc0
         xfs_ilock+0x177/0x200 [xfs]
         xfs_reflink_cancel_cow_range+0x150/0x300 [xfs]
         xfs_fs_evict_inode+0xdc/0x1e0 [xfs]
         evict+0xc5/0x190
         dispose_list+0x39/0x60
         prune_icache_sb+0x4b/0x60
         super_cache_scan+0x14f/0x1a0
         shrink_slab.part.63.constprop.79+0x1e9/0x4e0
         shrink_zone+0x15e/0x170
         kswapd+0x4f1/0xa80
         kthread+0xf2/0x110
         ret_from_fork+0x3f/0x70
      
      To quote Dave:
       "Ignoring whether reflink should be doing anything or not, that's a
        "xfs_refcountbt_init_cursor() gets called both outside and inside
        transactions" lockdep false positive case. The problem here is lockdep
        has seen this allocation from within a transaction, hence a GFP_NOFS
        allocation, and now it's seeing it in a GFP_KERNEL context. Also note
        that we have an active reference to this inode.
      
        So, because the reclaim annotations overload the interrupt level
        detections and it's seen the inode ilock been taken in reclaim
        ("interrupt") context, this triggers a reclaim context warning where
        it thinks it is unsafe to do this allocation in GFP_KERNEL context
        holding the inode ilock..."
      
      This sounds like a fundamental problem of the reclaim lock detection.
      It is really impossible to annotate such a special usecase IMHO unless
      the reclaim lockup detection is reworked completely.  Until then it is
      much better to provide a way to add "I know what I am doing flag" and
      mark problematic places.  This would prevent from abusing GFP_NOFS flag
      which has a runtime effect even on configurations which have lockdep
      disabled.
      
      Introduce __GFP_NOLOCKDEP flag which tells the lockdep gfp tracking to
      skip the current allocation request.
      
      While we are at it also make sure that the radix tree doesn't
      accidentaly override tags stored in the upper part of the gfp_mask.
      
      Link: http://lkml.kernel.org/r/20170306131408.9828-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e784422
    • N
      lockdep: teach lockdep about memalloc_noio_save · 6d7225f0
      Nikolay Borisov 提交于
      Patch series "scope GFP_NOFS api", v5.
      
      This patch (of 7):
      
      Commit 21caf2fc ("mm: teach mm by current context info to not do I/O
      during memory allocation") added the memalloc_noio_(save|restore)
      functions to enable people to modify the MM behavior by disabling I/O
      during memory allocation.
      
      This was further extended in commit 934f3072 ("mm: clear __GFP_FS
      when PF_MEMALLOC_NOIO is set").
      
      memalloc_noio_* functions prevent allocation paths recursing back into
      the filesystem without explicitly changing the flags for every
      allocation site.
      
      However, lockdep hasn't been keeping up with the changes and it entirely
      misses handling the memalloc_noio adjustments.  Instead, it is left to
      the callers of __lockdep_trace_alloc to call the function after they
      have shaven the respective GFP flags which can lead to false positives:
      
        =================================
         [ INFO: inconsistent lock state ]
         4.10.0-nbor #134 Not tainted
         ---------------------------------
         inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
         fsstress/3365 [HC0[0]:SC0[0]:HE1:SE1] takes:
          (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
         {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x62a/0x17c0
           lock_acquire+0xc5/0x220
           down_write_nested+0x4f/0x90
           xfs_ilock+0x141/0x230
           xfs_reclaim_inode+0x12a/0x320
           xfs_reclaim_inodes_ag+0x2c8/0x4e0
           xfs_reclaim_inodes_nr+0x33/0x40
           xfs_fs_free_cached_objects+0x19/0x20
           super_cache_scan+0x191/0x1a0
           shrink_slab+0x26f/0x5f0
           shrink_node+0xf9/0x2f0
           kswapd+0x356/0x920
           kthread+0x10c/0x140
           ret_from_fork+0x31/0x40
         irq event stamp: 173777
         hardirqs last  enabled at (173777): __local_bh_enable_ip+0x70/0xc0
         hardirqs last disabled at (173775): __local_bh_enable_ip+0x37/0xc0
         softirqs last  enabled at (173776): _xfs_buf_find+0x67a/0xb70
         softirqs last disabled at (173774): _xfs_buf_find+0x5db/0xb70
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&xfs_nondir_ilock_class);
           <Interrupt>
             lock(&xfs_nondir_ilock_class);
      
          *** DEADLOCK ***
      
         4 locks held by fsstress/3365:
          #0:  (sb_writers#10){++++++}, at: mnt_want_write+0x24/0x50
          #1:  (&sb->s_type->i_mutex_key#12){++++++}, at: vfs_setxattr+0x6f/0xb0
          #2:  (sb_internal#2){++++++}, at: xfs_trans_alloc+0xfc/0x140
          #3:  (&xfs_nondir_ilock_class){++++?.}, at: xfs_ilock+0x141/0x230
      
         stack backtrace:
         CPU: 0 PID: 3365 Comm: fsstress Not tainted 4.10.0-nbor #134
         Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
         Call Trace:
          kmem_cache_alloc_node_trace+0x3a/0x2c0
          vm_map_ram+0x2a1/0x510
          _xfs_buf_map_pages+0x77/0x140
          xfs_buf_get_map+0x185/0x2a0
          xfs_attr_rmtval_set+0x233/0x430
          xfs_attr_leaf_addname+0x2d2/0x500
          xfs_attr_set+0x214/0x420
          xfs_xattr_set+0x59/0xb0
          __vfs_setxattr+0x76/0xa0
          __vfs_setxattr_noperm+0x5e/0xf0
          vfs_setxattr+0xae/0xb0
          setxattr+0x15e/0x1a0
          path_setxattr+0x8f/0xc0
          SyS_lsetxattr+0x11/0x20
          entry_SYSCALL_64_fastpath+0x23/0xc6
      
      Let's fix this by making lockdep explicitly do the shaving of respective
      GFP flags.
      
      Fixes: 934f3072 ("mm: clear __GFP_FS when PF_MEMALLOC_NOIO is set")
      Link: http://lkml.kernel.org/r/20170306131408.9828-2-mhocko@kernel.orgSigned-off-by: NNikolay Borisov <nborisov@suse.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: Chris Mason <clm@fb.com>
      Cc: David Sterba <dsterba@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Brian Foster <bfoster@redhat.com>
      Cc: Darrick J. Wong <darrick.wong@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d7225f0
  3. 02 5月, 2017 13 次提交
  4. 01 5月, 2017 3 次提交
    • Y
      bpf: enhance verifier to understand stack pointer arithmetic · 332270fd
      Yonghong Song 提交于
      llvm 4.0 and above generates the code like below:
      ....
      440: (b7) r1 = 15
      441: (05) goto pc+73
      515: (79) r6 = *(u64 *)(r10 -152)
      516: (bf) r7 = r10
      517: (07) r7 += -112
      518: (bf) r2 = r7
      519: (0f) r2 += r1
      520: (71) r1 = *(u8 *)(r8 +0)
      521: (73) *(u8 *)(r2 +45) = r1
      ....
      and the verifier complains "R2 invalid mem access 'inv'" for insn #521.
      This is because verifier marks register r2 as unknown value after #519
      where r2 is a stack pointer and r1 holds a constant value.
      
      Teach verifier to recognize "stack_ptr + imm" and
      "stack_ptr + reg with const val" as valid stack_ptr with new offset.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      332270fd
    • S
      ring-buffer: Return reader page back into existing ring buffer · 73a757e6
      Steven Rostedt (VMware) 提交于
      When reading the ring buffer for consuming, it is optimized for splice,
      where a page is taken out of the ring buffer (zero copy) and sent to the
      reading consumer. When the read is finished with the page, it calls
      ring_buffer_free_read_page(), which simply frees the page. The next time the
      reader needs to get a page from the ring buffer, it must call
      ring_buffer_alloc_read_page() which allocates and initializes a reader page
      for the ring buffer to be swapped into the ring buffer for a new filled page
      for the reader.
      
      The problem is that there's no reason to actually free the page when it is
      passed back to the ring buffer. It can hold it off and reuse it for the next
      iteration. This completely removes the interaction with the page_alloc
      mechanism.
      
      Using the trace-cmd utility to record all events (causing trace-cmd to
      require reading lots of pages from the ring buffer, and calling
      ring_buffer_alloc/free_read_page() several times), and also assigning a
      stack trace trigger to the mm_page_alloc event, we can see how many times
      the ring_buffer_alloc_read_page() needed to allocate a page for the ring
      buffer.
      
      Before this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        9968
      
      After this change:
      
        # trace-cmd record -e all -e mem_page_alloc -R stacktrace sleep 1
        # trace-cmd report |grep ring_buffer_alloc_read_page | wc -l
        4
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      73a757e6
    • D
      mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash · 71389703
      Dan Williams 提交于
      The x86 conversion to the generic GUP code included a small change which causes
      crashes and data corruption in the pmem code - not good.
      
      The root cause is that the /dev/pmem driver code implicitly relies on the x86
      get_user_pages() implementation doing a get_page() on the page refcount, because
      get_page() does a get_zone_device_page() which properly refcounts pmem's separate
      page struct arrays that are not present in the regular page struct structures.
      (The pmem driver does this because it can cover huge memory areas.)
      
      But the x86 conversion to the generic GUP code changed the get_page() to
      page_cache_get_speculative() which is faster but doesn't do the
      get_zone_device_page() call the pmem code relies on.
      
      One way to solve the regression would be to change the generic GUP code to use
      get_page(), but that would slow things down a bit and punish other generic-GUP
      using architectures for an x86-ism they did not care about. (Arguably the pmem
      driver was probably not working reliably for them: but nvdimm is an Intel
      feature, so non-x86 exposure is probably still limited.)
      
      So restructure the pmem code's interface with the MM instead: get rid of the
      get/put_zone_device_page() distinction, integrate put_zone_device_page() into
      __put_page() and and restructure the pmem completion-wait and teardown machinery:
      
      Kirill points out that the calls to {get,put}_dev_pagemap() can be
      removed from the mm fast path if we take a single get_dev_pagemap()
      reference to signify that the page is alive and use the final put of the
      page to drop that reference.
      
      This does require some care to make sure that any waits for the
      percpu_ref to drop to zero occur *after* devm_memremap_page_release(),
      since it now maintains its own elevated reference.
      
      This speeds up things while also making the pmem refcounting more robust going
      forward.
      Suggested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NKirill Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NLogan Gunthorpe <logang@deltatee.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/149339998297.24933.1129582806028305912.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      71389703
  5. 29 4月, 2017 3 次提交
    • Z
      cgroup: avoid attaching a cgroup root to two different superblocks, take 2 · 9732adc5
      Zefan Li 提交于
      Commit bfb0b80d ("cgroup: avoid attaching a cgroup root to two
      different superblocks") is broken.  Now we try to fix the race by
      delaying the initialization of cgroup root refcnt until a superblock
      has been allocated.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Reported-by: NAndrei Vagin <avagin@virtuozzo.com>
      Tested-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NZefan Li <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      9732adc5
    • H
      bpf: bpf_lock on kallsysms doesn't need to be irqsave · d24f7c7f
      Hannes Frederic Sowa 提交于
      Hannes rightfully spotted that the bpf_lock doesn't need to be
      irqsave variant. We never perform any such updates where this
      would be necessary (neither right now nor in future), therefore
      relax this further.
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d24f7c7f
    • T
      cgroup: fix spurious warnings on cgroup_is_dead() from cgroup_sk_alloc() · a590b90d
      Tejun Heo 提交于
      cgroup_get() expected to be called only on live cgroups and triggers
      warning on a dead cgroup; however, cgroup_sk_alloc() may be called
      while cloning a socket which is left in an empty and removed cgroup
      and thus may legitimately duplicate its reference on a dead cgroup.
      This currently triggers the following warning spuriously.
      
       WARNING: CPU: 14 PID: 0 at kernel/cgroup.c:490 cgroup_get+0x55/0x60
       ...
        [<ffffffff8107e123>] __warn+0xd3/0xf0
        [<ffffffff8107e20e>] warn_slowpath_null+0x1e/0x20
        [<ffffffff810ff465>] cgroup_get+0x55/0x60
        [<ffffffff81106061>] cgroup_sk_alloc+0x51/0xe0
        [<ffffffff81761beb>] sk_clone_lock+0x2db/0x390
        [<ffffffff817cce06>] inet_csk_clone_lock+0x16/0xc0
        [<ffffffff817e8173>] tcp_create_openreq_child+0x23/0x4b0
        [<ffffffff818601a1>] tcp_v6_syn_recv_sock+0x91/0x670
        [<ffffffff817e8b16>] tcp_check_req+0x3a6/0x4e0
        [<ffffffff81861ba3>] tcp_v6_rcv+0x693/0xa00
        [<ffffffff81837429>] ip6_input_finish+0x59/0x3e0
        [<ffffffff81837cb2>] ip6_input+0x32/0xb0
        [<ffffffff81837387>] ip6_rcv_finish+0x57/0xa0
        [<ffffffff81837ac8>] ipv6_rcv+0x318/0x4d0
        [<ffffffff817778c7>] __netif_receive_skb_core+0x2d7/0x9a0
        [<ffffffff81777fa6>] __netif_receive_skb+0x16/0x70
        [<ffffffff81778023>] netif_receive_skb_internal+0x23/0x80
        [<ffffffff817787d8>] napi_gro_frags+0x208/0x270
        [<ffffffff8168a9ec>] mlx4_en_process_rx_cq+0x74c/0xf40
        [<ffffffff8168b270>] mlx4_en_poll_rx_cq+0x30/0x90
        [<ffffffff81778b30>] net_rx_action+0x210/0x350
        [<ffffffff8188c426>] __do_softirq+0x106/0x2c7
        [<ffffffff81082bad>] irq_exit+0x9d/0xa0 [<ffffffff8188c0e4>] do_IRQ+0x54/0xd0
        [<ffffffff8188a63f>] common_interrupt+0x7f/0x7f <EOI>
        [<ffffffff8173d7e7>] cpuidle_enter+0x17/0x20
        [<ffffffff810bdfd9>] cpu_startup_entry+0x2a9/0x2f0
        [<ffffffff8103edd1>] start_secondary+0xf1/0x100
      
      This patch renames the existing cgroup_get() with the dead cgroup
      warning to cgroup_get_live() after cgroup_kn_lock_live() and
      introduces the new cgroup_get() which doesn't check whether the cgroup
      is live or dead.
      
      All existing cgroup_get() users except for cgroup_sk_alloc() are
      converted to use cgroup_get_live().
      
      Fixes: d979a39d ("cgroup: duplicate cgroup reference when cloning sockets")
      Cc: stable@vger.kernel.org # v4.5+
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: NChris Mason <clm@fb.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      a590b90d
  6. 27 4月, 2017 1 次提交
  7. 25 4月, 2017 3 次提交
  8. 22 4月, 2017 3 次提交
    • E
      signal: Make kill_proc_info static · 6c478ae9
      Eric W. Biederman 提交于
      There are no users outside of signal.c so make the function static so
      the compiler and other developers have that information.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      6c478ae9
    • E
      rlimit: Properly call security_task_setrlimit · cad4ea54
      Eric W. Biederman 提交于
      Modify do_prlimit to call security_task_setrlimit passing the task
      whose rlimit we are changing not the tsk->group_leader.
      
      In general this should not matter as the lsms implementing
      security_task_setrlimit apparmor and selinux both examine the
      task->cred to see what should be allowed on the destination task.
      
      That task->cred is shared between tasks created with CLONE_THREAD
      unless thread keyrings are in play, in which case both apparmor and
      selinux create duplicate security contexts.
      
      So the only time when it will matter which thread is passed to
      security_task_setrlimit is if one of the threads of a process performs
      an operation that changes only it's credentials.  At which point if a
      thread has done that we don't want to hide that information from the
      lsms.
      
      So fix the call of security_task_setrlimit.  With the removal
      of tsk->group_leader this makes the code slightly faster,
      more comprehensible and maintainable.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cad4ea54
    • D
      net: Remove NET_CORE_BUDGET_USECS from sysctl binary interface. · 1f4407e2
      David S. Miller 提交于
      We are not supposed to add new entries to this thing
      any more.
      
      Thanks to Eric Dumazet for noticing this.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f4407e2