1. 03 4月, 2018 21 次提交
  2. 26 3月, 2018 1 次提交
  3. 06 3月, 2018 2 次提交
  4. 05 3月, 2018 2 次提交
  5. 03 3月, 2018 1 次提交
    • M
      signals: Move put_compat_sigset to compat.h to silence hardened usercopy · fde9fc76
      Matt Redfearn 提交于
      Since commit afcc90f8 ("usercopy: WARN() on slab cache usercopy
      region violations"), MIPS systems booting with a compat root filesystem
      emit a warning when copying compat siginfo to userspace:
      
      WARNING: CPU: 0 PID: 953 at mm/usercopy.c:81 usercopy_warn+0x98/0xe8
      Bad or missing usercopy whitelist? Kernel memory exposure attempt
      detected from SLAB object 'task_struct' (offset 1432, size 16)!
      Modules linked in:
      CPU: 0 PID: 953 Comm: S01logging Not tainted 4.16.0-rc2 #10
      Stack : ffffffff808c0000 0000000000000000 0000000000000001 65ac85163f3bdc4a
      	65ac85163f3bdc4a 0000000000000000 90000000ff667ab8 ffffffff808c0000
      	00000000000003f8 ffffffff808d0000 00000000000000d1 0000000000000000
      	000000000000003c 0000000000000000 ffffffff808c8ca8 ffffffff808d0000
      	ffffffff808d0000 ffffffff80810000 fffffc0000000000 ffffffff80785c30
      	0000000000000009 0000000000000051 90000000ff667eb0 90000000ff667db0
      	000000007fe0d938 0000000000000018 ffffffff80449958 0000000020052798
      	ffffffff808c0000 90000000ff664000 90000000ff667ab0 00000000100c0000
      	ffffffff80698810 0000000000000000 0000000000000000 0000000000000000
      	0000000000000000 0000000000000000 ffffffff8010d02c 65ac85163f3bdc4a
      	...
      Call Trace:
      [<ffffffff8010d02c>] show_stack+0x9c/0x130
      [<ffffffff80698810>] dump_stack+0x90/0xd0
      [<ffffffff80137b78>] __warn+0x100/0x118
      [<ffffffff80137bdc>] warn_slowpath_fmt+0x4c/0x70
      [<ffffffff8021e4a8>] usercopy_warn+0x98/0xe8
      [<ffffffff8021e68c>] __check_object_size+0xfc/0x250
      [<ffffffff801bbfb8>] put_compat_sigset+0x30/0x88
      [<ffffffff8011af24>] setup_rt_frame_n32+0xc4/0x160
      [<ffffffff8010b8b4>] do_signal+0x19c/0x230
      [<ffffffff8010c408>] do_notify_resume+0x60/0x78
      [<ffffffff80106f50>] work_notifysig+0x10/0x18
      ---[ end trace 88fffbf69147f48a ]---
      
      Commit 5905429a ("fork: Provide usercopy whitelisting for
      task_struct") noted that:
      
      "While the blocked and saved_sigmask fields of task_struct are copied to
      userspace (via sigmask_to_save() and setup_rt_frame()), it is always
      copied with a static length (i.e. sizeof(sigset_t))."
      
      However, this is not true in the case of compat signals, whose sigset
      is copied by put_compat_sigset and receives size as an argument.
      
      At most call sites, put_compat_sigset is copying a sigset from the
      current task_struct. This triggers a warning when
      CONFIG_HARDENED_USERCOPY is active. However, by marking this function as
      static inline, the warning can be avoided because in all of these cases
      the size is constant at compile time, which is allowed. The only site
      where this is not the case is handling the rt_sigpending syscall, but
      there the copy is being made from a stack local variable so does not
      trigger the warning.
      
      Move put_compat_sigset to compat.h, and mark it static inline. This
      fixes the WARN on MIPS.
      
      Fixes: afcc90f8 ("usercopy: WARN() on slab cache usercopy region violations")
      Signed-off-by: NMatt Redfearn <matt.redfearn@mips.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Dmitry V . Levin" <ldv@altlinux.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18639/Signed-off-by: NJames Hogan <jhogan@kernel.org>
      fde9fc76
  6. 01 3月, 2018 1 次提交
  7. 28 2月, 2018 1 次提交
  8. 27 2月, 2018 4 次提交
    • D
      dax: fix vma_is_fsdax() helper · 230f5a89
      Dan Williams 提交于
      Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
      S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
      device-dax instances when those are meant to be explicitly allowed.
      
      Fixes: 2bb6d283 ("mm: introduce get_user_pages_longterm")
      Cc: <stable@vger.kernel.org>
      Reported-by: NGerd Rausch <gerd.rausch@oracle.com>
      Acked-by: NJane Chu <jane.chu@oracle.com>
      Reported-by: NHaozhong Zhang <haozhong.zhang@intel.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      230f5a89
    • J
      genhd: Fix BUG in blkdev_open() · 56c0908c
      Jan Kara 提交于
      When two blkdev_open() calls for a partition race with device removal
      and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
      blkdev_open(). The race can happen as follows:
      
      CPU0				CPU1			CPU2
      							del_gendisk()
      							  bdev_unhash_inode(part1);
      
      blkdev_open(part1, O_EXCL)	blkdev_open(part1, O_EXCL)
        bdev = bd_acquire()		  bdev = bd_acquire()
        blkdev_get(bdev)
          bd_start_claiming(bdev)
            - finds old inode 'whole'
            bd_prepare_to_claim() -> 0
      							  bdev_unhash_inode(whole);
      							<device removed>
      							<new device under same
      							 number created>
      				  blkdev_get(bdev);
      				    bd_start_claiming(bdev)
      				      - finds new inode 'whole'
      				      bd_prepare_to_claim()
      					- this also succeeds as we have
      					  different 'whole' here...
      					- bad things happen now as we
      					  have two exclusive openers of
      					  the same bdev
      
      The problem here is that block device opens can see various intermediate
      states while gendisk is shutting down and then being recreated.
      
      We fix the problem by introducing new lookup_sem in gendisk that
      synchronizes gendisk deletion with get_gendisk() and furthermore by
      making sure that get_gendisk() does not return gendisk that is being (or
      has been) deleted. This makes sure that once we ever manage to look up
      newly created bdev inode, we are also guaranteed that following
      get_gendisk() will either return failure (and we fail open) or it
      returns gendisk for the new device and following bdget_disk() will
      return new bdev inode (i.e., blkdev_open() follows the path as if it is
      completely run after new device is created).
      Reported-and-analyzed-by: NHou Tao <houtao1@huawei.com>
      Tested-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      56c0908c
    • J
      genhd: Add helper put_disk_and_module() · 9df6c299
      Jan Kara 提交于
      Add a proper counterpart to get_disk_and_module() -
      put_disk_and_module(). Currently it is opencoded in several places.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9df6c299
    • J
      genhd: Rename get_disk() to get_disk_and_module() · 3079c22e
      Jan Kara 提交于
      Rename get_disk() to get_disk_and_module() to make sure what the
      function does. It's not a great name but at least it is now clear that
      put_disk() is not it's counterpart.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3079c22e
  9. 24 2月, 2018 2 次提交
  10. 23 2月, 2018 2 次提交
  11. 22 2月, 2018 3 次提交
    • A
      bug.h: work around GCC PR82365 in BUG() · 173a3efd
      Arnd Bergmann 提交于
      Looking at functions with large stack frames across all architectures
      led me discovering that BUG() suffers from the same problem as
      fortify_panic(), which I've added a workaround for already.
      
      In short, variables that go out of scope by calling a noreturn function
      or __builtin_unreachable() keep using stack space in functions
      afterwards.
      
      A workaround that was identified is to insert an empty assembler
      statement just before calling the function that doesn't return.  I'm
      adding a macro "barrier_before_unreachable()" to document this, and
      insert calls to that in all instances of BUG() that currently suffer
      from this problem.
      
      The files that saw the largest change from this had these frame sizes
      before, and much less with my patch:
      
        fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
        drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
      
      In case of ARC and CRIS, it turns out that the BUG() implementation
      actually does return (or at least the compiler thinks it does),
      resulting in lots of warnings about uninitialized variable use and
      leaving noreturn functions, such as:
      
        block/cfq-iosched.c: In function 'cfq_async_queue_prio':
        block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
        include/linux/dmaengine.h: In function 'dma_maxpq':
        include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
      
      This makes them call __builtin_trap() instead, which should normally
      dump the stack and kill the current process, like some of the other
      architectures already do.
      
      I tried adding barrier_before_unreachable() to panic() and
      fortify_panic() as well, but that had very little effect, so I'm not
      submitting that patch.
      
      Vineet said:
      
      : For ARC, it is double win.
      :
      : 1. Fixes 3 -Wreturn-type warnings
      :
      : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
      : non-void function [-Wreturn-type]
      :
      : 2.  bloat-o-meter reports code size improvements as gcc elides the
      :    generated code for stack return.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
      Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      173a3efd
    • S
      mm, mlock, vmscan: no more skipping pagevecs · 9c4e6b1a
      Shakeel Butt 提交于
      When a thread mlocks an address space backed either by file pages which
      are currently not present in memory or swapped out anon pages (not in
      swapcache), a new page is allocated and added to the local pagevec
      (lru_add_pvec), I/O is triggered and the thread then sleeps on the page.
      On I/O completion, the thread can wake on a different CPU, the mlock
      syscall will then sets the PageMlocked() bit of the page but will not be
      able to put that page in unevictable LRU as the page is on the pagevec
      of a different CPU.  Even on drain, that page will go to evictable LRU
      because the PageMlocked() bit is not checked on pagevec drain.
      
      The page will eventually go to right LRU on reclaim but the LRU stats
      will remain skewed for a long time.
      
      This patch puts all the pages, even unevictable, to the pagevecs and on
      the drain, the pages will be added on their LRUs correctly by checking
      their evictability.  This resolves the mlocked pages on pagevec of other
      CPUs issue because when those pagevecs will be drained, the mlocked file
      pages will go to unevictable LRU.  Also this makes the race with munlock
      easier to resolve because the pagevec drains happen in LRU lock.
      
      However there is still one place which makes a page evictable and does
      PageLRU check on that page without LRU lock and needs special attention.
      TestClearPageMlocked() and isolate_lru_page() in clear_page_mlock().
      
      	#0: __pagevec_lru_add_fn	#1: clear_page_mlock
      
      	SetPageLRU()			if (!TestClearPageMlocked())
      					  return
      	smp_mb() // <--required
      					// inside does PageLRU
      	if (!PageMlocked())		if (isolate_lru_page())
      	  move to evictable LRU		  putback_lru_page()
      	else
      	  move to unevictable LRU
      
      In '#1', TestClearPageMlocked() provides full memory barrier semantics
      and thus the PageLRU check (inside isolate_lru_page) can not be
      reordered before it.
      
      In '#0', without explicit memory barrier, the PageMlocked() check can be
      reordered before SetPageLRU().  If that happens, '#0' can put a page in
      unevictable LRU and '#1' might have just cleared the Mlocked bit of that
      page but fails to isolate as PageLRU fails as '#0' still hasn't set
      PageLRU bit of that page.  That page will be stranded on the unevictable
      LRU.
      
      There is one (good) side effect though.  Without this patch, the pages
      allocated for System V shared memory segment are added to evictable LRUs
      even after shmctl(SHM_LOCK) on that segment.  This patch will correctly
      put such pages to unevictable LRU.
      
      Link: http://lkml.kernel.org/r/20171121211241.18877-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c4e6b1a
    • J
      mm: memcontrol: fix NR_WRITEBACK leak in memcg and system stats · c3cc3911
      Johannes Weiner 提交于
      After commit a983b5eb ("mm: memcontrol: fix excessive complexity in
      memory.stat reporting"), we observed slowly upward creeping NR_WRITEBACK
      counts over the course of several days, both the per-memcg stats as well
      as the system counter in e.g.  /proc/meminfo.
      
      The conversion from full per-cpu stat counts to per-cpu cached atomic
      stat counts introduced an irq-unsafe RMW operation into the updates.
      
      Most stat updates come from process context, but one notable exception
      is the NR_WRITEBACK counter.  While writebacks are issued from process
      context, they are retired from (soft)irq context.
      
      When writeback completions interrupt the RMW counter updates of new
      writebacks being issued, the decs from the completions are lost.
      
      Since the global updates are routed through the joint lruvec API, both
      the memcg counters as well as the system counters are affected.
      
      This patch makes the joint stat and event API irq safe.
      
      Link: http://lkml.kernel.org/r/20180203082353.17284-1-hannes@cmpxchg.org
      Fixes: a983b5eb ("mm: memcontrol: fix excessive complexity in memory.stat reporting")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Debugged-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c3cc3911