1. 23 Jun, 2018 2 commits
    • bdi: Fix another oops in wb_workfn() · 3ee7e869
      Jan Kara committed
      syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
      wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
      WB_shutting_down after wb->bdi->dev became NULL. This indicates that
      unregister_bdi() failed to call wb_shutdown() on one of wb objects.
      
      The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
      drops bdi's reference to wb structures before going through the list of
      wbs again and calling wb_shutdown() on each of them. This way the loop
      iterating through all wbs can easily miss a wb if that wb has already
      passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
      from cgwb_release_workfn() and as a result fully shutdown bdi although
      wb_workfn() for this wb structure is still running. In fact there are
      also other ways cgwb_bdi_unregister() can race with
      cgwb_release_workfn() leading e.g. to use-after-free issues:
      
      CPU1                            CPU2
                                      cgwb_bdi_unregister()
                                        cgwb_kill(*slot);
      
      cgwb_release()
        queue_work(cgwb_release_wq, &wb->release_work);
      cgwb_release_workfn()
                                        wb = list_first_entry(&bdi->wb_list, ...)
                                        spin_unlock_irq(&cgwb_lock);
        wb_shutdown(wb);
        ...
        kfree_rcu(wb, rcu);
                                        wb_shutdown(wb); -> oops use-after-free
      
      We solve these issues by synchronizing writeback structure shutdown from
      cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
      way we also no longer need synchronization using WB_shutting_down as the
      mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
      CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
      bdi_unregister().
      Reported-by: syzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300
      Will Deacon committed
      When delivering a signal to a task that is using rseq, we call into
      __rseq_handle_notify_resume() so that the registers pushed in the
      sigframe are updated to reflect the state of the restartable sequence
      (for example, ensuring that the signal returns to the abort handler if
      necessary).
      
      However, if the rseq management fails due to an unrecoverable fault when
      accessing userspace or certain combinations of RSEQ_CS_* flags, then we
      will attempt to deliver a SIGSEGV. This has the potential for infinite
      recursion if the rseq code continuously fails on signal delivery.
      
      Avoid this problem by using force_sigsegv() instead of force_sig(), which
      is explicitly designed to reset the SEGV handler to SIG_DFL in the case
      of a recursive fault. In doing so, remove rseq_signal_deliver() from the
      internal rseq API and have an optional struct ksignal * parameter to
      rseq_handle_notify_resume() instead.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: peterz@infradead.org
      Cc: paulmck@linux.vnet.ibm.com
      Cc: boqun.feng@gmail.com
      Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
  2. 22 Jun, 2018 2 commits
  3. 21 Jun, 2018 3 commits
    • kernel.h: Fix a typo in comment · 8730662d
      Wei Wang committed
      Signed-off-by: Wei Wang <wvw@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Crt Mori <cmo@melexis.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: gregkh@linuxfoundation.org
      Cc: wei.vince.wang@gmail.com
      Link: https://lkml.kernel.org/lkml/20180424212241.16013-1-wvw@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/platform/UV: Add adjustable set memory block size function · f642fb58
      mike.travis@hpe.com committed
      Add a new function to "adjust" the current fixed UV memory block size
      of 2GB so it can be changed to a different physical boundary.  This is
      out of necessity so arch dependent code can accommodate specific BIOS
      requirements which can align these new PMEM modules at less than the
      default boundaries.
      
      A "set order" type of function was used to ensure that the memory block
      size will be a power of two value without requiring a validity check.
      64GB was chosen as the upper limit for memory block size values to
      accommodate upcoming 4PB systems which have 6 more bits of physical
      address space (46 becoming 52).
      Signed-off-by: Mike Travis <mike.travis@hpe.com>
      Reviewed-by: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russ Anderson <russ.anderson@hpe.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dan.j.williams@intel.com
      Cc: jgross@suse.com
      Cc: kirill.shutemov@linux.intel.com
      Cc: mhocko@suse.com
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
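The "set order" idea from the UV commit above can be illustrated with a minimal userspace sketch. This is not the kernel code: `set_block_order()`, `block_size_bytes()` and both macros are hypothetical names, and only the 2GB default and the 64GB ceiling are taken from the commit message. Because the caller supplies an order rather than a byte count, every accepted value is a power of two by construction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants; only the sizes themselves come from the commit message. */
#define DEFAULT_BLOCK_ORDER 31u /* 2^31 bytes = 2GB, the previously fixed size */
#define MAX_BLOCK_ORDER     36u /* 2^36 bytes = 64GB, the stated upper limit */

static unsigned int block_order = DEFAULT_BLOCK_ORDER;

/* Store the new order; return 0 on success or -1 if it exceeds the ceiling. */
int set_block_order(unsigned int order)
{
    if (order > MAX_BLOCK_ORDER)
        return -1;
    block_order = order;
    return 0;
}

/* The size is always 1 << order, so it cannot fail to be a power of two. */
uint64_t block_size_bytes(void)
{
    assert(block_order <= MAX_BLOCK_ORDER);
    return (uint64_t)1 << block_order;
}
```

A rejected order leaves the previous setting untouched, so a caller can try a platform-specific value and fall back to the default without extra bookkeeping.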
    • rseq/cleanup: Do not abort rseq c.s. in child on fork() · 9a789fcf
      Mathieu Desnoyers committed
      Considering that we explicitly forbid system calls in rseq critical
      sections, it is not valid to issue a fork or clone system call within a
      rseq critical section, so rseq_fork() is not required to restart an
      active rseq c.s. in the child process.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ben Maurer <bmaurer@fb.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Lameter <cl@linux.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-api@vger.kernel.org
      Cc: linux-kselftest@vger.kernel.org
      Link: https://lore.kernel.org/lkml/20180619133230.4087-4-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 19 Jun, 2018 1 commit
  5. 17 Jun, 2018 2 commits
    • firmware: dmi: Add access to the SKU ID string · b23908d3
      Simon Glass committed
      This is used in some systems from user space for determining the identity
      of the device.
      
      Expose this as a file so that user-space tools don't need to read
      from /sys/firmware/dmi/tables/DMI.
      Signed-off-by: Simon Glass <sjg@chromium.org>
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
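With the SKU exposed as a file, a user-space tool could read it with ordinary file I/O instead of parsing the raw DMI table. The sketch below assumes the attribute appears as `/sys/class/dmi/id/product_sku`; that path is not stated in the commit message, and `read_first_line()` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumed location of the new attribute; the commit message does not name it. */
#define DMI_SKU_PATH "/sys/class/dmi/id/product_sku"

/* Read the first line of `path` into `buf`, without the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or is empty. */
int read_first_line(const char *path, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    FILE *f = fopen(path, "r");
    if (!f)
        return -1;

    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (!line)
        return -1;

    buf[strcspn(buf, "\n")] = '\0'; /* strip the newline, if any */
    return 0;
}
```

A caller would invoke `read_first_line(DMI_SKU_PATH, buf, sizeof buf)` and treat a return of -1 as "SKU not available on this system".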
    • atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6
      David Woodhouse committed
      ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
      which they are to be sent. But it doesn't take ownership of those
      packets from the sock (if any) which originally owned them. They should
      remain owned by their actual sender until they've left the box.
      
      There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
      for certain skbs, precisely to avoid messing up sk_wmem_alloc
      accounting. Ideally that hack would cover the ATM use case too, but it
      doesn't — skbs which aren't owned by any sock, for example PPP control
      frames, still get their truesize adjusted when the low-level ATM driver
      adds headroom.
      
      This has always been an issue, it seems. The truesize of a packet
      increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
      for normal traffic, only for control frames. So I think we just got away
      with it, and we probably needed to send 2GiB of LCP echo frames before
      the misaccounting would ever have caused a problem and caused
      atm_may_send() to start refusing packets.
      
      Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
      refcount_t") did exactly what it was intended to do, and turned this
      mostly-theoretical problem into a real one, causing PPPoATM to fail
      immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
      starts refusing to allow new packets.
      
      The least intrusive solution to this problem is to stash the value of
      skb->truesize that was accounted to the VCC, in a new member of the
      ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
      value instead of the then-current value of skb->truesize.
      
      Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Tested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
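The accounting fix described in the ATM commit above can be pictured with a small userspace model. The structures and function names here are hypothetical simplifications, not the kernel's: the point is only that the amount charged is stashed at charge time and exactly that amount is subtracted later, even if `truesize` has grown in between.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct vcc {
    long wmem_alloc;            /* bytes currently charged to this VCC */
};

struct pkt {
    unsigned int truesize;      /* may grow while the packet is in flight */
    unsigned int acct_truesize; /* value that was actually charged */
};

/* Charge the packet to the VCC and remember how much was charged. */
void vcc_charge(struct vcc *v, struct pkt *p)
{
    p->acct_truesize = p->truesize;
    v->wmem_alloc += p->acct_truesize;
}

/* Models a driver adding headroom, which increases truesize. */
void pkt_add_headroom(struct pkt *p, unsigned int extra)
{
    p->truesize += extra;
}

/* Uncharge precisely the stashed amount, not the then-current truesize. */
void vcc_uncharge(struct vcc *v, const struct pkt *p)
{
    v->wmem_alloc -= p->acct_truesize;
    assert(v->wmem_alloc >= 0); /* the counter can no longer underflow */
}
```

Had `vcc_uncharge()` subtracted `p->truesize` instead, a 512-byte charge followed by 64 bytes of added headroom would leave the counter at -64, which is the underflow the commit message describes.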
  6. 16 Jun, 2018 6 commits
  7. 15 Jun, 2018 5 commits
  8. 14 Jun, 2018 4 commits
    • blk-mq: remove blk_mq_tagset_iter · e6c3456a
      Christoph Hellwig committed
      Unused now that nvme stopped using it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: don't time out requests again that are in the timeout handler · da661267
      Christoph Hellwig committed
      We can currently call the timeout handler again on a request that has
      already been handed over to the timeout handler.  Prevent that with a new
      flag.
      
      Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
      Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
      Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • dma-mapping: move all DMA mapping code to kernel/dma · cf65a0f6
      Christoph Hellwig committed
      Currently the code is split over various files with dma- prefixes in the
      lib/ and drivers/base directories, and the number of files keeps growing.
      Move them into a single directory to keep the code together and remove
      the file name prefixes.  To match the irq infrastructure this directory
      is placed under the kernel/ directory.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables · 050e9baa
      Linus Torvalds committed
      The changes to automatically test for working stack protector compiler
      support in the Kconfig files removed the special STACKPROTECTOR_AUTO
      option that picked the strongest stack protector that the compiler
      supported.
      
      That was all a nice cleanup - it makes no sense to have the AUTO case
      now that the Kconfig phase can just determine the compiler support
      directly.
      
      HOWEVER.
      
      It also meant that doing "make oldconfig" would now _disable_ the strong
      stackprotector if you had AUTO enabled, because in a legacy config file,
      the sane stack protector configuration would look like
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_NONE is not set
        # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_STACKPROTECTOR_AUTO=y
      
      and when you ran this through "make oldconfig" with the Kbuild changes,
      it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
      been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
      CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
      used to be disabled (because it was really enabled by AUTO), and would
      disable it in the new config, resulting in:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      That's dangerously subtle - people could suddenly find themselves with
      the weaker stack protector setup without even realizing.
      
      The solution here is to just rename not just the old REGULAR stack
      protector option, but also the strong one.  This does that by just
      removing the CC_ prefix entirely for the user choices, because it really
      is not about the compiler support (the compiler support now instead
      automatically impacts _visibility_ of the options to users).
      
      This results in "make oldconfig" actually asking the user for their
      choice, so that we don't have any silent subtle security model changes.
      The end result would generally look like this:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_STACKPROTECTOR=y
        CONFIG_STACKPROTECTOR_STRONG=y
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      where the "CC_" versions really are about internal compiler
      infrastructure, not the user selections.
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 13 Jun, 2018 4 commits
    • mm: Introduce kvcalloc() · 1c542f38
      Kees Cook committed
      The kv*alloc()-family was missing kvcalloc(). Adding this allows for
      2-argument multiplication conversions of kvzalloc(a * b, ...) into
      kvcalloc(a, b, ...).
      Signed-off-by: Kees Cook <keescook@chromium.org>
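The motivation for a two-argument allocator is that an open-coded `a * b` can wrap around silently before the allocator ever sees it. The following is a userspace sketch of that idea only, with a hypothetical `checked_zalloc_array()` standing in for the kernel helper; it is not how kvcalloc() itself is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a two-argument zeroing allocator. */
void *checked_zalloc_array(size_t n, size_t size)
{
    /* Refuse the request if n * size would wrap around SIZE_MAX. */
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;

    size_t bytes = n * size;
    assert(size == 0 || bytes / size == n); /* guaranteed by the guard above */

    void *p = malloc(bytes);
    if (p)
        memset(p, 0, bytes);
    return p;
}
```

Because the count and the element size arrive as separate arguments, the overflow check lives in one place instead of being repeated, or forgotten, at every call site.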
    • locking/refcounts: Implement refcount_dec_and_lock_irqsave() · 7ea959c4
      Anna-Maria Gleixner committed
      There are in-tree users of refcount_dec_and_lock() which must acquire the
      spin lock with interrupts disabled. To work around the lack of an irqsave
      variant of refcount_dec_and_lock() they use local_irq_save() at the call
      site. This adds extra code and in some places keeps interrupts disabled
      for longer than necessary. These places also need extra treatment for
      PREEMPT_RT because the irq disabling is disconnected from the lock
      function.
      
      Implement the missing irqsave variant of the function.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r20180612161621.22645-4-bigeasy@linutronix.de

      [bigeasy: s@atomic_dec_and_lock@refcount_dec_and_lock@g]
    • atomic: Add irqsave variant of atomic_dec_and_lock() · ccfbb5be
      Anna-Maria Gleixner committed
      There are in-tree users of atomic_dec_and_lock() which must acquire the
      spin lock with interrupts disabled. To work around the lack of an irqsave
      variant of atomic_dec_and_lock() they use local_irq_save() at the call
      site. This adds extra code and in some places keeps interrupts disabled
      for longer than necessary. These places also need extra treatment for
      PREEMPT_RT because the irq disabling is disconnected from the lock
      function.
      
      Implement the missing irqsave variant of the function.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r20180612161621.22645-3-bigeasy@linutronix.de
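For readers unfamiliar with the "decrement and lock" pattern that both irqsave commits above extend, here is a userspace sketch using C11 atomics. Everything in it is a hypothetical stand-in (`toy_spinlock`, `dec_and_lock()`); interrupt disabling has no userspace equivalent, so the irqsave part is deliberately not modeled. The sketch only shows why the lock acquisition belongs inside the helper: the lock is taken solely when the counter is about to reach zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy spinlock built on atomic_flag. */
typedef struct {
    atomic_flag held;
} toy_spinlock;

void toy_spin_lock(toy_spinlock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait until the flag was previously clear */
}

void toy_spin_unlock(toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Decrement *cnt. Returns true with the lock held if the count reached
 * zero; otherwise returns false with the lock not held. */
bool dec_and_lock(atomic_int *cnt, toy_spinlock *lock)
{
    assert(cnt != NULL && lock != NULL);

    /* Fast path: drop a reference without the lock while others remain. */
    int old = atomic_load(cnt);
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
        /* On failure `old` was refreshed with the current value; retry. */
    }

    /* Slow path: this may be the last reference, so decide under the lock. */
    toy_spin_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;
    toy_spin_unlock(lock);
    return false;
}
```

A caller that receives `true` owns the lock and is responsible for releasing it after tearing down the object.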
    • netfilter: fix null-ptr-deref in nf_nat_decode_session · 155fb5c5
      Prashant Bhole committed
      Add null check for nat_hook in nf_nat_decode_session()
      
      [  195.648098] UBSAN: Undefined behaviour in ./include/linux/netfilter.h:348:14
      [  195.651366] BUG: KASAN: null-ptr-deref in __xfrm_policy_check+0x208/0x1d70
      [  195.653888] member access within null pointer of type 'struct nf_nat_hook'
      [  195.653896] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.17.0-rc6+ #5
      [  195.656320] Read of size 8 at addr 0000000000000008 by task ping/2469
      [  195.658715] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [  195.658721] Call Trace:
      [  195.661087]
      [  195.669341]  <IRQ>
      [  195.670574]  dump_stack+0xc6/0x150
      [  195.672156]  ? dump_stack_print_info.cold.0+0x1b/0x1b
      [  195.674121]  ? ubsan_prologue+0x31/0x92
      [  195.676546]  ubsan_epilogue+0x9/0x49
      [  195.678159]  handle_null_ptr_deref+0x11a/0x130
      [  195.679800]  ? sprint_OID+0x1a0/0x1a0
      [  195.681322]  __ubsan_handle_type_mismatch_v1+0xd5/0x11d
      [  195.683146]  ? ubsan_prologue+0x92/0x92
      [  195.684642]  __xfrm_policy_check+0x18ef/0x1d70
      [  195.686294]  ? rt_cache_valid+0x118/0x180
      [  195.687804]  ? __xfrm_route_forward+0x410/0x410
      [  195.689463]  ? fib_multipath_hash+0x700/0x700
      [  195.691109]  ? kvm_sched_clock_read+0x23/0x40
      [  195.692805]  ? pvclock_clocksource_read+0xf6/0x280
      [  195.694409]  ? graph_lock+0xa0/0xa0
      [  195.695824]  ? pvclock_clocksource_read+0xf6/0x280
      [  195.697508]  ? pvclock_read_flags+0x80/0x80
      [  195.698981]  ? kvm_sched_clock_read+0x23/0x40
      [  195.700347]  ? sched_clock+0x5/0x10
      [  195.701525]  ? sched_clock_cpu+0x18/0x1a0
      [  195.702846]  tcp_v4_rcv+0x1d32/0x1de0
      [  195.704115]  ? lock_repin_lock+0x70/0x270
      [  195.707072]  ? pvclock_read_flags+0x80/0x80
      [  195.709302]  ? tcp_v4_early_demux+0x4b0/0x4b0
      [  195.711833]  ? lock_acquire+0x195/0x380
      [  195.714222]  ? ip_local_deliver_finish+0xfc/0x770
      [  195.716967]  ? raw_rcv+0x2b0/0x2b0
      [  195.718856]  ? lock_release+0xa00/0xa00
      [  195.720938]  ip_local_deliver_finish+0x1b9/0x770
      [...]
      
      Fixes: 2c205dd3 ("netfilter: add struct nf_nat_hook and use it")
      Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
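The class of bug fixed by the netfilter commit above, calling through a hook table that may not be registered, can be shown in a few lines. This is a simplified userspace sketch with hypothetical names; the kernel's RCU protection of the pointer is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a hook table that an optional provider registers. */
struct nat_hook {
    void (*decode_session)(int *flow);
};

/* NULL until a provider registers itself. */
static const struct nat_hook *nat_hook;

/* Example callback used only to make the sketch observable. */
static void mark_decoded(int *flow)
{
    *flow = 1;
}

static const struct nat_hook example_hook = { .decode_session = mark_decoded };

void decode_session(int *flow)
{
    assert(flow != NULL);

    /* Take one snapshot of the pointer and check it before dereferencing. */
    const struct nat_hook *hook = nat_hook;
    if (hook && hook->decode_session)
        hook->decode_session(flow);
}
```

With no hook registered the call is a no-op rather than a NULL dereference; once `nat_hook` points at `example_hook`, the callback runs.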
  10. 12 Jun, 2018 2 commits
  11. 11 Jun, 2018 1 commit
  12. 10 Jun, 2018 1 commit
  13. 08 Jun, 2018 7 commits