1. 13 Aug, 2018 1 commit
    • init: rename and re-order boot_cpu_state_init() · b5b1404d
      Linus Torvalds authored
      This is purely a preparatory patch for upcoming changes during the 4.19
      merge window.
      
      We have a function called "boot_cpu_state_init()" that isn't really
      about the bootup cpu state: that is done much earlier by the similarly
      named "boot_cpu_init()" (note lack of "state" in name).
      
      This function initializes some hotplug CPU state, and needs to run after
      the percpu data has been properly initialized.  It even has a comment to
      that effect.
      
      Except it _doesn't_ actually run after the percpu data has been properly
      initialized.  On x86 it happens to do that, but on at least arm and
      arm64, the percpu base pointers are initialized by the arch-specific
      'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().
      
      This had some unexpected results, and in particular we have a patch
      pending for the merge window that did the obvious cleanup of using
      'this_cpu_write()' in the cpu hotplug init code:
      
        -       per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
        +       this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
      
      which is obviously the right thing to do.  Except because of the
      ordering issue, it actually failed miserably and unexpectedly on arm64.
      
      So this just fixes the ordering, and changes the name of the function to
      be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
      hotplug state, because the core CPU state was supposed to have already
      been done earlier.
      
      Marked for stable, since the (not yet merged) patch that will show this
      problem is marked for stable.
      Reported-by: Vlastimil Babka <vbabka@suse.cz>
      Reported-by: Mian Yousaf Kaukab <yousaf.kaukab@suse.com>
      Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
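The ordering problem described in this commit can be sketched in user space. This is a toy model under stated assumptions, not kernel code: all `toy_*` names are hypothetical, and a plain pointer stands in for the per-CPU base that the arch-specific hook sets up.

```c
#include <assert.h>
#include <stddef.h>

enum toy_state { TOY_OFFLINE = 0, TOY_ONLINE = 1 };

/* Hypothetical per-CPU storage; the base pointer stays NULL until the arch hook runs. */
static enum toy_state toy_percpu_area[4];
static enum toy_state *toy_this_cpu_base;

/* Stand-in for the arch hook that sets up the per-CPU base pointer. */
static void toy_smp_prepare_boot_cpu(void)
{
	assert(toy_this_cpu_base == NULL);	/* the hook is expected to run once */
	toy_this_cpu_base = &toy_percpu_area[0];
}

/* Stand-in for the hotplug init: only valid once the base pointer is set. */
static int toy_boot_cpu_hotplug_init(void)
{
	if (toy_this_cpu_base == NULL)
		return -1;	/* called too early: per-CPU data not ready */
	*toy_this_cpu_base = TOY_ONLINE;
	return 0;
}
```

In this sketch the early call is detected and rejected; in the situation the commit describes there was no such check, which is why the write misbehaved on arm64.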
  2. 12 Aug, 2018 1 commit
  3. 09 Aug, 2018 3 commits
    • blkcg: Introduce blkg_root_lookup() · 6bad9b21
      Bart Van Assche authored
      This new function will be used in a later patch to verify whether a
      queue has been dissociated from the cgroup controller before being
      released.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alexandru Moise <00moses.alexander00@gmail.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Remove two superfluous #include directives · b1f4267c
      Bart Van Assche authored
      Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
      only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
      did not remove the corresponding #include directives. Since these
      include directives are no longer needed, remove them.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • regmap: Add regmap_noinc_read API · 74fe7b55
      Crestez Dan Leonard authored
      The regmap API usually assumes that bulk read operations will read a
      range of registers, but some I2C/SPI devices have certain registers for
      which such a read operation will return data from an internal FIFO
      instead. Add an explicit API to support bulk read without range semantics.
      
      Some Linux drivers use regmap_bulk_read or regmap_raw_read for such
      registers, for example mpu6050 or bmi150 from IIO. This only happens to
      work because when caching is disabled a single regmap read op will map
      to a single bus read op (as desired). This breaks if caching is enabled and
      reg+1 happens to be a cacheable register.
      
      Without regmap support, refactoring a driver to enable regmap caching
      requires separate I2C and SPI paths. This is exactly what regmap is
      supposed to help avoid.
      Suggested-by: Jonathan Cameron <jic23@kernel.org>
      Signed-off-by: Crestez Dan Leonard <leonard.crestez@intel.com>
      Signed-off-by: Stefan Popa <stefan.popa@analog.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
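The difference between range semantics and no-increment semantics can be sketched with a toy device model. This is only an illustration, not the regmap implementation: `toy_dev`, `toy_bulk_read()`, `toy_noinc_read()` and the FIFO register address are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NUM_REGS 8
#define TOY_FIFO_REG 4	/* hypothetical address of the FIFO data register */
#define TOY_FIFO_LEN 16

/* Toy device: plain registers plus one register backed by an internal FIFO. */
struct toy_dev {
	uint8_t regs[TOY_NUM_REGS];
	uint8_t fifo[TOY_FIFO_LEN];
	size_t fifo_head;	/* next FIFO byte to hand out */
};

/* Read one register; reading TOY_FIFO_REG pops the next FIFO byte. */
static uint8_t toy_read_reg(struct toy_dev *dev, unsigned int reg)
{
	assert(reg < TOY_NUM_REGS);
	if (reg == TOY_FIFO_REG && dev->fifo_head < TOY_FIFO_LEN)
		return dev->fifo[dev->fifo_head++];
	return dev->regs[reg];
}

/* Range semantics: byte i comes from register reg + i. */
static void toy_bulk_read(struct toy_dev *dev, unsigned int reg,
			  uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = toy_read_reg(dev, reg + (unsigned int)i);
}

/* No-increment semantics: every byte comes from the same register. */
static void toy_noinc_read(struct toy_dev *dev, unsigned int reg,
			   uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = toy_read_reg(dev, reg);
}
```

With range semantics, a multi-byte read starting at the FIFO register spills into the following registers, which is the behaviour the commit message warns about once reg+1 is cacheable.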
  4. 08 Aug, 2018 2 commits
  5. 07 Aug, 2018 1 commit
    • cpu/hotplug: Fix SMT supported evaluation · bc2d8d26
      Thomas Gleixner authored
      Josh reported that the late SMT evaluation in cpu_smt_state_init() sets
      cpu_smt_control to CPU_SMT_NOT_SUPPORTED when 'nosmt' was supplied
      on the kernel command line, as it cannot differentiate between SMT disabled
      by BIOS and SMT soft disable via 'nosmt'. That wrecks the state and
      makes the sysfs interface unusable.
      
      Rework this so that during bringup of the non boot CPUs the availability of
      SMT is determined in cpu_smt_allowed(). If a newly booted CPU is not a
      'primary' thread then set the local cpu_smt_available marker and evaluate
      this explicitly right after the initial SMP bringup has finished.
      
      SMT evaluation on x86 is a trainwreck as the firmware has all the
      information _before_ booting the kernel, but there is no interface to query
      it.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 06 Aug, 2018 1 commit
  7. 04 Aug, 2018 2 commits
    • new helper: inode_fake_hash() · 5bef9151
      Al Viro authored
      open-coded in quite a few places...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • new primitive: discard_new_inode() · c2b6d621
      Al Viro authored
      	We don't want open-by-handle picking half-set-up in-core
      struct inode from e.g. mkdir() having failed halfway through.
      In other words, we don't want such inodes returned by iget_locked()
      on their way to extinction.  However, we can't just have them
      unhashed - otherwise open-by-handle immediately *after* that would've
      ended up creating a new in-core inode over the on-disk one that
      is in process of being freed right under us.
      
      	Solution: new flag (I_CREATING) set by insert_inode_locked() and
      removed by unlock_new_inode() and a new primitive (discard_new_inode())
      to be used by such halfway-through-setup failure exits instead of
      unlock_new_inode() / iput() combinations.  That primitive unlocks new
      inode, but leaves I_CREATING in place.
      
      	iget_locked() treats finding an I_CREATING inode as failure
      (-ESTALE, once we sort out the error propagation).
      	insert_inode_locked() treats the same as instant -EBUSY.
      	ilookup() treats those as icache miss.
      
      [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in]
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
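The flag handling described above can be sketched as a small state model. This is an illustration only: the `toy_*` names and flag values are hypothetical, and the lookup models just the "treat as icache miss" rule given for ilookup().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_I_NEW      (1u << 0)
#define TOY_I_CREATING (1u << 1)

struct toy_inode {
	unsigned long ino;
	unsigned int state;
};

/* Mark an inode as being set up, as insert_inode_locked() is described to do. */
static void toy_insert_locked(struct toy_inode *inode)
{
	assert(inode != NULL);
	inode->state |= TOY_I_NEW | TOY_I_CREATING;
}

/* Successful setup: both flags are cleared. */
static void toy_unlock_new(struct toy_inode *inode)
{
	inode->state &= ~(TOY_I_NEW | TOY_I_CREATING);
}

/* Failed setup: the inode is unlocked but TOY_I_CREATING stays set. */
static void toy_discard_new(struct toy_inode *inode)
{
	inode->state &= ~TOY_I_NEW;
}

/* Lookup: an inode still marked as CREATING counts as a miss. */
static struct toy_inode *toy_lookup(struct toy_inode *hashed)
{
	if (hashed == NULL || (hashed->state & TOY_I_CREATING))
		return NULL;
	return hashed;
}
```

The point of leaving the flag set is that the inode stays hashed, so nothing new is created over it, yet lookups never hand it out.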
  8. 03 Aug, 2018 2 commits
  9. 02 Aug, 2018 4 commits
    • spi: spi-mem: Extend the SPI mem interface to set a custom memory name · 5d27a9c8
      Frieder Schrempf authored
      When porting (Q)SPI controller drivers from the MTD layer to the SPI
      layer, the naming scheme for the memory devices changes. To keep
      compatibility with the old driver's naming scheme, a name
      field is added to struct spi_mem and a hook is added to let controller
      drivers set a custom name for the memory device.
      
      Example for the FSL QSPI driver:
      
      Name with the old driver: 21e0000.qspi,
      or with multiple devices: 21e0000.qspi-0, 21e0000.qspi-1, ...
      
      Name with the new driver without spi_mem_get_name: spi4.0
      Suggested-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Frieder Schrempf <frieder.schrempf@exceet.de>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: spi-mem: Fix a typo in the documentation of struct spi_mem · 06bcb516
      Frieder Schrempf authored
      Fix a typo in the @drvpriv description.
      Signed-off-by: Frieder Schrempf <frieder.schrempf@exceet.de>
      Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • kill d_instantiate_no_diralias() · c971e6a0
      Al Viro authored
      The only user is fuse_create_new_entry(), and there it's used to
      mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
      The same solution applies - unhash the mkdir argument, then
      call d_splice_alias() and if that returns a reference to preexisting
      alias, dput() and report success.  ->mkdir() argument left unhashed
      negative with the preexisting alias moved in the right place is just
      fine from the ->mkdir() caller's point of view.
      
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • mm: do not initialize TLB stack vma's with vma_init() · 8b11ec1b
      Linus Torvalds authored
      Commit 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and
      data segments") tried to initialize various left-over ad-hoc vma's
      "properly", but actually made things worse for the temporary vma's used
      for TLB flushing.
      
      vma_init() doesn't actually initialize all of the vma, just a few
      fields, so doing something like
      
         -       struct vm_area_struct vma = { .vm_mm = tlb->mm, };
         +       struct vm_area_struct vma;
         +
         +       vma_init(&vma, tlb->mm);
      
      was actually very bad: instead of having a nicely initialized vma with
      every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
      with only a couple of fields initialized.  And they weren't even fields
      that the code in question mostly cared about.
      
      The flush_tlb_range() function takes a "struct vma" rather than a
      "struct mm_struct", because a few architectures actually care about what
      kind of range it is - being able to only do an ITLB flush if it's a
      range that doesn't have data accesses enabled, for example.  And all the
      normal users already have the vma for doing the range invalidation.
      
      But a few people want to call flush_tlb_range() with a range they just
      made up, so they also end up using a made-up vma.  x86 just has a
      special "flush_tlb_mm_range()" function for this, but other
      architectures (arm and ia64) do the "use fake vma" thing instead, and
      thus got caught up in the vma_init() changes.
      
      At the same time, the TLB flushing code really doesn't care about most
      other fields in the vma, so vma_init() is just unnecessary and
      pointless.
      
      This fixes things by having an explicit "this is just an initializer for
      the TLB flush" initializer macro, which is used by the arm/arm64/ia64
      people who mis-use this interface with just a dummy vma.
      
      Fixes: 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and data segments")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
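The difference between a designated initializer and a partial init function can be shown in plain C. The sketch below uses a hypothetical `toy_vma` structure and a `toy_vma_init()` that sets only two fields; it is not the kernel's vma_init(), and the 0xAA fill merely stands in for uninitialized stack contents.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct vm_area_struct; field names are illustrative. */
struct toy_vma {
	void *mm;
	unsigned long start;
	unsigned long end;
	unsigned long flags;
	const void *ops;
};

/* Partial initializer in the spirit of vma_init(): sets only some fields. */
static void toy_vma_init(struct toy_vma *vma, void *mm)
{
	assert(vma != NULL);
	vma->mm = mm;
	vma->ops = NULL;
}

/* Simulate stack garbage, then run the partial initializer. */
static struct toy_vma toy_vma_partial(void *mm)
{
	struct toy_vma vma;

	memset(&vma, 0xAA, sizeof(vma));	/* stand-in for an uninitialized stack slot */
	toy_vma_init(&vma, mm);
	return vma;
}

/* Designated initializer: every field not named is zeroed by the compiler. */
static struct toy_vma toy_vma_designated(void *mm)
{
	struct toy_vma vma = { .mm = mm, };

	return vma;
}
```

After `toy_vma_partial()` the `flags`, `start` and `end` fields still hold the fill pattern, while `toy_vma_designated()` leaves them at zero.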
  10. 01 Aug, 2018 5 commits
    • spi: spi-gpio: add SPI_3WIRE support · 4b859db2
      Lorenzo Bianconi authored
      Add SPI_3WIRE support to the spi-gpio controller by introducing a
      set_line_direction function pointer in the spi_bitbang data structure.
      The spi-gpio controller has been tested using an hts221 temp/rh iio sensor
      running in 3wire mode and an lsm6dsm running in 4wire mode.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • spi: add flags parameter to txrx_word function pointers · 304d3436
      Lorenzo Bianconi authored
      Add the capability to specify the flag parameter used in
      bitbang_txrx_be_cpha{0,1} through the txrx_word function pointers of
      the spi_bitbang data structure. That feature will be used to add spi-3wire
      support to the spi-gpio controller.
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
    • mtd: rawnand: allocate dynamically ONFI parameters during detection · 3d3fe3c0
      Miquel Raynal authored
      Now that it is possible to do dynamic allocations during the
      identification phase, convert the onfi_params structure (which is only
      needed with ONFI compliant chips) into a pointer that will be allocated
      only if needed.
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
    • mtd: spi-nor: only apply reset hacks to broken hardware · bb276262
      Brian Norris authored
      Commit 59b356ff ("mtd: m25p80: restore the status of SPI flash when
      exiting") is the latest from a long history of attempts to add reboot
      handling to handle stateful addressing modes on SPI flash. Some prior
      mostly-related discussions:
      
      http://lists.infradead.org/pipermail/linux-mtd/2013-March/046343.html
      [PATCH 1/3] mtd: m25p80: utilize dedicated 4-byte addressing commands
      
      http://lists.infradead.org/pipermail/barebox/2014-September/020682.html
      [RFC] MTD m25p80 3-byte addressing and boot problem
      
      http://lists.infradead.org/pipermail/linux-mtd/2015-February/057683.html
      [PATCH 2/2] m25p80: if supported put chip to deep power down if not used
      
      Previously, attempts to add reboot-time software reset handling were
      rejected, but the latest attempt was not.
      
      Quick summary of the problem:
      Some systems (e.g., boot ROM or bootloader) assume that they can read
      initial boot code from their SPI flash using 3-byte addressing. If the
      flash is left in 4-byte mode after reset, these systems won't boot. The
      above patch provided a shutdown/remove hook to attempt to reset the
      addressing mode before we reboot. Notably, this patch misses out on
      huge classes of unexpected reboots (e.g., crashes, watchdog resets).
      
      Unfortunately, it is essentially impossible to solve this problem 100%:
      if your system doesn't know how to reset the SPI flash to power-on
      defaults at initialization time, no amount of software can really rescue
      you -- there will always be a chance of some unexpected reset that
      leaves your flash in an addressing mode that your boot sequence didn't
      expect.
      
      While it is not directly harmful to perform hacks like the
      aforementioned commit on all 4-byte addressing flash, a
      properly-designed system should not need the hack -- and in fact,
      providing this hack may mask the fact that a given system is indeed
      broken. So this patch attempts to apply this unsound hack more narrowly,
      providing a strong suggestion to developers and system designers that
      this is truly a hack. With luck, system designers can catch their errors
      early on in their development cycle, rather than applying this hack long
      term. But apparently enough systems are out in the wild that we still
      have to provide this hack.
      
      Document a new device tree property to denote systems that do not have a
      proper hardware (or software) reset mechanism, and apply the hack (with
      a loud warning) only in this case.
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Boris Brezillon <boris.brezillon@bootlin.com>
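The narrowed policy can be sketched as follows: the software reset runs on shutdown only when the flash is in 4-byte mode and the platform has been flagged as lacking a proper reset. Everything here is hypothetical (`toy_nor`, its fields, and the warning text); the commit message does not name the device tree property, so a plain boolean stands in for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_nor {
	unsigned int addr_mode;	/* 3 or 4 address bytes currently in use */
	bool broken_reset;	/* assumed to be set from a firmware/DT property */
};

/*
 * Shutdown/remove path: returns true if the addressing mode was restored.
 * Platforms with a working reset are left alone.
 */
static bool toy_nor_restore(struct toy_nor *nor)
{
	assert(nor != NULL);
	if (nor->addr_mode != 4 || !nor->broken_reset)
		return false;
	fprintf(stderr, "warning: flash has no proper reset, leaving 4-byte mode in software\n");
	nor->addr_mode = 3;
	return true;
}
```

As the commit message notes, a hook like this cannot cover crashes or watchdog resets, which is why it is treated as a hack for flagged systems only.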
    • PCI: Fix is_added/is_busmaster race condition · 44bda4b7
      Hari Vyas authored
      When a PCI device is detected, pdev->is_added is set to 1 and proc and
      sysfs entries are created.
      
      When the device is removed, pdev->is_added is checked to be 1, the
      device is detached and its proc and sysfs entries are cleared, and
      finally pdev->is_added is set to 0.
      
      is_added and is_busmaster are bit fields in the pci_dev structure that
      share the same memory location.
      
      A strange issue was observed with repeated removal and rescan of a PCIe
      NVMe device using sysfs commands: the is_added flag was observed as zero
      instead of one while removing the device, so the proc and sysfs entries
      were not cleared. This causes problems in a later device addition, with
      the warning message "proc_dir_entry" already registered.
      
      Debugging revealed a race condition between the PCI core setting the
      is_added bit in pci_bus_add_device() and the NVMe driver reset work-queue
      setting the is_busmaster bit in pci_set_master().  As these fields are not
      handled atomically, that clears the is_added bit.
      
      Move the is_added bit to a separate private flag variable and use atomic
      functions to set and retrieve the device addition state.  This avoids the
      race because is_added no longer shares a memory location with is_busmaster.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=200283
      Signed-off-by: Hari Vyas <hari.vyas@broadcom.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Lukas Wunner <lukas@wunner.de>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
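The shape of the fix can be sketched in user-space C11: adjacent bit fields share one storage unit, so a plain update of one is a read-modify-write of the whole unit, whereas a separate flags word updated with atomic operations is not affected by writes to the remaining bit field. The `toy_*` names and the bit number are assumptions for illustration, and C11 atomics stand in for the kernel's atomic bit operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Before: two one-bit fields packed into the same storage unit. */
struct toy_dev_old {
	unsigned int is_added:1;
	unsigned int is_busmaster:1;
};

#define TOY_DEV_ADDED 0	/* hypothetical bit number in priv_flags */

/* After: the "added" state lives in its own word and is updated atomically. */
struct toy_dev_new {
	unsigned int is_busmaster:1;
	atomic_ulong priv_flags;
};

/* Set or clear the "added" bit without touching any bit field. */
static void toy_assign_added(struct toy_dev_new *dev, bool added)
{
	assert(dev != NULL);
	if (added)
		atomic_fetch_or(&dev->priv_flags, 1UL << TOY_DEV_ADDED);
	else
		atomic_fetch_and(&dev->priv_flags, ~(1UL << TOY_DEV_ADDED));
}

/* Read the "added" bit. */
static bool toy_is_added(struct toy_dev_new *dev)
{
	return (atomic_load(&dev->priv_flags) >> TOY_DEV_ADDED) & 1UL;
}
```

`struct toy_dev_old` is shown only for contrast; the sketch does not try to reproduce the race itself.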
  11. 31 Jul, 2018 6 commits
    • t10-pi: provide empty t10_pi_complete() for !CONFIG_BLK_DEV_INTEGRITY · 08fcf813
      Jens Axboe authored
      Fixes a link failure when BLK_DEV_INTEGRITY isn't defined.
      
      Fixes: 10c41ddd ("block: move dif_prepare/dif_complete functions to block layer")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtd: rawnand: allocate model parameter dynamically · 2023f1fa
      Miquel Raynal authored
      Thanks to the migration of all drivers to use nand_scan() and the
      related nand_controller_ops, we can now allocate data during the
      detection phase. Let's do it first for the NAND model parameter which
      is allocated in nand_detect().
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
    • mtd: rawnand: do not export nand_scan_[ident|tail]() anymore · 98732da1
      Miquel Raynal authored
      Both nand_scan_ident() and nand_scan_tail() helpers used to be called
      directly from controller drivers that needed to tweak some ECC-related
      parameters before nand_scan_tail(). This separation prevented dynamic
      allocations during the phase of NAND identification, which was
      inconvenient.
      
      All controller drivers have been moved to use nand_scan(), in
      conjunction with the chip->ecc.[attach|detach]_chip() hooks that
      actually do the required tweaking sequence between both ident/tail
      calls, allowing programmers to use dynamic allocation as they need all
      across the scanning sequence.
      
      Declare nand_scan_[ident|tail]() statically now.
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
    • mtd: rawnand: add hooks that may be called during nand_scan() · 05b54c7b
      Miquel Raynal authored
      In order to remove the limitation that forbids dynamic allocation in
      nand_scan_ident(), we must create a path that will be the same for all
      controller drivers. The idea is to use nand_scan() instead of the widely
      used nand_scan_ident()/nand_scan_tail() couple. In order to achieve
      this, controller drivers will need to adjust some parameters between
      these two functions depending on the NAND chip wired on them.
      
      This takes the form of two new hooks (->{attach,detach}_chip()) that are
      placed in a new nand_controller_ops structure, which is then attached
      to the nand_controller object at driver initialization time.
      ->attach_chip() is called between nand_scan_ident() and
      nand_scan_tail(), and ->detach_chip() is called in the error path of
      nand_scan() and in nand_cleanup().
      
      Note that some NAND controller drivers don't have a dedicated
      nand_controller object and instead rely on the default/dummy one
      embedded in nand_chip. If you're in this case and still want to
      initialize the controller ops, you'll have to manipulate
      chip->dummy_controller directly.
      
      Last but not least, it's worth mentioning that we plan to move some of
      the controller related hooks placed in nand_chip into
      nand_controller_ops to make the separation between NAND chip and NAND
      controller methods clearer.
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
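The call sequence described above (identify, attach hook, finalize, detach on error) can be sketched with a toy scan function. All `toy_*` names, the fields, and the "ECC strength must be set" rule are assumptions for the example, not the raw NAND core's actual code.

```c
#include <assert.h>
#include <stddef.h>

struct toy_chip;

/* Hypothetical analogue of a controller ops structure with two hooks. */
struct toy_controller_ops {
	int (*attach_chip)(struct toy_chip *chip);
	void (*detach_chip)(struct toy_chip *chip);
};

struct toy_chip {
	const struct toy_controller_ops *ops;
	int identified;
	int attached;
	int ecc_strength;	/* must be tuned by the controller before finalizing */
};

static int toy_scan_ident(struct toy_chip *chip)
{
	chip->identified = 1;
	return 0;
}

static int toy_scan_tail(struct toy_chip *chip)
{
	return chip->ecc_strength > 0 ? 0 : -1;
}

/* Single entry point: the attach hook runs between the ident and tail phases. */
static int toy_scan(struct toy_chip *chip)
{
	int ret;

	assert(chip != NULL);
	ret = toy_scan_ident(chip);
	if (ret)
		return ret;
	if (chip->ops && chip->ops->attach_chip) {
		ret = chip->ops->attach_chip(chip);
		if (ret)
			return ret;
	}
	ret = toy_scan_tail(chip);
	if (ret && chip->ops && chip->ops->detach_chip)
		chip->ops->detach_chip(chip);	/* error path undoes the attach */
	return ret;
}

/* Example controller hooks that tune the ECC setting once the chip is known. */
static int toy_attach(struct toy_chip *chip)
{
	chip->ecc_strength = 8;
	chip->attached = 1;
	return 0;
}

static void toy_detach(struct toy_chip *chip)
{
	chip->attached = 0;
}

static const struct toy_controller_ops toy_ops = {
	.attach_chip = toy_attach,
	.detach_chip = toy_detach,
};
```

Because the driver no longer calls the two phases itself, anything allocated during identification can be owned and freed by the single scan path.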
    • mtd: rawnand: better name for the controller structure · 7da45139
      Miquel Raynal authored
      In the raw NAND core, a NAND chip is described by a nand_chip structure,
      while a NAND controller is described with a nand_hw_control structure
      which is not very meaningful.
      
      Rename this structure nand_controller.
      
      As the structure gets renamed, it is logical to also rename the core
      function initializing it from nand_hw_control_init() to
      nand_controller_init().
      
      Lastly, the 'hwcontrol' entry of the nand_chip structure is not
      meaningful either, although it serves as the fallback when no controller
      structure is provided by the driver (the controller driver is dumb and
      can only control a single chip). Thus, it is renamed dummy_controller.
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Acked-by: Boris Brezillon <boris.brezillon@bootlin.com>
    • mtd: rawnand: make subop helpers return unsigned values · 760c435e
      Miquel Raynal authored
      A report from Colin Ian King pointed out a CoverityScan issue where error
      values from these helpers were not checked in the drivers. These
      helpers can error out only in case of a software bug in driver code,
      not because of a runtime/hardware error. Hence, let's WARN_ON() in this
      case and return 0 which is harmless anyway.
      
      Fixes: 8878b126 ("mtd: nand: add ->exec_op() implementation")
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
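The "warn and return 0" pattern can be sketched as below. The structures and the helper name are hypothetical, and a message on stderr stands in for WARN_ON(); the only point is that the helper returns an unsigned value and never an error code the caller would have to check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

enum toy_instr_type { TOY_INSTR_CMD, TOY_INSTR_ADDR };

struct toy_instr {
	enum toy_instr_type type;
	unsigned int naddrs;
};

struct toy_subop {
	const struct toy_instr *instrs;
	unsigned int ninstrs;
	unsigned int first_instr_start_off;
};

/*
 * Start offset of the address cycles for instruction idx.
 * Misuse (bad index, wrong instruction type) is a driver bug, so warn
 * and return 0 instead of a negative error.
 */
static unsigned int toy_subop_addr_start_off(const struct toy_subop *subop,
					     unsigned int idx)
{
	assert(subop != NULL);
	if (idx >= subop->ninstrs || subop->instrs[idx].type != TOY_INSTR_ADDR) {
		fprintf(stderr, "toy_subop_addr_start_off: bad instruction index %u\n", idx);
		return 0;
	}
	return idx == 0 ? subop->first_instr_start_off : 0;
}
```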
  12. 30 Jul, 2018 3 commits
  13. 28 Jul, 2018 3 commits
  14. 27 Jul, 2018 5 commits
    • include/linux/eventfd.h: include linux/errno.h · fa3fc2ad
      Arnd Bergmann authored
      The new gasket staging driver ran into a randconfig build failure when
      CONFIG_EVENTFD is disabled:
      
        In file included from drivers/staging/gasket/gasket_interrupt.h:11,
                         from drivers/staging/gasket/gasket_interrupt.c:4:
        include/linux/eventfd.h: In function 'eventfd_ctx_fdget':
        include/linux/eventfd.h:51:9: error: implicit declaration of function 'ERR_PTR' [-Werror=implicit-function-declaration]
      
      I can't see anything wrong with including eventfd.h before err.h, so the
      easiest fix is to make it possible to do this by including the file
      where it is needed.
      
      Link: http://lkml.kernel.org/r/20180724110737.3985088-1-arnd@arndb.de
      Fixes: 9a69f508 ("drivers/staging: Gasket driver framework + Apex driver")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix vma_is_anonymous() false-positives · bfd40eaf
      Kirill A. Shutemov authored
      vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
      VMA.  This is unreliable as ->mmap may not set ->vm_ops.
      
      False-positive vma_is_anonymous() may lead to crashes:
      
      	next ffff8801ce5e7040 prev ffff8801d20eca50 mm ffff88019c1e13c0
      	prot 27 anon_vma ffff88019680cdd8 vm_ops 0000000000000000
      	pgoff 0 file ffff8801b2ec2d00 private_data 0000000000000000
      	flags: 0xff(read|write|exec|shared|mayread|maywrite|mayexec|mayshare)
      	------------[ cut here ]------------
      	kernel BUG at mm/memory.c:1422!
      	invalid opcode: 0000 [#1] SMP KASAN
      	CPU: 0 PID: 18486 Comm: syz-executor3 Not tainted 4.18.0-rc3+ #136
      	Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google
      	01/01/2011
      	RIP: 0010:zap_pmd_range mm/memory.c:1421 [inline]
      	RIP: 0010:zap_pud_range mm/memory.c:1466 [inline]
      	RIP: 0010:zap_p4d_range mm/memory.c:1487 [inline]
      	RIP: 0010:unmap_page_range+0x1c18/0x2220 mm/memory.c:1508
      	Call Trace:
      	 unmap_single_vma+0x1a0/0x310 mm/memory.c:1553
      	 zap_page_range_single+0x3cc/0x580 mm/memory.c:1644
      	 unmap_mapping_range_vma mm/memory.c:2792 [inline]
      	 unmap_mapping_range_tree mm/memory.c:2813 [inline]
      	 unmap_mapping_pages+0x3a7/0x5b0 mm/memory.c:2845
      	 unmap_mapping_range+0x48/0x60 mm/memory.c:2880
      	 truncate_pagecache+0x54/0x90 mm/truncate.c:800
      	 truncate_setsize+0x70/0xb0 mm/truncate.c:826
      	 simple_setattr+0xe9/0x110 fs/libfs.c:409
      	 notify_change+0xf13/0x10f0 fs/attr.c:335
      	 do_truncate+0x1ac/0x2b0 fs/open.c:63
      	 do_sys_ftruncate+0x492/0x560 fs/open.c:205
      	 __do_sys_ftruncate fs/open.c:215 [inline]
      	 __se_sys_ftruncate fs/open.c:213 [inline]
      	 __x64_sys_ftruncate+0x59/0x80 fs/open.c:213
      	 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
      	 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Reproducer:
      
      	#include <stdio.h>
      	#include <stddef.h>
      	#include <stdint.h>
      	#include <stdlib.h>
      	#include <string.h>
      	#include <sys/types.h>
      	#include <sys/stat.h>
      	#include <sys/ioctl.h>
      	#include <sys/mman.h>
      	#include <unistd.h>
      	#include <fcntl.h>
      
      	#define KCOV_INIT_TRACE			_IOR('c', 1, unsigned long)
      	#define KCOV_ENABLE			_IO('c', 100)
      	#define KCOV_DISABLE			_IO('c', 101)
      	#define COVER_SIZE			(1024<<10)
      
      	#define KCOV_TRACE_PC  0
      	#define KCOV_TRACE_CMP 1
      
      	int main(int argc, char **argv)
      	{
      		int fd;
      		unsigned long *cover;
      
      		system("mount -t debugfs none /sys/kernel/debug");
      		fd = open("/sys/kernel/debug/kcov", O_RDWR);
      		ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      				PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      		munmap(cover, COVER_SIZE * sizeof(unsigned long));
      		cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
      				PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
      		memset(cover, 0, COVER_SIZE * sizeof(unsigned long));
      		ftruncate(fd, 3UL << 20);
      		return 0;
      	}
      
      This can be fixed by assigning anonymous VMAs their own vm_ops and not
      relying on it being NULL.
      
      If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
      dummy_vm_ops.  This way we will have non-NULL ->vm_ops for all VMAs.
      
      Link: http://lkml.kernel.org/r/20180724121139.62570-4-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reported-by: syzbot+3f84280d52be9b7083cc@syzkaller.appspotmail.com
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
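The fallback described in the last paragraph of this commit can be sketched as follows: if a file-backed mapping's ->mmap() leaves vm_ops unset, the mapping code installs a dummy ops table so that a NULL check no longer misclassifies the VMA as anonymous. The `toy_*` types and functions are hypothetical and only model that one rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_vm_ops {
	int unused;
};

/* Hypothetical analogue of dummy_vm_ops: an ops table with no callbacks. */
static const struct toy_vm_ops toy_dummy_vm_ops = { 0 };

struct toy_vma {
	const struct toy_vm_ops *vm_ops;
	const void *file;
};

/* The check under discussion: anonymous means "no vm_ops". */
static bool toy_vma_is_anonymous(const struct toy_vma *vma)
{
	return vma->vm_ops == NULL;
}

/* A driver ->mmap() that forgets to install vm_ops. */
static void toy_lazy_mmap(struct toy_vma *vma)
{
	(void)vma;
}

/* Mapping setup: file-backed VMAs never leave with a NULL vm_ops. */
static void toy_mmap_region(struct toy_vma *vma, const void *file,
			    void (*mmap_hook)(struct toy_vma *))
{
	assert(vma != NULL);
	vma->file = file;
	vma->vm_ops = NULL;
	if (file == NULL)
		return;		/* anonymous mapping keeps NULL vm_ops */
	if (mmap_hook != NULL)
		mmap_hook(vma);
	if (vma->vm_ops == NULL)
		vma->vm_ops = &toy_dummy_vm_ops;
}
```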
    • mm: introduce vma_init() · 027232da
      Kirill A. Shutemov authored
      Not all VMAs are allocated with vm_area_alloc().  Some of them are
      allocated on the stack or in the data segment.
      
      The new helper can be used to initialize a VMA properly regardless of
      where it was allocated.
      
      Link: http://lkml.kernel.org/r/20180724121139.62570-2-kirill.shutemov@linux.intel.com
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • delayacct: fix crash in delayacct_blkio_end() after delayacct init failure · b512719f
      Tejun Heo authored
      While forking, if delayacct init fails due to memory shortage, it
      continues expecting all delayacct users to check task->delays pointer
      against NULL before dereferencing it, which all of them used to do.
      
      Commit c96f5471 ("delayacct: Account blkio completion on the correct
      task"), while updating delayacct_blkio_end() to take the target task
      instead of always using %current, made the function test NULL on
      %current->delays and then continue to operate on @p->delays.  If
      %current succeeded init while @p didn't, it leads to the following
      crash.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
       IP: __delayacct_blkio_end+0xc/0x40
       PGD 8000001fd07e1067 P4D 8000001fd07e1067 PUD 1fcffbb067 PMD 0
       Oops: 0000 [#1] SMP PTI
       CPU: 4 PID: 25774 Comm: QIOThread0 Not tainted 4.16.0-9_fbk1_rc2_1180_g6b593215b4d7 #9
       RIP: 0010:__delayacct_blkio_end+0xc/0x40
       Call Trace:
        try_to_wake_up+0x2c0/0x600
        autoremove_wake_function+0xe/0x30
        __wake_up_common+0x74/0x120
        wake_up_page_bit+0x9c/0xe0
        mpage_end_io+0x27/0x70
        blk_update_request+0x78/0x2c0
        scsi_end_request+0x2c/0x1e0
        scsi_io_completion+0x20b/0x5f0
        blk_mq_complete_request+0xa2/0x100
        ata_scsi_qc_complete+0x79/0x400
        ata_qc_complete_multiple+0x86/0xd0
        ahci_handle_port_interrupt+0xc9/0x5c0
        ahci_handle_port_intr+0x54/0xb0
        ahci_single_level_irq_intr+0x3b/0x60
        __handle_irq_event_percpu+0x43/0x190
        handle_irq_event_percpu+0x20/0x50
        handle_irq_event+0x2a/0x50
        handle_edge_irq+0x80/0x1c0
        handle_irq+0xaf/0x120
        do_IRQ+0x41/0xc0
        common_interrupt+0xf/0xf
      
      Fix it by updating delayacct_blkio_end() to check @p->delays instead.
      
      Link: http://lkml.kernel.org/r/20180724175542.GP1934745@devbig577.frc2.facebook.com
      Fixes: c96f5471 ("delayacct: Account blkio completion on the correct task")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <dsj@fb.com>
      Debugged-by: Dave Jones <dsj@fb.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Josh Snyder <joshs@netflix.com>
      Cc: <stable@vger.kernel.org>	[4.15+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
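The bug and its fix come down to which task's pointer gets NULL-checked before it is dereferenced. A minimal sketch, with hypothetical `toy_*` types standing in for the task and its delay accounting data:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_delays {
	unsigned int blkio_count;
};

struct toy_task {
	struct toy_delays *delays;	/* NULL if accounting init failed at fork */
};

/*
 * Account a block I/O completion against task p.
 * The buggy shape tested current->delays and then used p->delays;
 * the fixed shape checks the same pointer it is about to dereference.
 */
static bool toy_blkio_end(struct toy_task *p)
{
	assert(p != NULL);
	if (p->delays == NULL)
		return false;	/* nothing to account, and nothing to crash on */
	p->delays->blkio_count++;
	return true;
}
```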
    • block: move bio_integrity_{intervals,bytes} into blkdev.h · 359f6427
      Greg Edwards authored
      This allows bio_integrity_bytes() to be called from drivers instead of
      open coding it.
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Greg Edwards <gedwards@ddn.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  15. 26 Jul, 2018 1 commit