1. 22 6月, 2017 1 次提交
    • N
      md: use a separate bio_set for synchronous IO. · 5a85071c
      NeilBrown 提交于
      md devices allocate a bio_set and use it for two
      distinct purposes.
      mddev->bio_set is used to clone bios as part of sending
      upper level requests down to lower level devices,
      and it is also use for synchronous IO such as superblock
      and bitmap updates, and for correcting read errors.
      
      This multiple usage can lead to deadlocks.  It is likely
      that cloned bios might be queued for write and to be
      waiting for a metadata update before the write can be permitted.
      If the cloning exhausted mddev->bio_set, the metadata update
      may not be able to proceed.
      
      This scenario has been seen during heavy testing, with lots of IO and
      lots of memory pressure.
      
      Address this by adding a new bio_set specifically for synchronous IO.
      All synchronous IO goes directly to the underlying device and is not
      queued at the md level, so request using entries from the new
      mddev->sync_set will complete in a timely fashion.
      Requests that use mddev->bio_set will sometimes need to wait
      for synchronous IO, but will no longer risk deadlocking that iO.
      
      Also: small simplification in mddev_put(): there is no need to
      wait until the spinlock is released before calling bioset_free().
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      5a85071c
  2. 17 6月, 2017 3 次提交
  3. 14 6月, 2017 2 次提交
    • M
      md: don't use flush_signals in userspace processes · f9c79bc0
      Mikulas Patocka 提交于
      The function flush_signals clears all pending signals for the process. It
      may be used by kernel threads when we need to prepare a kernel thread for
      responding to signals. However using this function for an userspaces
      processes is incorrect - clearing signals without the program expecting it
      can cause misbehavior.
      
      The raid1 and raid5 code uses flush_signals in its request routine because
      it wants to prepare for an interruptible wait. This patch drops
      flush_signals and uses sigprocmask instead to block all signals (including
      SIGKILL) around the schedule() call. The signals are not lost, but the
      schedule() call won't respond to them.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      f9c79bc0
    • N
      md: fix deadlock between mddev_suspend() and md_write_start() · cc27b0c7
      NeilBrown 提交于
      If mddev_suspend() races with md_write_start() we can deadlock
      with mddev_suspend() waiting for the request that is currently
      in md_write_start() to complete the ->make_request() call,
      and md_write_start() waiting for the metadata to be updated
      to mark the array as 'dirty'.
      As metadata updates done by md_check_recovery() only happen then
      the mddev_lock() can be claimed, and as mddev_suspend() is often
      called with the lock held, these threads wait indefinitely for each
      other.
      
      We fix this by having md_write_start() abort if mddev_suspend()
      is happening, and ->make_request() aborts if md_write_start()
      aborted.
      md_make_request() can detect this abort, decrease the ->active_io
      count, and wait for mddev_suspend().
      Reported-by: NNix <nix@esperi.org.uk>
      Fix: 68866e42(MD: no sync IO while suspended)
      Cc: stable@vger.kernel.org
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NShaohua Li <shli@fb.com>
      cc27b0c7
  4. 10 6月, 2017 1 次提交
  5. 09 6月, 2017 5 次提交
    • D
      device-dax: fix 'dax' device filesystem inode destruction crash · b9d39d17
      Dan Williams 提交于
      The inode destruction path for the 'dax' device filesystem incorrectly
      assumes that the inode was initialized through 'alloc_dax()'. However,
      if someone attempts to directly mount the dax filesystem with 'mount -t
      dax dax mnt' that will bypass 'alloc_dax()' and the following failure
      signatures may occur as a result:
      
       kill_dax() must be called before final iput()
       WARNING: CPU: 2 PID: 1188 at drivers/dax/super.c:243 dax_destroy_inode+0x48/0x50
       RIP: 0010:dax_destroy_inode+0x48/0x50
       Call Trace:
        destroy_inode+0x3b/0x60
        evict+0x139/0x1c0
        iput+0x1f9/0x2d0
        dentry_unlink_inode+0xc3/0x160
        __dentry_kill+0xcf/0x180
        ? dput+0x37/0x3b0
        dput+0x3a3/0x3b0
        do_one_tree+0x36/0x40
        shrink_dcache_for_umount+0x2d/0x90
        generic_shutdown_super+0x1f/0x120
        kill_anon_super+0x12/0x20
        deactivate_locked_super+0x43/0x70
        deactivate_super+0x4e/0x60
      
       general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
       RIP: 0010:kfree+0x6d/0x290
       Call Trace:
        <IRQ>
        dax_i_callback+0x22/0x60
        ? dax_destroy_inode+0x50/0x50
        rcu_process_callbacks+0x298/0x740
      
       ida_remove called for id=0 which is not allocated.
       WARNING: CPU: 0 PID: 0 at lib/idr.c:383 ida_remove+0x110/0x120
       [..]
       Call Trace:
        <IRQ>
        ida_simple_remove+0x2b/0x50
        ? dax_destroy_inode+0x50/0x50
        dax_i_callback+0x3c/0x60
        rcu_process_callbacks+0x298/0x740
      
      Add missing initialization of the 'struct dax_device' and inode so that
      the destruction path does not kfree() or ida_simple_remove()
      uninitialized data.
      
      Fixes: 7b6be844 ("dax: refactor dax-fs into a generic provider of 'struct dax_device' instances")
      Reported-by: NSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      b9d39d17
    • D
      efi: Fix boot panic because of invalid BGRT image address · 792ef14d
      Dave Young 提交于
      Maniaxx reported a kernel boot crash in the EFI code, which I emulated
      by using same invalid phys addr in code:
      
        BUG: unable to handle kernel paging request at ffffffffff280001
        IP: efi_bgrt_init+0xfb/0x153
        ...
        Call Trace:
         ? bgrt_init+0xbc/0xbc
         acpi_parse_bgrt+0xe/0x12
         acpi_table_parse+0x89/0xb8
         acpi_boot_init+0x445/0x4e2
         ? acpi_parse_x2apic+0x79/0x79
         ? dmi_ignore_irq0_timer_override+0x33/0x33
         setup_arch+0xb63/0xc82
         ? early_idt_handler_array+0x120/0x120
         start_kernel+0xb7/0x443
         ? early_idt_handler_array+0x120/0x120
         x86_64_start_reservations+0x29/0x2b
         x86_64_start_kernel+0x154/0x177
         secondary_startup_64+0x9f/0x9f
      
      There is also a similar bug filed in bugzilla.kernel.org:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=195633
      
      The crash is caused by this commit:
      
        7b0a9114 efi/x86: Move the EFI BGRT init code to early init code
      
      The root cause is the firmware on those machines provides invalid BGRT
      image addresses.
      
      In a kernel before above commit BGRT initializes late and uses ioremap()
      to map the image address. Ioremap validates the address, if it is not a
      valid physical address ioremap() just fails and returns. However in current
      kernel EFI BGRT initializes early and uses early_memremap() which does not
      validate the image address, and kernel panic happens.
      
      According to ACPI spec the BGRT image address should fall into
      EFI_BOOT_SERVICES_DATA, see the section 5.2.22.4 of below document:
      
        http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
      
      Fix this issue by validating the image address in efi_bgrt_init(). If the
      image address does not fall into any EFI_BOOT_SERVICES_DATA areas we just
      bail out with a warning message.
      Reported-by: NManiaxx <tripleshiftone@gmail.com>
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Fixes: 7b0a9114 ("efi/x86: Move the EFI BGRT init code to early init code")
      Link: http://lkml.kernel.org/r/20170609084558.26766-2-ard.biesheuvel@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      792ef14d
    • V
      cxl: Avoid double free_irq() for psl,slice interrupts · ed45509b
      Vaibhav Jain 提交于
      During an eeh call to cxl_remove can result in double free_irq of
      psl,slice interrupts. This can happen if perst_reloads_same_image == 1
      and call to cxl_configure_adapter() fails during slot_reset
      callback. In such a case we see a kernel oops with following back-trace:
      
      Oops: Kernel access of bad area, sig: 11 [#1]
      Call Trace:
        free_irq+0x88/0xd0 (unreliable)
        cxl_unmap_irq+0x20/0x40 [cxl]
        cxl_native_release_psl_irq+0x78/0xd8 [cxl]
        pci_deconfigure_afu+0xac/0x110 [cxl]
        cxl_remove+0x104/0x210 [cxl]
        pci_device_remove+0x6c/0x110
        device_release_driver_internal+0x204/0x2e0
        pci_stop_bus_device+0xa0/0xd0
        pci_stop_and_remove_bus_device+0x28/0x40
        pci_hp_remove_devices+0xb0/0x150
        pci_hp_remove_devices+0x68/0x150
        eeh_handle_normal_event+0x140/0x580
        eeh_handle_event+0x174/0x360
        eeh_event_handler+0x1e8/0x1f0
      
      This patch fixes the issue of double free_irq by checking that
      variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are
      not '0' before un-mapping and resetting these variables to '0' when
      they are un-mapped.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ed45509b
    • R
      gpio: mvebu: fix gpio bank registration when pwm is used · fc7a9068
      Richard Genoud 提交于
      If more than one gpio bank has the "pwm" property, only one will be
      registered successfully, all the others will fail with:
      mvebu-gpio: probe of f1018140.gpio failed with error -17
      
      That's because in alloc_pwms(), the chip->base (aka "int pwm"), was not
      set (thus, ==0) ; and 0 is a meaningful start value in alloc_pwm().
      What was intended is mvpwm->chip->base = -1.
      Like that, the numbering will be done auto-magically
      
      Moreover, as the region might be already occupied by another pwm, we
      shouldn't force:
      mvpwm->chip->base = 0
      nor
      mvpwm->chip->base = id * MVEBU_MAX_GPIO_PER_BANK;
      
      Tested on clearfog-pro (Marvell 88F6828)
      
      Fixes: 757642f9 ("gpio: mvebu: Add limited PWM support")
      Signed-off-by: NRichard Genoud <richard.genoud@gmail.com>
      Reviewed-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      fc7a9068
    • R
      gpio: mvebu: fix blink counter register selection · c528eb27
      Richard Genoud 提交于
      The blink counter A was always selected because 0 was forced in the
      blink select counter register.
      The variable 'set' was obviously there to be used as the register value,
      selecting the B counter when id==1 and A counter when id==0.
      
      Tested on clearfog-pro (Marvell 88F6828)
      
      Fixes: 757642f9 ("gpio: mvebu: Add limited PWM support")
      Reviewed-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
      Reviewed-by: NRalph Sennhauser <ralph.sennhauser@gmail.com>
      Signed-off-by: NRichard Genoud <richard.genoud@gmail.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      c528eb27
  6. 08 6月, 2017 6 次提交
  7. 07 6月, 2017 22 次提交