1. 26 Nov, 2015: 6 commits
  2. 25 Nov, 2015: 2 commits
    • nvme: add missing unmaps in nvme_queue_rq · bf508e91
      Christoph Hellwig committed
      When various metadata-related operations fail in nvme_queue_rq, we
      need to unmap the data SGL.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • NVMe: default to 4k device page size · c5c9f25b
      Nishanth Aravamudan committed
      We received a bug report recently when DDW (64-bit direct DMA on Power)
      is not enabled for NVMe devices. In that case, we fall back to 32-bit
      DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
      Entries).
      
      The NVMe device driver, though, assumes that the DMA alignment for the
      PRP entries will match the device's page size, and that the DMA alignment
      matches the kernel's page alignment. On Power, the IOMMU page size,
      as mentioned above, can be 4K, while the device can have a page size of
      8K and the kernel has a page size of 64K. This eventually trips the
      BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple
      of 4K but not 8K (e.g., 0xF000).
      
      In this particular combination of page sizes, we clearly want to use the
      IOMMU's page size in the driver. More generally, the NVMe driver in this
      function should use the IOMMU's page size, rather than the kernel's,
      as the default device page size. There is currently no API to obtain
      the IOMMU's page size across all architectures, so, as a stop-gap fix
      for this functional issue, default the NVMe device page size to 4K. The
      intent is to add such an API, implemented across all architectures, in
      the next merge window.
      
      With the functionally equivalent v3 of this patch, our hardware test
      exerciser survives when using 32-bit DMA; without the patch, the kernel
      will BUG within a few minutes.
      
      Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
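      The mismatch can be checked with plain arithmetic. The following is a userspace sketch, not the driver's nvme_setup_prps() code: the helper dma_len_is_page_multiple() and the enum constants are hypothetical, and page sizes are assumed to be powers of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: can a DMA segment length be split into whole
 * device pages? Assumes page_size is a power of two, so every bit
 * below page_size must be zero in dma_len.
 */
static bool dma_len_is_page_multiple(uint64_t dma_len, uint64_t page_size)
{
    return (dma_len & (page_size - 1)) == 0;
}

/* Page sizes from the bug report: IOMMU TCE, device, and kernel. */
enum {
    IOMMU_PAGE_4K   = 4096,
    DEV_PAGE_8K     = 8192,
    KERNEL_PAGE_64K = 65536,
};
```

      0xF000 is 61440 bytes: fifteen 4K pages but seven and a half 8K pages, so only the 4K check passes. Defaulting the device page size to 4K keeps the driver's assumption consistent with the 4K TCEs the IOMMU uses.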
  3. 24 Nov, 2015: 1 commit
    • dm thin: fix regression in advertised discard limits · 0fcb04d5
      Mike Snitzer committed
      When establishing a thin device's discard limits we cannot rely on the
      underlying thin-pool device's discard capabilities (which are inherited
      from the thin-pool's underlying data device) given that DM thin devices
      must provide discard support even when the thin-pool's underlying data
      device doesn't support discards.
      
      Users were exposed to this thin device discard limits regression if
      their thin-pool's underlying data device does _not_ support discards.
      This regression prevented all upper layers that call the
      blkdev_issue_discard() interface from issuing discards to
      thin devices (because discard_granularity was 0).  This regression
      wasn't caught earlier because the device-mapper-test-suite's extensive
      'thin-provisioning' discard tests are only ever performed against
      thin-pools with data devices that support discards.
      
      Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
      rather than inferring whether or not a thin device's discard support
      should be enabled by looking at the thin-pool's discard_granularity.
      
      Fixes: 21607670 ("dm thin: disable discard support for thin devices if pool's is disabled")
      Reported-by: Mike Gerber <mike@sprachgewalt.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.1+
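      The change to thin_io_hints() amounts to swapping one predicate for another. A hypothetical sketch follows; struct pool_model and both functions are invented for illustration and are not dm-thin code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the thin-pool's state. */
struct pool_model {
    bool discard_enabled;              /* the pool's 'discard_enabled' feature */
    unsigned int discard_granularity;  /* inherited from the pool's data device */
};

/* Regressed behaviour: infer thin discard support from the pool's limits. */
static bool thin_discards_old(const struct pool_model *pool)
{
    return pool->discard_granularity != 0;
}

/* Fixed behaviour: consult the pool's feature flag instead. */
static bool thin_discards_fixed(const struct pool_model *pool)
{
    return pool->discard_enabled;
}
```

      For a pool whose data device lacks discard support (granularity 0) but whose 'discard_enabled' feature is set, the old predicate turns thin discards off while the fixed one keeps them on.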
  4. 21 Nov, 2015: 9 commits
  5. 20 Nov, 2015: 21 commits
    • mtip32xx: use formatting capability of kthread_create_on_node · 8aeea031
      Rasmus Villemoes committed
      kthread_create_on_node takes format+args, so there's no need to do the
      pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its
      '\0') only just fits in 16 bytes, so if index could ever go above 99
      we'd have a stack buffer overflow.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
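      The 16-byte arithmetic can be checked in userspace. This sketch assumes the name is built with a "mtip_svc_thd_%d" format; svc_thread_name_bytes() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Bytes needed (including the terminating '\0') to hold the thread
 * name for a given index. snprintf(NULL, 0, ...) returns the length
 * the formatted string would have, without writing anything.
 */
static size_t svc_thread_name_bytes(int index)
{
    return (size_t)snprintf(NULL, 0, "mtip_svc_thd_%d", index) + 1;
}
```

      Index 99 needs exactly 16 bytes and index 100 needs 17, so formatting into a 16-byte stack buffer first would overflow once the index passed 99. Handing the format and arguments directly to kthread_create_on_node() avoids the intermediate buffer altogether.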
    • NVMe: reap completion entries when deleting queue · 604e8c8d
      Keith Busch committed
      Make sure that there are no unprocessed entries on a completion
      queue before deleting it, and check the validity of the CQ
      doorbell before writing completions to it.
      
      This fixes problems with doing a sysfs reset of the device while
      it's handling IO.
      Tested-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • lightnvm: add free and bad lun info to show luns · 2fde0e48
      Javier Gonzalez committed
      Add free block, used block, and bad block information to the show debug
      interface. This information is used to debug how targets track blocks.
      
      Also, rename the debug function to make it more generic.
      Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • lightnvm: keep track of block counts · 0b59733b
      Javier Gonzalez committed
      Maintain the number of in-use blocks, free blocks, and bad blocks on a
      per-LUN basis. This allows the upper layers to get information about
      the state of each LUN.
      
      Also, account for blocks reserved for the device in the free block count.
      nr_free_blocks now matches the actual number of blocks on the free list
      when the device is booted.
      Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
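      A minimal sketch of per-LUN accounting, assuming each block is always in exactly one of the free, in-use, or bad states. Apart from nr_free_blocks, the struct, field, and helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-LUN block accounting. */
struct lun_counts {
    unsigned int nr_free_blocks;
    unsigned int nr_inuse_blocks;
    unsigned int nr_bad_blocks;
};

/* Take a block from the free pool; fails when none are left. */
static bool lun_get_block(struct lun_counts *lun)
{
    if (lun->nr_free_blocks == 0)
        return false;
    lun->nr_free_blocks--;
    lun->nr_inuse_blocks++;
    return true;
}

/* Return an in-use block to the free pool. */
static void lun_put_block(struct lun_counts *lun)
{
    lun->nr_inuse_blocks--;
    lun->nr_free_blocks++;
}

/* Retire an in-use block that turned out to be bad. */
static void lun_mark_bad(struct lun_counts *lun)
{
    lun->nr_inuse_blocks--;
    lun->nr_bad_blocks++;
}

/* Total tracked blocks; unchanged by the transitions above. */
static unsigned int lun_total(const struct lun_counts *lun)
{
    return lun->nr_free_blocks + lun->nr_inuse_blocks + lun->nr_bad_blocks;
}
```

      Because every transition moves a block between two counters, the sum of the three stays equal to the number of tracked blocks, which is what lets upper layers trust the per-LUN figures.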
    • nvme: lightnvm: use admin queues for admin cmds · 47b3115a
      Wenwei Tao committed
      According to the Open-Channel SSD Specification, the NVMe-NVM admin
      commands use vendor-specific NVMe opcodes, so use the NVMe admin
      queue to dispatch these commands.
      Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
      Updated by me to include set bad block table as well and also use
      the admin queue for l2p len calculation.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • lightnvm: missing free on init error · 93e70c1f
      Matias Bjørling committed
      If max_phys_sect fails either of its bounds checks, the nvm_dev
      structure is not freed.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
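      The usual kernel idiom for this kind of leak is a single cleanup label that every post-allocation error path jumps to. The userspace sketch below uses hypothetical names and bounds; the commit does not state the actual max_phys_sect limits.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bounds, for illustration only. */
#define MODEL_MIN_PHYS_SECT 1
#define MODEL_MAX_PHYS_SECT 256

struct nvm_dev_model {
    int max_phys_sect;
};

/*
 * Allocate and validate a device. Every failure after the allocation
 * jumps to err_free, so the structure is never leaked.
 */
static int nvm_init_model(int max_phys_sect, struct nvm_dev_model **out)
{
    struct nvm_dev_model *dev = calloc(1, sizeof(*dev));
    int ret;

    if (!dev)
        return -ENOMEM;

    if (max_phys_sect < MODEL_MIN_PHYS_SECT ||
        max_phys_sect > MODEL_MAX_PHYS_SECT) {
        ret = -EINVAL;
        goto err_free;  /* a missing free on this path is the leak being fixed */
    }

    dev->max_phys_sect = max_phys_sect;
    *out = dev;
    return 0;

err_free:
    free(dev);
    return ret;
}
```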
    • lightnvm: wrong return value and redundant free · 480fc0db
      Wenwei Tao committed
      The return value should be non-zero under error conditions.
      Remove nvme_free(dev) to avoid freeing dev more than once.
      Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • i2c: i801: add Intel Lewisburg device IDs · cdc5a311
      Alexandra Yates committed
      Add SMBus device IDs for the Intel platform codenamed Lewisburg.
      Signed-off-by: Alexandra Yates <alexandra.yates@linux.intel.com>
      Reviewed-by: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • i2c: fix wakeup irq parsing · c18fba23
      Grygorii Strashko committed
      This patch fixes an obvious copy-paste error in the wakeup IRQ parsing
      code, which causes dev_pm_set_wake_irq() to be called with the wrong
      IRQ number when the "wakeup" IRQ is not defined in DT.
      
      Fixes: 3fffd128 ("i2c: allow specifying separate wakeup interrupt in device tree")
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: <stable@vger.kernel.org> # v4.3
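      A hypothetical model of the wake-IRQ selection, consistent with the commit description: a dedicated wake IRQ is used when DT provides a distinct "wakeup" interrupt, otherwise the device's own IRQ is used. The names are illustrative; this is not the i2c core code.

```c
/* Which interrupt, if any, is armed for wakeup (hypothetical model). */
enum wake_kind { WAKE_NONE, WAKE_DEDICATED, WAKE_SHARED };

struct wake_choice {
    enum wake_kind kind;
    int irq;
};

/*
 * wakeirq:    result of looking up a "wakeup" interrupt in DT (<= 0 if absent)
 * client_irq: the device's normal interrupt (<= 0 if absent)
 */
static struct wake_choice choose_wake_irq(int wakeirq, int client_irq)
{
    struct wake_choice c = { WAKE_NONE, 0 };

    if (wakeirq > 0 && wakeirq != client_irq) {
        c.kind = WAKE_DEDICATED;
        c.irq = wakeirq;
    } else if (client_irq > 0) {
        /* Must be client_irq: wakeirq may be a negative lookup result here. */
        c.kind = WAKE_SHARED;
        c.irq = client_irq;
    }
    return c;
}
```

      With no "wakeup" entry (modelled as a negative lookup result), the shared branch has to pass the client's IRQ; passing the lookup result there instead produces the wrong-IRQ symptom the commit describes.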
    • i2c: xiic: Prevent concurrent running of the IRQ handler and __xiic_start_xfer() · d0fe5258
      Lars-Peter Clausen committed
      Prior to commit e6c9a037 ("i2c: xiic: Remove the disabling of
      interrupts"), IRQs were disabled when the initial __xiic_start_xfer() was
      called. After that commit the interrupt is enabled while the function is
      running, which means the interrupt can be triggered while the function
      is still running. When this happens, the internal data structures get
      corrupted and undefined behavior can occur, such as the following crash:
      
      	Internal error: Oops: 17 [#1] PREEMPT SMP ARM
      	Modules linked in:
      	CPU: 0 PID: 2040 Comm: i2cdetect Not tainted 4.0.0-02856-g047a308 #10956
      	Hardware name: Xilinx Zynq Platform
      	task: ee0c9500 ti: e99a2000 task.ti: e99a2000
      	PC is at __xiic_start_xfer+0x6c4/0x7c8
      	LR is at __xiic_start_xfer+0x690/0x7c8
      	pc : [<c02bbffc>]    lr : [<c02bbfc8>]    psr: 800f0013
      	sp : e99a3da8  ip : 00000000  fp : 00000000
      	r10: 00000001  r9 : 600f0013  r8 : f0180000
      	r7 : f0180000  r6 : c064e444  r5 : 00000017  r4 : ee031010
      	r3 : 00000000  r2 : 00000000  r1 : 600f0013  r0 : 0000000f
      	Flags: Nzcv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      	Control: 18c5387d  Table: 29a5404a  DAC: 00000015
      	Process i2cdetect (pid: 2040, stack limit = 0xe99a2210)
      	Stack: (0xe99a3da8 to 0xe99a4000)
      	3da0:                   ee031010 00000000 00000001 ee031020 ee031224 c02bc5ec
      	3dc0: ee34c604 00000000 ee0c9500 e99a3dcc e99a3dd0 e99a3dd0 e99a3dd8 c069f0e8
      	3de0: 00000000 ee031020 c064e100 ffff90bb e99a3e48 c02b6590 ee031020 00000001
      	3e00: e99a3e48 ee031020 00000000 e99a3e63 00000001 c02b6ec4 00000000 00000000
      	3e20: 00000000 c02b7320 e99a3ef0 00000000 00000000 e99e3df0 00000000 00000000
      	3e40: 00000103 2814575f 0000003e c00a0000 e99a3e85 0001003e ee0c0000 e99a3e63
      	3e60: eefd3578 c064e61c ee0c9500 c0041e04 0000056c e9a56db8 00006e5a b6f5c000
      	3e80: ee0c9548 eefd0040 00000001 eefd3540 ee0c9500 eefd39a0 c064b540 ee0c9500
      	3ea0: 00000000 ee92b000 00000000 bef4862c ee34c600 e99ecdc0 00000720 00000003
      	3ec0: e99a2000 00000000 00000000 c02b8b30 00000000 00000000 00000000 e99a3f24
      	3ee0: b6e80000 00000000 00000000 c04257e8 00000000 e99a3f24 c02b8f08 00000703
      	3f00: 00000003 c02116bc ee935300 00000000 bef4862c ee34c600 e99ecdc0 c02b91f0
      	3f20: e99ecdc0 00000720 bef4862c eeb725f8 e99ecdc0 c00c9e2c 00000003 00000003
      	3f40: ee248dc0 00000000 ee248dc8 00000002 eeb7c1a8 00000000 00000000 c00bb360
      	3f60: 00000000 00000000 00000003 ee248dc0 bef4862c e99ecdc0 e99ecdc0 00000720
      	3f80: 00000003 e99a2000 00000000 c00c9f68 00000000 00000000 b6f22000 00000036
      	3fa0: c000dfa4 c000de20 00000000 00000000 00000003 00000720 bef4862c bef4862c
      	3fc0: 00000000 00000000 b6f22000 00000036 00000000 00000000 b6f60000 00000000
      	3fe0: 00013040 bef48614 00008cab b6ecdbe6 400f0030 00000003 2f7fd821 2f7fdc21
      	[<c02bbffc>] (__xiic_start_xfer) from [<c02bc5ec>] (xiic_xfer+0x94/0x168)
      	[<c02bc5ec>] (xiic_xfer) from [<c02b6590>] (__i2c_transfer+0x4c/0x7c)
      	[<c02b6590>] (__i2c_transfer) from [<c02b6ec4>] (i2c_transfer+0x9c/0xc4)
      	[<c02b6ec4>] (i2c_transfer) from [<c02b7320>] (i2c_smbus_xfer+0x3a0/0x4ec)
      	[<c02b7320>] (i2c_smbus_xfer) from [<c02b8b30>] (i2cdev_ioctl_smbus+0xb0/0x214)
      	[<c02b8b30>] (i2cdev_ioctl_smbus) from [<c02b91f0>] (i2cdev_ioctl+0xa0/0x1d4)
      	[<c02b91f0>] (i2cdev_ioctl) from [<c00c9e2c>] (do_vfs_ioctl+0x4b0/0x5b8)
      	[<c00c9e2c>] (do_vfs_ioctl) from [<c00c9f68>] (SyS_ioctl+0x34/0x5c)
      	[<c00c9f68>] (SyS_ioctl) from [<c000de20>] (ret_fast_syscall+0x0/0x34)
      	Code: e283300c e5843210 eafffe64 e5943210 (e1d320b4)
      
      The issue can easily be reproduced by performing I2C access under high
      system load or IO load.
      
      To fix the issue, protect the call to __xiic_start_xfer() from
      xiic_start_xfer() with the same lock that is used to protect the interrupt
      handler.
      
      Fixes: e6c9a037 ("i2c: xiic: Remove the disabling of interrupts")
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Reviewed-by: Shubhrajyoti Datta <shubhraj@xilinx.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • i2c: Revert "i2c: xiic: Do not reset controller before every transfer" · 9656eeeb
      Lars-Peter Clausen committed
      Commit d701667b ("i2c: xiic: Do not reset controller before every
      transfer") removed the reinitialization of the controller before the start
      of each transfer. Apparently this change is not safe to make and the commit
      results in random I2C bus failures.
      
      An easy way to trigger the issue is to run i2cdetect.
      
      Without the patch applied:
           0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      00:          -- -- -- -- -- -- -- -- -- -- -- -- --
      10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      30: -- -- -- -- -- -- -- -- UU UU -- UU 3c -- -- UU
      40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      70: -- -- -- -- -- -- -- --
      
      With the patch applied, on roughly every other invocation:
           0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
      00:          03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
      10: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
      20: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f
      30: -- -- -- -- -- -- -- -- UU UU -- UU 3c -- -- UU
      40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      60: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
      70: -- -- -- -- -- -- -- --
      
      So revert the commit for now.
      
      Fixes: d701667b ("i2c: xiic: Do not reset controller before every transfer")
      Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
      Acked-by: Shubhrajyoti Datta <shubhraj@xilinx.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • i2c: imx: fix a compiling error · 8bb6fd58
      Hou Zhiqiang committed
      drivers/i2c/busses/i2c-imx.c:978:2: error: implicit declaration of
      function ‘pinctrl_select_state’ [-Werror=implicit-function-declaration]
        pinctrl_select_state(i2c_imx->pinctrl, i2c_imx->pinctrl_pins_gpio);
        ^
      Signed-off-by: Hou Zhiqiang <Zhiqiang.Hou@freescale.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • usblp: do not set TASK_INTERRUPTIBLE before lock · 19cd80a2
      Jiri Slaby committed
      It is not permitted to set the task state before taking a lock. usblp_wwait
      sets the state to TASK_INTERRUPTIBLE and then calls mutex_lock_interruptible.
      Upon return from that function, the state will be TASK_RUNNING again.
      
      This is clearly a bug and a warning is generated with LOCKDEP too:
      WARNING: CPU: 1 PID: 5109 at kernel/sched/core.c:7404 __might_sleep+0x7d/0x90()
      do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffa0c588d0>] usblp_wwait+0xa0/0x310 [usblp]
      Modules linked in: ...
      CPU: 1 PID: 5109 Comm: captmon Tainted: G        W       4.2.5-0.gef2823b-default #1
      Hardware name: LENOVO 23252SG/23252SG, BIOS G2ET33WW (1.13 ) 07/24/2012
       ffffffff81a4edce ffff880236ec7ba8 ffffffff81716651 0000000000000000
       ffff880236ec7bf8 ffff880236ec7be8 ffffffff8106e146 0000000000000282
       ffffffff81a50119 000000000000028b 0000000000000000 ffff8802dab7c508
      Call Trace:
      ...
       [<ffffffff8106e1c6>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff8109a8bd>] __might_sleep+0x7d/0x90
       [<ffffffff8171b20f>] mutex_lock_interruptible_nested+0x2f/0x4b0
       [<ffffffffa0c588fc>] usblp_wwait+0xcc/0x310 [usblp]
       [<ffffffffa0c58bb2>] usblp_write+0x72/0x350 [usblp]
       [<ffffffff8121ed98>] __vfs_write+0x28/0xf0
      ...
      
      Commit 7f477358 ("usblp: Implement the ENOSPC convention") moved the
      state change before the lock. So move it back after the lock.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Fixes: 7f477358 ("usblp: Implement the ENOSPC convention")
      Acked-By: Pete Zaitcev <zaitcev@yahoo.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • usb: kconfig: fix warning of select USB_OTG · c4f16130
      Peter Chen committed
      When building the kernel with randconfig, the following warning is reported:
      "warning: (USB_OTG_FSM && FSL_USB2_OTG && USB_MV_OTG) selects USB_OTG
      which has unmet direct dependencies (USB_SUPPORT && USB && PM)"
      
      In fact, USB_OTG is a visible symbol and depends on PM, so the drivers
      need to depend on it to avoid this dependency problem.
      Signed-off-by: Peter Chen <peter.chen@freescale.com>
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Felipe Balbi <balbi@ti.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • USB: option: add XS Stick W100-2 from 4G Systems · 638148e2
      Bjørn Mork committed
      Thomas reports
      "
      4gsystems sells two total different LTE-surfsticks under the same name.
      ..
      The newer version of XS Stick W100 is from "omega"
      ..
      Under windows the driver switches to the same ID, and uses MI03\6 for
      network and MI01\6 for modem.
      ..
      echo "1c9e 9b01" > /sys/bus/usb/drivers/qmi_wwan/new_id
      echo "1c9e 9b01" > /sys/bus/usb-serial/drivers/option1/new_id
      
      T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
      D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
      P:  Vendor=1c9e ProdID=9b01 Rev=02.32
      S:  Manufacturer=USB Modem
      S:  Product=USB Modem
      S:  SerialNumber=
      C:  #Ifs= 5 Cfg#= 1 Atr=80 MxPwr=500mA
      I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
      I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
      I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage
      
      Now all important things are there:
      
      wwp0s29f7u2i3 (net), ttyUSB2 (at), cdc-wdm0 (qmi), ttyUSB1 (at)
      
      There is also ttyUSB0, but it is not usable, at least not for at.
      
      The device works well with qmi and ModemManager-NetworkManager.
      "
      Reported-by: Thomas Schäfer <tschaefer@t-online.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Bjørn Mork <bjorn@mork.no>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • PCI: Fix OF logic in pci_dma_configure() · 768acd64
      Suravee Suthikulpanit committed
      This patch fixes a bug introduced by a previous commit, which
      incorrectly checks the of_node of the endpoint device.
      Instead, it should check the of_node of the host bridge.
      
      Fixes: 50230713 ("PCI: OF: Move of_pci_dma_configure() to pci_dma_configure()")
      Reported-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
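      A simplified model of the fix: the decision is taken from the host bridge's of_node rather than the endpoint's. In the kernel the host bridge is found through the PCI bus hierarchy; here it is modelled as the root of a parent chain, and struct dev_model plus all helpers are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device hierarchy. */
struct dev_model {
    const struct dev_model *parent;  /* NULL at the host bridge */
    const void *of_node;             /* NULL when no DT node is attached */
};

/* Walk up the hierarchy to the host bridge (modelled as the root). */
static const struct dev_model *host_bridge(const struct dev_model *dev)
{
    while (dev->parent)
        dev = dev->parent;
    return dev;
}

/* Buggy check: looks at the endpoint's own of_node. */
static bool use_of_dma_buggy(const struct dev_model *dev)
{
    return dev->of_node != NULL;
}

/* Fixed check: looks at the host bridge's of_node. */
static bool use_of_dma_fixed(const struct dev_model *dev)
{
    return host_bridge(dev)->of_node != NULL;
}
```

      An endpoint without its own DT node, sitting below a host bridge that has one, shows the difference: the buggy check skips the OF-based DMA configuration, while the fixed check applies it.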
    • null_blk: do not del gendisk with lightnvm · 54514aa4
      Matias Bjørling committed
      The gendisk structure has not been initialized when using lightnvm.
      Make sure to not delete it upon exit. Also make sure that we use the
      appropriate disk_name at unregistration.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • null_blk: use device addressing mode · 5b40db99
      Matias Bjørling committed
      The linear addressing mode was removed in 7386af27. Instead, make null_blk
      expose the ppa format geometry and support the generic addressing mode.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • null_blk: use ppa_cache pool · 6bb9535b
      Matias Bjørling committed
      Instead of using a page pool, we can save memory by only allocating room
      for 64 entries for the ppa command. Introduce a ppa_cache to allocate only
      the required memory for the ppa list.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
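      A back-of-the-envelope sketch of the saving, assuming each ppa entry is a 64-bit address and pages are 4 KiB; both sizes are assumptions, and the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: each physical page address (ppa) entry is 64 bits wide. */
typedef uint64_t ppa_entry;

#define PPA_LIST_ENTRIES  64    /* entry count from the commit text */
#define ASSUMED_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

/* Bytes a dedicated ppa list object needs for a given entry count. */
static size_t ppa_list_bytes(size_t entries)
{
    return entries * sizeof(ppa_entry);
}
```

      Under these assumptions a 64-entry list needs 512 bytes, one eighth of a page, so a dedicated ppa_cache avoids reserving a full page per command.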
    • 6824c5ef
    • dm crypt: fix a possible hang due to race condition on exit · bcbd94ff
      Mikulas Patocka committed
      A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
      __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
      It is possible that the processor reorders memory accesses so that
      kthread_should_stop() is executed before __set_current_state().  If such
      reordering happens, there is a possible race on thread termination:
      
      CPU 0:
      calls kthread_should_stop()
      	it tests KTHREAD_SHOULD_STOP bit, returns false
      CPU 1:
      calls kthread_stop(cc->write_thread)
      	sets the KTHREAD_SHOULD_STOP bit
      	calls wake_up_process on the kernel thread, that sets the thread
      	state to TASK_RUNNING
      CPU 0:
      calls __set_current_state(TASK_INTERRUPTIBLE)
      spin_unlock_irq(&cc->write_thread_wait.lock)
      schedule() - and the process is stuck and never terminates, because the
      	state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already
      	terminated
      
      Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to
      signal that the kernel thread should exit.  The flag is set and tested
      while holding cc->write_thread_wait.lock, so there is no possibility of
      racy access to the flag.
      
      Also, remove the unnecessary set_task_state(current, TASK_RUNNING)
      following the schedule() call.  When the process was woken up, its state
      was already set to TASK_RUNNING.  Other kernel code also doesn't set the
      state to TASK_RUNNING following schedule() (for example,
      do_wait_for_common in completion.c doesn't do it).
      
      Fixes: dc267621 ("dm crypt: offload writes to thread")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # v4.0+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
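      The fix follows the usual rule for sleeping on a condition: the stop request is set, and the decision to sleep is made, under one lock. Below is a userspace analogue with POSIX threads, in which a condition variable replaces the kernel wait queue and kthread_stop() is replaced by a flag plus pthread_join(); all names are illustrative, not dm-crypt's.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a writer thread; names are illustrative. */
struct writer {
    pthread_mutex_t lock;   /* stands in for write_thread_wait.lock */
    pthread_cond_t wait;    /* stands in for the wait queue */
    bool exit_thread;       /* stands in for DM_CRYPT_EXIT_THREAD */
    int pending;            /* queued work items */
    int processed;          /* items the thread has handled */
    pthread_t thread;
};

static void *writer_fn(void *arg)
{
    struct writer *w = arg;

    pthread_mutex_lock(&w->lock);
    for (;;) {
        /*
         * The exit flag is tested under the same lock writer_stop()
         * holds while setting it, so the request cannot be missed
         * between the check and the sleep.
         */
        while (!w->exit_thread && w->pending == 0)
            pthread_cond_wait(&w->wait, &w->lock);
        if (w->pending == 0)
            break;              /* exit requested and no work left */
        w->pending--;
        w->processed++;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

static int writer_start(struct writer *w)
{
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->wait, NULL);
    w->exit_thread = false;
    w->pending = 0;
    w->processed = 0;
    return pthread_create(&w->thread, NULL, writer_fn, w);
}

static void writer_queue(struct writer *w)
{
    pthread_mutex_lock(&w->lock);
    w->pending++;
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);
}

static void writer_stop(struct writer *w)
{
    pthread_mutex_lock(&w->lock);
    w->exit_thread = true;      /* set under the lock, as in the fix */
    pthread_cond_signal(&w->wait);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    pthread_cond_destroy(&w->wait);
    pthread_mutex_destroy(&w->lock);
}
```

      Work queued before the stop request is drained before the thread exits. Because the flag and the sleep decision share a lock, there is no window like the one in the CPU 0 / CPU 1 trace in which the wakeup can be lost.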
  6. 19 Nov, 2015: 1 commit