1. 21 June 2019, 2 commits
  2. 20 June 2019, 3 commits
    • G
      block: drbd: no need to check return value of debugfs_create functions · d27e84a3
      Committed by Greg Kroah-Hartman
      When calling debugfs functions, there is no need to ever check the
      return value.  The function can work or not, but the code logic should
      never do something different based on this.
      
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: drbd-dev@lists.linbit.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d27e84a3
    • C
      null_blk: remove duplicate 0 initialization · 8c54803b
      Committed by Chaitanya Kulkarni
      In null_add_dev(), the struct nullb *nullb object is allocated with
      kzalloc_node(), which returns zeroed memory.
      
      setup_queues(), which is called from null_add_dev(), sets
      nullb->nr_queues = 0 after a successful queue allocation. That
      assignment is not needed because kzalloc_node() has already zeroed
      the field.
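A minimal user-space sketch of why the assignment is redundant. It uses calloc() as a stand-in for kzalloc_node(); struct demo_nullb and demo_alloc() are hypothetical names for illustration, not the driver's own.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of struct nullb. */
struct demo_nullb {
	unsigned int nr_queues;
	unsigned int index;
};

/* calloc() returns zero-filled memory, as kzalloc_node() does in the kernel. */
static struct demo_nullb *demo_alloc(void)
{
	struct demo_nullb *nullb = calloc(1, sizeof(*nullb));

	/* No "nullb->nr_queues = 0;" is needed here: the field is already 0. */
	return nullb;
}
```

Any later explicit store of 0 into a field that nothing has written since allocation repeats what the allocator already did.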
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      8c54803b
    • A
      floppy: fix harmless clang build warning · 2af47c10
      Committed by Arnd Bergmann
      clang warns about unusual code in floppy.c that looks like it
      was intended to be a bit mask operation, checking for a specific
      bit in the UDP->cmos variable (FLOPPY1_TYPE expands to '4' on
      ARM):
      
      drivers/block/floppy.c:3902:17: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
              if (!UDP->cmos && FLOPPY1_TYPE)
                             ^  ~~~~~~~~~~~~
      drivers/block/floppy.c:3902:17: note: use '&' for a bitwise operation
              if (!UDP->cmos && FLOPPY1_TYPE)
      
      The check here is redundant anyway, if FLOPPY1_TYPE is zero, then
      assigning it to a zero UDP->cmos field does not change anything,
      so removing the extra check here has no effect other than shutting
      up the warning.
      
      On x86, this will no longer read a hardware register, as the
      FLOPPY1_TYPE macro is not expanded if UDP->cmos is already
      zero, but the result is the same.
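The equivalence claimed above can be checked with a small sketch. The function names are hypothetical, and the type value is passed as a parameter rather than a macro so that the sketch itself does not trigger the clang warning.

```c
#include <assert.h>

/* Old form: only assign when cmos is zero AND the type is non-zero. */
static int demo_cmos_old(int cmos, int type)
{
	if (!cmos && type)
		cmos = type;
	return cmos;
}

/* New form: assign whenever cmos is zero; assigning 0 to 0 changes nothing. */
static int demo_cmos_new(int cmos, int type)
{
	if (!cmos)
		cmos = type;
	return cmos;
}
```

The two functions differ only when cmos is zero and type is zero, and in that case both leave cmos at zero.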
      
      Cc: Robert Elliott <elliott@hpe.com>
      Cc: Keith Busch <kbusch@kernel.org>
      Link: https://patchwork.kernel.org/patch/10851841/
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2af47c10
  3. 15 June 2019, 9 commits
  4. 14 June 2019, 4 commits
  5. 13 June 2019, 5 commits
    • G
      block/ps3vram: Use %llu to format sector_t after LBDAF removal · 1d0c0651
      Committed by Geert Uytterhoeven
      The removal of CONFIG_LBDAF changed the type of sector_t from "unsigned
      long" to "u64" aka "unsigned long long" on 64-bit platforms, leading to
      a compiler warning regression:
      
          drivers/block/ps3vram.c: In function ‘ps3vram_probe’:
          drivers/block/ps3vram.c:770:23: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 4 has type ‘sector_t {aka long long unsigned int}’ [-Wformat=]
      
      Fix this by using "%llu" instead.
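A user-space sketch of the format fix, assuming a hypothetical demo_sector_t typedef. In the kernel u64 is always "unsigned long long", so "%llu" matches directly; in user space uint64_t may be "unsigned long", so the sketch adds an explicit cast.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's 64-bit sector_t. */
typedef uint64_t demo_sector_t;

/* Format a sector count with %llu; the cast keeps type and format in step. */
static int demo_format_sectors(char *buf, size_t len, demo_sector_t sectors)
{
	return snprintf(buf, len, "%llu", (unsigned long long)sectors);
}
```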
      
      Fixes: 72deb455 ("block: remove CONFIG_LBDAF")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1d0c0651
    • H
      libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk · 31f6264e
      Committed by Hans de Goede
      We've received a bug report that using LPM with ST1000LM024 drives leads
      to system lockups. So it seems that these models are buggy in more than
      one way. Add the NOLPM quirk to the existing quirks entry for BROKEN_FPDMA_AA.
      
      BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1571330
      Cc: stable@vger.kernel.org
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      31f6264e
    • C
      bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached · 1f0ffa67
      Committed by Coly Li
      When a user sets a writeback percentage via the sysfs file
        /sys/block/bcache<N>/bcache/writeback_percent
      the current code directly sets BCACHE_DEV_WB_RUNNING in dc->disk.flags
      and schedules the kworker dc->writeback_rate_update.
      
      If no cache set is attached, the writeback kernel thread is not
      running, so running dc->writeback_rate_update makes no sense and may
      cause a NULL pointer dereference when update_writeback_rate()
      references the cache set pointer.
      
      This patch checks whether the cache set pointer (dc->disk.c) is NULL in
      the sysfs interface handler, and only sets BCACHE_DEV_WB_RUNNING and
      schedules dc->writeback_rate_update when dc->disk.c is not NULL (that
      is, when the cached device is attached to a cache set).
      
      This problem may date back to the initial bcache commit, but
      commit 3fd47bfe ("bcache: stop dc->writeback_rate_update properly")
      changed part of the original code, so I add 'Fixes: 3fd47bfe'
      to indicate the commit from which this patch can be applied.
      
      Fixes: 3fd47bfe ("bcache: stop dc->writeback_rate_update properly")
      Reported-by: Bjørn Forsman <bjorn.forsman@gmail.com>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Bjørn Forsman <bjorn.forsman@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      1f0ffa67
    • C
      bcache: fix stack corruption by PRECEDING_KEY() · 31b90956
      Committed by Coly Li
      Recently people have reported that bcache code compiled with gcc9 is
      broken. One of the buggy behaviors I observed is that two adjacent 4KB
      I/Os that should merge into one do not. It finally turned out to be
      stack corruption caused by the PRECEDING_KEY() macro.
      
      See how PRECEDING_KEY() is defined in bset.h,
      437 #define PRECEDING_KEY(_k)                                       \
      438 ({                                                              \
      439         struct bkey *_ret = NULL;                               \
      440                                                                 \
      441         if (KEY_INODE(_k) || KEY_OFFSET(_k)) {                  \
      442                 _ret = &KEY(KEY_INODE(_k), KEY_OFFSET(_k), 0);  \
      443                                                                 \
      444                 if (!_ret->low)                                 \
      445                         _ret->high--;                           \
      446                 _ret->low--;                                    \
      447         }                                                       \
      448                                                                 \
      449         _ret;                                                   \
      450 })
      
      At line 442, _ret points to the address of an on-stack variable
      constructed by KEY(). The lifetime of this on-stack variable spans only
      lines 442-446, so once _ret is returned to bch_btree_insert_key(), the
      returned address points to an invalid stack location, which is
      overwritten by the subsequent call to bch_btree_iter_init(). The
      'search' argument of bch_btree_iter_init() then points to some address
      inside the stack frame of bch_btree_iter_init(); the exact address
      depends on how the compiler allocates stack space. At this point the
      stack is corrupted.
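One common way to avoid this class of bug is to have the caller own the storage for the result, so that no pointer to a temporary escapes its scope. The sketch below illustrates that idea under simplifying assumptions; it is not the actual patch. struct demo_key and demo_preceding_key() are hypothetical, and the key is reduced to two 64-bit words treated as one 128-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified key; the real struct bkey has more fields. */
struct demo_key {
	uint64_t high;
	uint64_t low;
};

/*
 * Write the key preceding *k into caller-owned storage *out and return out,
 * or return NULL when k is the all-zero key and has no predecessor.
 * The result lives as long as the caller's variable, not a temporary.
 */
static struct demo_key *demo_preceding_key(const struct demo_key *k,
					   struct demo_key *out)
{
	if (!k->high && !k->low)
		return NULL;

	*out = *k;
	if (!out->low)
		out->high--;	/* borrow from the high word */
	out->low--;		/* wraps to UINT64_MAX when low was 0 */
	return out;
}
```

A caller would declare a struct demo_key on its own stack and pass its address, which keeps the storage valid for as long as the caller needs the result.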
      
      Fixes: 0eacac22 ("bcache: PRECEDING_KEY()")
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Rolf Fokkens <rolf@rolffokkens.nl>
      Reviewed-by: Pierre JUHEN <pierre.juhen@orange.fr>
      Tested-by: Shenghui Wang <shhuiw@foxmail.com>
      Tested-by: Pierre JUHEN <pierre.juhen@orange.fr>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Nix <nix@esperi.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      31b90956
    • C
      null_blk: remove duplicate check for report zone · 154085ff
      Committed by Chaitanya Kulkarni
      This patch removes the check in null_blk_zoned for the report zone
      command, which tests dev->zoned before executing the report
      zone operation.
      
      The null_zone_report() function is a block_device operation callback
      which is initialized in the null_blk_main.c and gets called as a part
      of blkdev for report zone IOCTL (BLKREPORTZONE).
      
      blkdev_ioctl()
      blkdev_report_zones_ioctl()
              blkdev_report_zones()
                      blk_report_zones()
                              disk->fops->report_zones()
                                      null_zone_report();
      
      null_zone_report() is never executed on a non-zoned block device: for
      such a device blk_queue_is_zoned() is always false, and that is the
      first check in blkdev_report_zones_ioctl(), made before the low-level
      driver's report zone callback is executed.
      
      Here is the detailed scenario:
      
      1. modprobe null_blk
      null_init
      null_alloc_dev
              dev->zoned = 0
      null_add_dev
              dev->zoned == 0
                      so we don't set the q->limits.zoned = BLK_ZONED_HM
      
      2. blkzone report /dev/nullb0
      
      blkdev_ioctl()
      blkdev_report_zones_ioctl()
              blk_queue_is_zoned()
                      blk_queue_is_zoned
                              q->limits.zoned == 0
                              return false
              if (!blk_queue_is_zoned(q)) <--- true
                      return -ENOTTY;
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      154085ff
  6. 12 June 2019, 3 commits
  7. 08 June 2019, 2 commits
    • R
      i2c: xiic: Add max_read_len quirk · 49b80958
      Committed by Robert Hancock
      This driver does not support reading more than 255 bytes at once because
      the register for storing the number of bytes to read is only 8 bits. Add
      a max_read_len quirk to enforce this.
      
      This was found when using this driver with the SFP driver, which was
      previously reading all 256 bytes in the SFP EEPROM in one transaction.
      This caused a bunch of hard-to-debug errors in the xiic driver since the
      driver/logic was treating the number of bytes to read as zero.
      Rejecting transactions that aren't supported at least allows the problem
      to be diagnosed more easily.
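The failure mode described above can be sketched in user space: a 256-byte request stored in an 8-bit count truncates to zero, while an explicit length check rejects it up front. The names and the choice of error code are assumptions for illustration, not the driver's or the I2C core's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_READ_LEN 255	/* largest value an 8-bit count can hold */

/* What an 8-bit byte-count register would hold for a requested length. */
static uint8_t demo_reg_value(unsigned int len)
{
	return (uint8_t)len;	/* 256 silently truncates to 0 */
}

/* Reject reads the hardware cannot express instead of truncating them. */
static int demo_check_read_len(unsigned int len)
{
	return len > DEMO_MAX_READ_LEN ? -EINVAL : 0;
}
```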
      Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
      Reviewed-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
      49b80958
    • H
      gpio: pca953x: hack to fix 24 bit gpio expanders · 3b00691c
      Committed by H. Nikolaus Schaller
      24 bit expanders use REG_ADDR_AI in combination with register addressing. This
      conflicts with regmap, which takes this bit as part of the register number,
      i.e. a second cache entry is defined for accesses with REG_ADDR_AI
      set, although on the chip it is the same register as with REG_ADDR_AI
      cleared.
      
      The problem was introduced by
      
      	commit b32cecb4 ("gpio: pca953x: Extract the register address mangling to single function")
      
      but only became visible by
      
      	commit 8b9f9d4d ("regmap: verify if register is writeable before writing operations")
      
      because before, the regmap size was effectively ignored and
      pca953x_writeable_register() did know to ignore REG_ADDR_AI. Still, there
      were two separate cache entries created.
      
      Since the use of REG_ADDR_AI seems to be static we can work around this
      issue by simply increasing the size of the regmap to cover the "virtual"
      registers with REG_ADDR_AI being set. This only means that half of the
      regmap buffer will be unused.
      Reported-by: H. Nikolaus Schaller <hns@goldelico.com>
      Suggested-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      3b00691c
  8. 07 June 2019, 12 commits
    • B
      drm/nouveau/secboot/gp10[2467]: support newer FW to fix SEC2 failures on some boards · ab4bec16
      Committed by Ben Skeggs
      Some newer boards with these chipsets aren't compatible with the prior
      version of the SEC2 FW, and fail to load as a result.
      
      This newer FW is actually the one we already use on >=GP108.
      
      Unfortunately, there are interface differences in GP108's FW, making it
      impossible to simply move files around in linux-firmware to solve this.
      
      We need to be able to keep compatibility with all linux-firmware/kernel
      combinations, which means supporting both firmwares.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      ab4bec16
    • B
      drm/nouveau/secboot: enable loading of versioned LS PMU/SEC2 ACR msgqueue FW · 9352ce37
      Committed by Ben Skeggs
      Some chipsets will be switching to updated SEC2 LS firmware, so we need to
      plumb that through.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      9352ce37
    • B
      drm/nouveau/secboot: split out FW version-specific LS function pointers · 5f0f8b57
      Committed by Ben Skeggs
      It's not enough to have per-falcon structures anymore, we have multiple
      versions of some firmware now that have interface differences.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      5f0f8b57
    • B
      drm/nouveau/secboot: pass max supported FW version to LS load funcs · c26f3061
      Committed by Ben Skeggs
      Will be passed to the FW loader function as an upper bound on the supported
      FW version to attempt to load.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      c26f3061
    • B
      drm/nouveau/core: support versioned firmware loading · 475cf02b
      Committed by Ben Skeggs
      We have a need for this now with updated SEC2 LS FW images that have an
      incompatible interface from the previous version.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      475cf02b
    • B
      drm/nouveau/core: pass subdev into nvkm_firmware_get, rather than device · 8854eed1
      Committed by Ben Skeggs
      It'd be nice to have FW loading debug messages to appear for the relevant
      subsystem, when enabled.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      8854eed1
    • P
      vfio/mdev: Synchronize device create/remove with parent removal · 5715c4dd
      Committed by Parav Pandit
      In the following sequences, child devices created while the mdev parent
      device is being removed can be left behind, or half-initialized child
      mdev devices can be removed in a race.
      
      issue-1:
      --------
             cpu-0                         cpu-1
             -----                         -----
                                        mdev_unregister_device()
                                          device_for_each_child()
                                            mdev_device_remove_cb()
                                              mdev_device_remove()
      create_store()
        mdev_device_create()                   [...]
          device_add()
                                        parent_remove_sysfs_files()
      
      /* BUG: device added by cpu-0
       * whose parent is getting removed
       * and it won't process this mdev.
       */
      
      issue-2:
      --------
      The crash below is observed when a user-initiated remove is in progress
      and mdev_unregister_device() completes parent unregistration.
      
             cpu-0                         cpu-1
             -----                         -----
      remove_store()
         mdev_device_remove()
         active = false;
                                        mdev_unregister_device()
                                        parent device removed.
         [...]
         parent->ops->remove()
       /*
        * BUG: Accessing invalid parent.
        */
      
      This race is similar to create() racing with mdev_unregister_device().
      
      BUG: unable to handle kernel paging request at ffffffffc0585668
      PGD e8f618067 P4D e8f618067 PUD e8f61a067 PMD 85adca067 PTE 0
      Oops: 0000 [#1] SMP PTI
      CPU: 41 PID: 37403 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      RIP: 0010:mdev_device_remove+0xfa/0x140 [mdev]
      Call Trace:
       remove_store+0x71/0x90 [mdev]
       kernfs_fop_write+0x113/0x1a0
       vfs_write+0xad/0x1b0
       ksys_write+0x5a/0xe0
       do_syscall_64+0x5a/0x210
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Therefore, the mdev core is improved as follows to overcome the above issues.
      
      Wait for any ongoing mdev create() and remove() to finish before
      unregistering the parent device.
      This still allows multiple create and remove operations to progress in
      parallel for different mdev devices, which is the most common case.
      At the same time, guard against parent removal while the parent is being
      accessed by the create() and remove() callbacks.
      create()/remove() and unregister_device() are synchronized by the rwsem.
      
      Refactor the device removal code into mdev_device_remove_common() to avoid
      acquiring the parent's unreg_sem.
      
      Fixes: 7b96953b ("vfio: Mediated device Core driver")
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      5715c4dd
    • P
      vfio/mdev: Avoid creating sysfs remove file on stale device removal · 26c9e398
      Committed by Parav Pandit
      If device removal is initiated by two threads as below, the mdev core
      attempts to create a sysfs remove file on a stale device.
      During this flow, the call trace [1] below is observed.
      
           cpu-0                                    cpu-1
           -----                                    -----
        mdev_unregister_device()
          device_for_each_child
             mdev_device_remove_cb
                mdev_device_remove
                                             user_syscall
                                               remove_store()
                                                 mdev_device_remove()
                                              [..]
         unregister device();
                                             /* not found in list or
                                              * active=false.
                                              */
                                                sysfs_create_file()
                                                ..Call trace
      
      Now that the mdev core follows the correct device removal sequence of the
      Linux bus model, remove shouldn't fail in normal cases. If it fails, there is
      no point in creating a stale file or checking for a specific error status.
      
      kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
      kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
      kernel: Call Trace:
      kernel: remove_store+0xdc/0x100 [mdev]
      kernel: kernfs_fop_write+0x113/0x1a0
      kernel: vfs_write+0xad/0x1b0
      kernel: ksys_write+0x5a/0xe0
      kernel: do_syscall_64+0x5a/0x210
      kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      26c9e398
    • M
      net: mvpp2: Use strscpy to handle stat strings · d37acd5a
      Committed by Maxime Chevallier
      Use a safe strscpy call to copy the ethtool stat strings into the
      relevant buffers, instead of a memcpy that will be accessing
      out-of-bound data.
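To illustrate the difference, a fixed-size memcpy() from a short string literal reads past the end of the literal, whereas a bounded string copy stops at the source's terminating NUL. The helper below is a hypothetical user-space stand-in written for this sketch; it is not the kernel's strscpy() and its return convention is simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_GSTRING_LEN 32	/* assumed size of one stat-name slot */

/*
 * Bounded copy: stop at the source's NUL, never write more than size bytes,
 * always NUL-terminate. Returns the number of characters copied, or -1 if
 * the source did not fit and was truncated.
 */
static long demo_bounded_copy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}
```

Unlike memcpy(dst, src, DEMO_GSTRING_LEN), this never reads beyond the NUL that ends a short source string.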
      
      Fixes: 118d6298 ("net: mvpp2: add ethtool GOP statistics")
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d37acd5a
    • M
      nvme-rdma: use dynamic dma mapping per command · 62f99b62
      Committed by Max Gurtovoy
      Commit 87fd1253 ("nvme-rdma: remove redundant reference between
      ib_device and tagset") caused a kernel panic when disconnecting from an
      inaccessible controller (disconnect during re-connection).
      
      --
      nvme nvme0: Removing ctrl: NQN "testnqn1"
      nvme_rdma: nvme_rdma_exit_request: hctx 0 queue_idx 1
      BUG: unable to handle kernel paging request at 0000000080000228
      PGD 0 P4D 0
      Oops: 0000 [#1] SMP PTI
      ...
      Call Trace:
       blk_mq_exit_hctx+0x5c/0xf0
       blk_mq_exit_queue+0xd4/0x100
       blk_cleanup_queue+0x9a/0xc0
       nvme_rdma_destroy_io_queues+0x52/0x60 [nvme_rdma]
       nvme_rdma_shutdown_ctrl+0x3e/0x80 [nvme_rdma]
       nvme_do_delete_ctrl+0x53/0x80 [nvme_core]
       nvme_sysfs_delete+0x45/0x60 [nvme_core]
       kernfs_fop_write+0x105/0x180
       vfs_write+0xad/0x1a0
       ksys_write+0x5a/0xd0
       do_syscall_64+0x55/0x110
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x7fa215417154
      --
      
      The reason for this crash is accessing an already freed ib_device for
      performing dma_unmap during exit_request commands. The root cause is
      that during re-connection all the queues are destroyed and
      re-created (and the ib_device, which is reference counted by the queues,
      is freed as well), but the tagset stays alive and all the DMA mappings (that
      we perform in init_request) are kept in the request context. The original
      commit fixed a different bug that was introduced during bonding (aka NIC
      teaming) tests, which in some scenarios change the underlying ib_device
      and caused memory leakage and possible segmentation faults. This commit
      is a complementary commit that also changes the wrong DMA mappings that
      were saved in the request context and makes the request sqe DMA
      mappings dynamic with the command lifetime (i.e. mapped in .queue_rq and
      unmapped in .complete). It also fixes the above crash of accessing a freed
      ib_device during destruction of the tagset.
      
      Fixes: 87fd1253 ("nvme-rdma: remove redundant reference between ib_device and tagset")
      Reported-by: Jim Harris <james.r.harris@intel.com>
      Suggested-by: Sagi Grimberg <sagi@grimberg.me>
      Tested-by: Jim Harris <james.r.harris@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      62f99b62
    • J
      nvme: Fix u32 overflow in the number of namespace list calculation · c8e8c77b
      Committed by Jaesoo Lee
      The Number of Namespaces (nn) field in the identify controller data structure is
      defined as u32 and the maximum allowed value in NVMe specification is
      0xFFFFFFFEUL. This change fixes the possible overflow of the DIV_ROUND_UP()
      operation used in nvme_scan_ns_list() by casting the nn to u64.
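The overflow can be reproduced in user space. The sketch assumes a divisor of 1024 entries per namespace list (a value not stated in the message above) and a 32-bit unsigned int; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division in the usual (n + d - 1) / d form. */
#define DEMO_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* 32-bit arithmetic: nn + 1023 wraps around for values near UINT32_MAX. */
static uint32_t demo_lists_u32(uint32_t nn)
{
	return DEMO_DIV_ROUND_UP(nn, 1024u);
}

/* Widening nn to 64 bits first keeps the intermediate sum from wrapping. */
static uint64_t demo_lists_u64(uint32_t nn)
{
	return DEMO_DIV_ROUND_UP((uint64_t)nn, 1024u);
}
```

For nn = 0xFFFFFFFE the 32-bit version wraps and yields 0 lists, while the widened version yields 4194304.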
      Signed-off-by: Jaesoo Lee <jalee@purestorage.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      c8e8c77b
    • P
      vfio/mdev: Improve the create/remove sequence · 522ecce0
      Committed by Parav Pandit
      This patch addresses the two issues below and prepares the code to address
      the third issue listed below.
      
      1. The mdev device is placed on the mdev bus before it is created in the
      vendor driver. Once a device is placed on the mdev bus without its
      supporting underlying vendor device having been created, the mdev driver's
      probe() gets triggered. However, there isn't a stable mdev available to
      work on.
      
         create_store()
           mdev_device_create()
             device_register()
                ...
               vfio_mdev_probe()
              [...]
              parent->ops->create()
                vfio_ap_mdev_create()
                  mdev_set_drvdata(mdev, matrix_mdev);
                  /* Valid pointer set above */
      
      Because of this initialization order, an mdev driver that wants to use
      the mdev doesn't have a valid mdev to work on.
      
      2. The current creation sequence is,
         parent->ops->create()
         groups_register()
      
      Remove sequence is,
         parent->ops->remove()
         groups_unregister()
      
      However, the remove sequence should be the exact mirror of the creation
      sequence. Once this is achieved, all users of the mdev will be terminated
      before the underlying vendor device is removed
      (following the standard Linux driver model).
      At that point the vendor's remove() op shouldn't fail, because taking the
      device off the bus should terminate any usage.
      
      3. When the remove operation fails, mdev sysfs removal attempts to add the
      file back on an already removed device. The following call trace [1] is observed.
      
      [1] call trace:
      kernel: WARNING: CPU: 2 PID: 9348 at fs/sysfs/file.c:327 sysfs_create_file_ns+0x7f/0x90
      kernel: CPU: 2 PID: 9348 Comm: bash Kdump: loaded Not tainted 5.1.0-rc6-vdevbus+ #6
      kernel: Hardware name: Supermicro SYS-6028U-TR4+/X10DRU-i+, BIOS 2.0b 08/09/2016
      kernel: RIP: 0010:sysfs_create_file_ns+0x7f/0x90
      kernel: Call Trace:
      kernel: remove_store+0xdc/0x100 [mdev]
      kernel: kernfs_fop_write+0x113/0x1a0
      kernel: vfs_write+0xad/0x1b0
      kernel: ksys_write+0x5a/0xe0
      kernel: do_syscall_64+0x5a/0x210
      kernel: entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Therefore, the mdev core is improved in the following ways.
      
      1. Split the device registration/deregistration sequence so that some
      things can be done between initialization of the device and hooking it
      up to the bus, and likewise after deregistering it from the bus but
      before giving up our final reference.
      In particular, this means invoking the ->create() and ->remove()
      callbacks in those new windows. This gives the vendor driver an
      initialized mdev device to work with during creation.
      At the same time, a bus driver that wishes to bind to the mdev driver
      also gets an initialized mdev device.
      
      This follows the standard Linux kernel bus and device model.
      
      2. During the remove flow, first remove the device from the bus. This
      ensures that any bus specific devices are removed.
      Once the device is taken off the mdev bus, invoke the vendor driver's
      remove() for the mdev.
      
      3. The driver core device model provides a way to register and
      automatically unregister the device sysfs attribute groups via dev->groups.
      Make use of dev->groups to let the core create the groups, and eliminate
      the code for explicit groups creation and removal.
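The mirrored ordering described in points 1 and 2 can be sketched as a small user-space model that only records the order of steps. Every name here is hypothetical; the comments note which driver-core call each step is assumed to correspond to.

```c
#include <assert.h>
#include <string.h>

/* Records the order in which the modelled steps run. */
static char demo_log[64];

static void demo_step(const char *name)
{
	strcat(demo_log, name);
	strcat(demo_log, ";");
}

/* Creation: initialize, let the vendor driver create, then join the bus. */
static void demo_create(void)
{
	demo_step("init");		/* assumed: device_initialize() */
	demo_step("vendor_create");	/* parent->ops->create() */
	demo_step("bus_add");		/* assumed: device_add() */
}

/* Removal mirrors creation: leave the bus, vendor remove, drop reference. */
static void demo_remove(void)
{
	demo_step("bus_del");		/* assumed: device_del() */
	demo_step("vendor_remove");	/* parent->ops->remove() */
	demo_step("put");		/* assumed: put_device() */
}
```

Because the device leaves the bus before the vendor remove step, any bound driver has been detached by the time the vendor callback runs.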
      
      To ensure that the new sequence is solid, the stack dump below was taken
      of a process that attempts to remove the device while the device is in
      use by the vfio driver and a user application.
      This stack dump validates that the vfio driver guards against such device
      removal when the device is in use.
      
       cat /proc/21962/stack
      [<0>] vfio_del_group_dev+0x216/0x3c0 [vfio]
      [<0>] mdev_remove+0x21/0x40 [mdev]
      [<0>] device_release_driver_internal+0xe8/0x1b0
      [<0>] bus_remove_device+0xf9/0x170
      [<0>] device_del+0x168/0x350
      [<0>] mdev_device_remove_common+0x1d/0x50 [mdev]
      [<0>] mdev_device_remove+0x8c/0xd0 [mdev]
      [<0>] remove_store+0x71/0x90 [mdev]
      [<0>] kernfs_fop_write+0x113/0x1a0
      [<0>] vfs_write+0xad/0x1b0
      [<0>] ksys_write+0x5a/0xe0
      [<0>] do_syscall_64+0x5a/0x210
      [<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [<0>] 0xffffffffffffffff
      
      This prepares the code to eliminate calling device_create_file() in
      a subsequent patch.
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Parav Pandit <parav@mellanox.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      522ecce0