1. 02 Dec, 2020: 2 commits
  2. 01 Dec, 2020: 7 commits
    • md/cluster: fix deadlock when node is doing resync job · bca5b065
      Committed by Zhao Heming
      md-cluster uses MD_CLUSTER_SEND_LOCK to let a node send messages
      exclusively. While sending a message, a node can concurrently receive
      messages from another node. When a node is doing a resync job,
      grabbing token_lockres:EX may trigger a deadlock:
      ```
      nodeA                       nodeB
      --------------------     --------------------
      a.
      send METADATA_UPDATED
      held token_lockres:EX
                               b.
                               md_do_sync
                                resync_info_update
                                  send RESYNCING
                                   + set MD_CLUSTER_SEND_LOCK
                                   + wait for holding token_lockres:EX
      
                               c.
                               mdadm /dev/md0 --remove /dev/sdg
                                + held reconfig_mutex
                                + send REMOVE
                                   + wait_event(MD_CLUSTER_SEND_LOCK)
      
                               d.
                               recv_daemon //METADATA_UPDATED from A
                                process_metadata_update
                                 + (mddev_trylock(mddev) ||
                                    MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD)
                                   //this time, both return false forever
      ```
      Explanation:
      a. A sends METADATA_UPDATED.
         This blocks other nodes from sending messages.

      b. B does a sync job, which sends RESYNCING at intervals.
         This is blocked waiting for the token_lockres:EX lock.

      c. B runs "mdadm --remove", which sends REMOVE.
         This is blocked by step <b>: MD_CLUSTER_SEND_LOCK is 1.

      d. B receives the METADATA_UPDATED msg sent by A in step <a>.
         This is blocked by step <c>: it holds the mddev lock, so the
         wait_event can never acquire the mddev lock. (By the way,
         MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD stays ZERO in this scenario.)
      
      There is a similar deadlock in commit 0ba95977
      ("md-cluster: use sync way to handle METADATA_UPDATED msg").
      In that commit, step c is "update sb"; in this patch, step c is
      "mdadm --remove".
      
      To fix this issue, we can follow the approach used in
      metadata_update_start(), which performs the same lock_token grab:
      lock_comm can use the same steps to avoid the deadlock, by moving
      the MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD handling from lock_token to
      lock_comm. This slightly enlarges the window during which
      MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD is set, but it is safe and breaks
      the deadlock.
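      The idea behind the fix can be illustrated with a minimal, single-threaded C
      model. This is a sketch under stated assumptions, not the patch itself:
      struct cluster_model, SEND_LOCK_BIT, HOLDING_MUTEX_BIT, recv_can_proceed(),
      lock_comm_enter() and lock_comm_exit() are hypothetical stand-ins. They only
      mirror the roles of MD_CLUSTER_SEND_LOCK, MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD,
      the receive-side check in process_metadata_update and lock_comm.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Hypothetical bits standing in for the md-cluster state flags. */
      #define SEND_LOCK_BIT     (1u << 0) /* role of MD_CLUSTER_SEND_LOCK */
      #define HOLDING_MUTEX_BIT (1u << 1) /* role of MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD */

      struct cluster_model {
          unsigned int state; /* bitmask of the flags above */
          bool mddev_locked;  /* true while reconfig_mutex is held */
      };

      /* Models the recv_daemon check: the receiver can make progress if the
       * mddev lock is free (trylock would succeed) or if the lock holder has
       * announced that it holds the mutex on the receiver's behalf. */
      static bool recv_can_proceed(const struct cluster_model *c)
      {
          return !c->mddev_locked || (c->state & HOLDING_MUTEX_BIT) != 0;
      }

      /* Models the start of lock_comm() after the fix: a caller that already
       * holds the mddev lock sets HOLDING_MUTEX *before* waiting for
       * SEND_LOCK and the token. Returns true if this call set the bit. */
      static bool lock_comm_enter(struct cluster_model *c, bool mddev_locked)
      {
          if (mddev_locked && !(c->state & HOLDING_MUTEX_BIT)) {
              c->state |= HOLDING_MUTEX_BIT;
              return true;
          }
          return false;
      }

      /* Models the end of lock_comm(): clear the bit only if this call set it. */
      static void lock_comm_exit(struct cluster_model *c, bool did_set)
      {
          if (did_set) {
              assert(c->state & HOLDING_MUTEX_BIT);
              c->state &= ~HOLDING_MUTEX_BIT;
          }
      }
      ```

      In step <d>, recv_can_proceed() is false while "mdadm --remove" holds the
      mddev lock. Once lock_comm_enter() has announced the held mutex, the check
      becomes true, and that is what breaks the wait cycle.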
      
      Repro steps (I only triggered it 3 times in hundreds of test runs):

      Two nodes share 3 iSCSI LUNs: sdg/sdh/sdi. Each LUN is 1GB.
      ```
      ssh root@node2 "mdadm -S --scan"
      mdadm -S --scan
      for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
      count=20; done
      
      mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh \
       --bitmap-chunk=1M
      ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
      
      sleep 5
      
      mkfs.xfs /dev/md0
      mdadm --manage --add /dev/md0 /dev/sdi
      mdadm --wait /dev/md0
      mdadm --grow --raid-devices=3 /dev/md0
      
      mdadm /dev/md0 --fail /dev/sdg
      mdadm /dev/md0 --remove /dev/sdg
      mdadm --grow --raid-devices=2 /dev/md0
      ```
      
      The test script hangs when executing "mdadm --remove".
      
      ```
       # dump stacks by "echo t > /proc/sysrq-trigger"
      md0_cluster_rec D    0  5329      2 0x80004000
      Call Trace:
       __schedule+0x1f6/0x560
       ? _cond_resched+0x2d/0x40
       ? schedule+0x4a/0xb0
       ? process_metadata_update.isra.0+0xdb/0x140 [md_cluster]
       ? wait_woken+0x80/0x80
       ? process_recvd_msg+0x113/0x1d0 [md_cluster]
       ? recv_daemon+0x9e/0x120 [md_cluster]
       ? md_thread+0x94/0x160 [md_mod]
       ? wait_woken+0x80/0x80
       ? md_congested+0x30/0x30 [md_mod]
       ? kthread+0x115/0x140
       ? __kthread_bind_mask+0x60/0x60
       ? ret_from_fork+0x1f/0x40
      
      mdadm           D    0  5423      1 0x00004004
      Call Trace:
       __schedule+0x1f6/0x560
       ? __schedule+0x1fe/0x560
       ? schedule+0x4a/0xb0
       ? lock_comm.isra.0+0x7b/0xb0 [md_cluster]
       ? wait_woken+0x80/0x80
       ? remove_disk+0x4f/0x90 [md_cluster]
       ? hot_remove_disk+0xb1/0x1b0 [md_mod]
       ? md_ioctl+0x50c/0xba0 [md_mod]
       ? wait_woken+0x80/0x80
       ? blkdev_ioctl+0xa2/0x2a0
       ? block_ioctl+0x39/0x40
       ? ksys_ioctl+0x82/0xc0
       ? __x64_sys_ioctl+0x16/0x20
       ? do_syscall_64+0x5f/0x150
       ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      md0_resync      D    0  5425      2 0x80004000
      Call Trace:
       __schedule+0x1f6/0x560
       ? schedule+0x4a/0xb0
       ? dlm_lock_sync+0xa1/0xd0 [md_cluster]
       ? wait_woken+0x80/0x80
       ? lock_token+0x2d/0x90 [md_cluster]
       ? resync_info_update+0x95/0x100 [md_cluster]
       ? raid1_sync_request+0x7d3/0xa40 [raid1]
       ? md_do_sync.cold+0x737/0xc8f [md_mod]
       ? md_thread+0x94/0x160 [md_mod]
       ? md_congested+0x30/0x30 [md_mod]
       ? kthread+0x115/0x140
       ? __kthread_bind_mask+0x60/0x60
       ? ret_from_fork+0x1f/0x40
      ```
      
      Finally, thanks to Xiao for the solution.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Zhao Heming <heming.zhao@suse.com>
      Suggested-by: Xiao Ni <xni@redhat.com>
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md/cluster: block reshape with remote resync job · a8da01f7
      Committed by Zhao Heming
      A reshape request should be blocked while a resync job is ongoing. In a
      cluster environment, a node can start a resync job even if the resync
      command wasn't executed on it; e.g., when the user executes
      "mdadm --grow" on node A, node B will sometimes start the resync job.
      However, the current update_raid_disks() only checks the local recovery
      status, which is incomplete. As a result, "mdadm --grow" succeeds
      locally, while the remote node refuses to do the reshape job because it
      is doing a resync job. This inconsistent handling causes the array to
      enter an unexpected state. If the user doesn't notice this and keeps
      executing mdadm commands, the array eventually stops working.
      
      Fix this issue by blocking the reshape request: when a node executes
      "--grow" and detects an ongoing resync, it should stop and report an
      error to the user.
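      The intended check can be sketched in C, assuming a node has some way to
      know whether a remote resync is running. struct array_model, its fields and
      update_raid_disks_model() are hypothetical. They are not md's real data
      structures, and the actual patch may track remote resync differently.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdbool.h>

      /* Hypothetical view of the state update_raid_disks() would consult. */
      struct array_model {
          bool local_recovery_running; /* resync/recovery on this node */
          bool remote_resync_running;  /* resync started by another node */
          int raid_disks;
      };

      /* Refuse to reshape while *any* node is resyncing, instead of only
       * looking at the local recovery status. */
      static int update_raid_disks_model(struct array_model *a, int new_disks)
      {
          assert(new_disks > 0);
          if (a->local_recovery_running || a->remote_resync_running)
              return -EBUSY; /* surfaced to the user by "mdadm --grow" */
          a->raid_disks = new_disks;
          return 0;
      }
      ```

      Checking both flags in one place keeps the local and remote decisions
      consistent, which is the inconsistency the commit describes.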
      
      The following script reproduces the issue with ~100% probability.
      (Two nodes share 3 iSCSI LUNs: sdg/sdh/sdi. Each LUN is 1GB.)
      ```
       # on node1, node2 is the remote node.
      ssh root@node2 "mdadm -S --scan"
      mdadm -S --scan
      for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
      count=20; done
      
      mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
      ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
      
      sleep 5
      
      mdadm --manage --add /dev/md0 /dev/sdi
      mdadm --wait /dev/md0
      mdadm --grow --raid-devices=3 /dev/md0
      
      mdadm /dev/md0 --fail /dev/sdg
      mdadm /dev/md0 --remove /dev/sdg
      mdadm --grow --raid-devices=2 /dev/md0
      ```
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Zhao Heming <heming.zhao@suse.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: use current request time as base for ktime comparisons · a23f2aae
      Committed by Pankaj Gupta
      The request coalescing logic uses 'prev_flush_start' as the base when
      comparing against the current request's start time, but
      'prev_flush_start' is updated in another context.

      This patch instead uses 'req_start' as the base of the ktime
      comparison, for better code readability.
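
      A minimal C sketch of the coalescing decision, written with 'req_start' as the
      base of every comparison. ktime_model_t, before(), after(), can_coalesce() and
      needs_new_flush() are hypothetical stand-ins for ktime_t, ktime_before() and
      ktime_after(). The sketch assumes 'prev_flush_start' holds the start time of
      the most recently completed flush.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical stand-in for ktime_t: nanoseconds since boot. */
      typedef int64_t ktime_model_t;

      /* Same meaning as ktime_before(a, b) / ktime_after(a, b). */
      static bool before(ktime_model_t a, ktime_model_t b) { return a < b; }
      static bool after(ktime_model_t a, ktime_model_t b) { return a > b; }

      /* A request that started before the last completed flush began is
       * already covered by that flush, so no new flush is needed for it. */
      static bool can_coalesce(ktime_model_t req_start, ktime_model_t prev_flush_start)
      {
          return before(req_start, prev_flush_start);
      }

      /* A request that started after that flush began needs its own flush. */
      static bool needs_new_flush(ktime_model_t req_start, ktime_model_t prev_flush_start)
      {
          return after(req_start, prev_flush_start);
      }
      ```

      Putting 'req_start' first in both helpers lets each condition read as a
      statement about the current request.
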
      Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: add comments in md_flush_request() · 204d1a64
      Committed by Pankaj Gupta
      The request coalescing logic depends on a flush time that is updated in
      another context. This patch adds comments to make the code flow easier
      to understand.
      Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: improve variable names in md_flush_request() · 81ba3c24
      Committed by Pankaj Gupta
      This patch improves readability by using better variable names in the
      flush request coalescing logic.
      Signed-off-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com>
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md/raid10: initialize r10_bio->read_slot before use. · 93decc56
      Committed by Kevin Vigor
      In __make_request(), a new r10bio is allocated and passed to
      raid10_read_request(). The read_slot member of the r10bio is not
      initialized, yet raid10_read_request() uses it to index an array.
      This leads to occasional panics.

      Fix this by initializing the field to an invalid value and checking
      for a valid value in raid10_read_request().
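
      The "initialize to an invalid value, then check before indexing" pattern is
      sketched below in C. struct r10bio_model, NR_DEVS, alloc_r10bio() and
      previous_read_dev() are hypothetical. The sketch assumes -1 as the invalid
      slot value and uses plain ints in place of per-slot rdev pointers.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      #define NR_DEVS 4 /* hypothetical number of mirror slots */

      /* Hypothetical, trimmed-down r10bio with only the fields used here. */
      struct r10bio_model {
          int read_slot;     /* index into devs[], or -1 if none chosen yet */
          int devs[NR_DEVS]; /* stand-in for per-slot rdev pointers */
      };

      /* Models __make_request(): mark read_slot invalid right after
       * allocation instead of leaving whatever was in memory. */
      static struct r10bio_model *alloc_r10bio(void)
      {
          struct r10bio_model *r = malloc(sizeof(*r));
          if (!r)
              return NULL;
          r->read_slot = -1;
          for (int i = 0; i < NR_DEVS; i++)
              r->devs[i] = 0;
          return r;
      }

      /* Models the check in raid10_read_request(): only index devs[] when
       * read_slot holds a valid value (e.g. on an error retry). Returns -1
       * for a fresh read with no slot chosen yet. */
      static int previous_read_dev(const struct r10bio_model *r)
      {
          int slot = r->read_slot;
          if (slot >= 0 && slot < NR_DEVS)
              return r->devs[slot];
          return -1;
      }
      ```
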
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Kevin Vigor <kvigor@gmail.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
    • md: fix a warning caused by a race between concurrent md_ioctl()s · c731b84b
      Committed by Dae R. Jeong
      Syzkaller reports the following warning:
      WARNING: CPU: 0 PID: 9647 at drivers/md/md.c:7169
      ...
      Call Trace:
      ...
      RIP: 0010:md_ioctl+0x4017/0x5980 drivers/md/md.c:7169
      RSP: 0018:ffff888096027950 EFLAGS: 00010293
      RAX: ffff88809322c380 RBX: 0000000000000932 RCX: ffffffff84e266f2
      RDX: 0000000000000000 RSI: ffffffff84e299f7 RDI: 0000000000000007
      RBP: ffff888096027bc0 R08: ffff88809322c380 R09: ffffed101341a482
      R10: ffff888096027940 R11: ffff88809a0d240f R12: 0000000000000932
      R13: ffff8880a2c14100 R14: ffff88809a0d2268 R15: ffff88809a0d2408
       __blkdev_driver_ioctl block/ioctl.c:304 [inline]
       blkdev_ioctl+0xece/0x1c10 block/ioctl.c:606
       block_ioctl+0xee/0x130 fs/block_dev.c:1930
       vfs_ioctl fs/ioctl.c:46 [inline]
       file_ioctl fs/ioctl.c:509 [inline]
       do_vfs_ioctl+0xd5f/0x1380 fs/ioctl.c:696
       ksys_ioctl+0xab/0xd0 fs/ioctl.c:713
       __do_sys_ioctl fs/ioctl.c:720 [inline]
       __se_sys_ioctl fs/ioctl.c:718 [inline]
       __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is caused by a race between two concurrent md_ioctl()s closing
      the array.
      CPU1 (md_ioctl())                   CPU2 (md_ioctl())
      ------                              ------
      set_bit(MD_CLOSING, &mddev->flags);
      did_set_md_closing = true;
                                          WARN_ON_ONCE(test_bit(MD_CLOSING,
                                                  &mddev->flags));
      if(did_set_md_closing)
          clear_bit(MD_CLOSING, &mddev->flags);
      
      Fix the warning by returning immediately if the MD_CLOSING bit is
      already set in &mddev->flags, which indicates that the array is being
      closed.
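
      A sketch of the fixed ordering, using C11 atomics in place of the kernel's
      test_and_set_bit()/clear_bit(). MD_CLOSING_BIT, test_and_set_closing(),
      begin_close() and end_close() are hypothetical, and returning -EBUSY to the
      second closer is this sketch's assumption. Because the test and the set are
      one atomic step, the window between set_bit() and WARN_ON_ONCE() shown in
      the race diagram no longer exists.

      ```c
      #include <assert.h>
      #include <errno.h>
      #include <stdatomic.h>

      #define MD_CLOSING_BIT (1u << 0) /* hypothetical bit position */

      /* Like test_and_set_bit(): set the bit atomically and report whether
       * it was already set. */
      static int test_and_set_closing(atomic_uint *flags)
      {
          unsigned int old = atomic_fetch_or(flags, MD_CLOSING_BIT);
          return (old & MD_CLOSING_BIT) != 0;
      }

      /* Fixed close path: a second concurrent closer sees the bit already
       * set and bails out instead of warning. */
      static int begin_close(atomic_uint *flags)
      {
          if (test_and_set_closing(flags))
              return -EBUSY;
          return 0;
      }

      /* Like clear_bit() once the close attempt has finished. */
      static void end_close(atomic_uint *flags)
      {
          atomic_fetch_and(flags, ~MD_CLOSING_BIT);
      }
      ```
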
      
      Fixes: 065e519e ("md: MD_CLOSING needs to be cleared after called md_set_readonly or do_md_stop")
      Reported-by: syzbot+1e46a0864c1a6e9bd3d8@syzkaller.appspotmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Dae R. Jeong <dae.r.jeong@kaist.ac.kr>
      Signed-off-by: Song Liu <songliubraving@fb.com>
  3. 16 Nov, 2020: 28 commits
  4. 14 Nov, 2020: 3 commits
    • drm/nouveau/kms/nv50-: Use atomic encoder callbacks everywhere · 5c6fb4b2
      Committed by Lyude Paul
      It turns out that I forgot to go through and make sure that I converted all
      encoder callbacks to use atomic_enable/atomic_disable(), so let's go and
      actually do that.
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Fixes: 09838c4e ("drm/nouveau/kms: Search for encoders' connectors properly")
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau/ttm: avoid using nouveau_drm.ttm.type_vram prior to nv50 · 6c27ffab
      Committed by Ben Skeggs
      Pre-NV50 chipsets don't currently use the MMU subsystem that later
      chipsets use, and type_vram is negative here, leading to an OOB memory
      access.
      
      This was previously guarded by a chipset check; restore it.
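
      The shape of such a guard is sketched below in C. enum chip_family,
      struct drm_model, NR_MEM_TYPES and vram_type_flags() are hypothetical
      illustrations, not nouveau's real types or its chipset constants. The point
      is only that the family check runs before a possibly negative type_vram is
      used as an index.

      ```c
      #include <assert.h>

      /* Hypothetical chipset classification. */
      enum chip_family { FAMILY_PRE_NV50, FAMILY_NV50_OR_LATER };

      #define NR_MEM_TYPES 4 /* hypothetical size of the memory-type table */

      struct drm_model {
          enum chip_family family;
          int type_vram;                         /* negative when unused (pre-NV50) */
          unsigned int type_flags[NR_MEM_TYPES]; /* per-memory-type properties */
      };

      /* Return the VRAM memory type's flags, or 0 when the chipset does not
       * use the MMU-based type table. Checking the family first means a
       * negative type_vram never reaches the array access. */
      static unsigned int vram_type_flags(const struct drm_model *d)
      {
          if (d->family == FAMILY_PRE_NV50)
              return 0;
          assert(d->type_vram >= 0 && d->type_vram < NR_MEM_TYPES);
          return d->type_flags[d->type_vram];
      }
      ```
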
      Reported-by: Thomas Zimmermann <tzimmermann@suse.de>
      Fixes: 5839172f ("drm/nouveau: explicitly specify caching to use")
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth · 630f5122
      Committed by Alexander Kapshuk
      This oops manifests itself on the following hardware:
      01:00.0 VGA compatible controller: NVIDIA Corporation G98M [GeForce G 103M] (rev a1)
      
      Oct 09 14:17:46 lp-sasha kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: #PF: supervisor read access in kernel mode
      Oct 09 14:17:46 lp-sasha kernel: #PF: error_code(0x0000) - not-present page
      Oct 09 14:17:46 lp-sasha kernel: PGD 0 P4D 0
      Oct 09 14:17:46 lp-sasha kernel: Oops: 0000 [#1] SMP PTI
      Oct 09 14:17:46 lp-sasha kernel: CPU: 1 PID: 191 Comm: systemd-udevd Not tainted 5.9.0-rc8-next-20201009 #38
      Oct 09 14:17:46 lp-sasha kernel: Hardware name: Hewlett-Packard Compaq Presario CQ61 Notebook PC/306A, BIOS F.03 03/23/2009
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0010:nouveau_connector_detect_depth+0x71/0xc0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel: Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      Oct 09 14:17:46 lp-sasha kernel: RSP: 0018:ffffc9000028f8c0 EFLAGS: 00010297
      Oct 09 14:17:46 lp-sasha kernel: RAX: 0000000000014c08 RBX: ffff8880369d4000 RCX: ffff8880369d3000
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff8880369d4000
      Oct 09 14:17:46 lp-sasha kernel: RBP: ffff88800601cc00 R08: ffff8880051da298 R09: ffffffff8226201a
      Oct 09 14:17:46 lp-sasha kernel: R10: ffff88800469aa80 R11: ffff888004c84ff8 R12: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: R13: ffff8880051da000 R14: 0000000000002000 R15: 0000000000000003
      Oct 09 14:17:46 lp-sasha kernel: FS:  00007fd0192b3440(0000) GS:ffff8880bc900000(0000) knlGS:0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000 CR3: 0000000004976000 CR4: 00000000000006e0
      Oct 09 14:17:46 lp-sasha kernel: Call Trace:
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_connector_get_modes+0x1e6/0x240 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? kfree+0xb9/0x240
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_connector_list_iter_next+0x7c/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  drm_helper_probe_single_connector_modes+0x1ba/0x7c0
      Oct 09 14:17:46 lp-sasha kernel:  drm_client_modeset_probe+0x27e/0x1360
      Oct 09 14:17:46 lp-sasha kernel:  ? nvif_object_sclass_put+0xc/0x20 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? nouveau_cli_init+0x3cc/0x440 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  ? ktime_get_mono_fast_ns+0x49/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  ? nouveau_drm_open+0x4e/0x180 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  __drm_fb_helper_initial_config_and_unlock+0x3f/0x4a0
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_file_alloc+0x18f/0x260
      Oct 09 14:17:46 lp-sasha kernel:  ? mutex_lock+0x9/0x40
      Oct 09 14:17:46 lp-sasha kernel:  ? drm_client_init+0x110/0x160
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_fbcon_init+0x14d/0x1c0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_drm_device_init+0x1c0/0x880 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  nouveau_drm_probe+0x11a/0x1e0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel:  pci_device_probe+0xcd/0x140
      Oct 09 14:17:46 lp-sasha kernel:  really_probe+0xd8/0x400
      Oct 09 14:17:46 lp-sasha kernel:  driver_probe_device+0x4a/0xa0
      Oct 09 14:17:46 lp-sasha kernel:  device_driver_attach+0x9c/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  __driver_attach+0x6f/0x100
      Oct 09 14:17:46 lp-sasha kernel:  ? device_driver_attach+0xc0/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  bus_for_each_dev+0x75/0xc0
      Oct 09 14:17:46 lp-sasha kernel:  bus_add_driver+0x106/0x1c0
      Oct 09 14:17:46 lp-sasha kernel:  driver_register+0x86/0xe0
      Oct 09 14:17:46 lp-sasha kernel:  ? 0xffffffffa044e000
      Oct 09 14:17:46 lp-sasha kernel:  do_one_initcall+0x48/0x1e0
      Oct 09 14:17:46 lp-sasha kernel:  ? _cond_resched+0x11/0x60
      Oct 09 14:17:46 lp-sasha kernel:  ? kmem_cache_alloc_trace+0x19c/0x1e0
      Oct 09 14:17:46 lp-sasha kernel:  do_init_module+0x57/0x220
      Oct 09 14:17:46 lp-sasha kernel:  __do_sys_finit_module+0xa0/0xe0
      Oct 09 14:17:46 lp-sasha kernel:  do_syscall_64+0x33/0x40
      Oct 09 14:17:46 lp-sasha kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0033:0x7fd01a060d5d
      Oct 09 14:17:46 lp-sasha kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 70 0c 00 f7 d8 64 89 01 48
      Oct 09 14:17:46 lp-sasha kernel: RSP: 002b:00007ffc8ad38a98 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      Oct 09 14:17:46 lp-sasha kernel: RAX: ffffffffffffffda RBX: 0000563f6e7fd530 RCX: 00007fd01a060d5d
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000000 RSI: 00007fd01a19f95d RDI: 000000000000000f
      Oct 09 14:17:46 lp-sasha kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000007
      Oct 09 14:17:46 lp-sasha kernel: R10: 000000000000000f R11: 0000000000000246 R12: 00007fd01a19f95d
      Oct 09 14:17:46 lp-sasha kernel: R13: 0000000000000000 R14: 0000563f6e7fbc10 R15: 0000563f6e7fd530
      Oct 09 14:17:46 lp-sasha kernel: Modules linked in: nouveau(+) ttm xt_string xt_mark xt_LOG vgem v4l2_dv_timings uvcvideo ulpi udf ts_kmp ts_fsm ts_bm snd_aloop sil164 qat_dh895xccvf nf_nat_sip nf_nat_irc nf_nat_ftp nf_nat nf_log_ipv6 nf_log_ipv4 nf_log_common ltc2990 lcd intel_qat input_leds i2c_mux gspca_main videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc drivetemp cuse fuse crc_itu_t coretemp ch7006 ath5k ath algif_hash
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: ---[ end trace 0ddafe218ad30017 ]---
      Oct 09 14:17:46 lp-sasha kernel: RIP: 0010:nouveau_connector_detect_depth+0x71/0xc0 [nouveau]
      Oct 09 14:17:46 lp-sasha kernel: Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      Oct 09 14:17:46 lp-sasha kernel: RSP: 0018:ffffc9000028f8c0 EFLAGS: 00010297
      Oct 09 14:17:46 lp-sasha kernel: RAX: 0000000000014c08 RBX: ffff8880369d4000 RCX: ffff8880369d3000
      Oct 09 14:17:46 lp-sasha kernel: RDX: 0000000000000040 RSI: 0000000000000000 RDI: ffff8880369d4000
      Oct 09 14:17:46 lp-sasha kernel: RBP: ffff88800601cc00 R08: ffff8880051da298 R09: ffffffff8226201a
      Oct 09 14:17:46 lp-sasha kernel: R10: ffff88800469aa80 R11: ffff888004c84ff8 R12: 0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: R13: ffff8880051da000 R14: 0000000000002000 R15: 0000000000000003
      Oct 09 14:17:46 lp-sasha kernel: FS:  00007fd0192b3440(0000) GS:ffff8880bc900000(0000) knlGS:0000000000000000
      Oct 09 14:17:46 lp-sasha kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Oct 09 14:17:46 lp-sasha kernel: CR2: 0000000000000000 CR3: 0000000004976000 CR4: 00000000000006e0
      
      The disassembly:
      Code: 0a 00 00 48 8b 49 48 c7 87 b8 00 00 00 06 00 00 00 80 b9 4d 0a 00 00 00 75 1e 83 fa 41 75 05 48 85 c0 75 29 8b 81 10 0d 00 00 <39> 06 7c 25 f6 81 14 0d 00 00 02 75 b7 c3 80 b9 0c 0d 00 00 00 75
      All code
      ========
         0:   0a 00                   or     (%rax),%al
         2:   00 48 8b                add    %cl,-0x75(%rax)
         5:   49                      rex.WB
         6:   48 c7 87 b8 00 00 00    movq   $0x6,0xb8(%rdi)
         d:   06 00 00 00
        11:   80 b9 4d 0a 00 00 00    cmpb   $0x0,0xa4d(%rcx)
        18:   75 1e                   jne    0x38
        1a:   83 fa 41                cmp    $0x41,%edx
        1d:   75 05                   jne    0x24
        1f:   48 85 c0                test   %rax,%rax
        22:   75 29                   jne    0x4d
        24:   8b 81 10 0d 00 00       mov    0xd10(%rcx),%eax
        2a:*  39 06                   cmp    %eax,(%rsi)              <-- trapping instruction
        2c:   7c 25                   jl     0x53
        2e:   f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
        35:   75 b7                   jne    0xffffffffffffffee
        37:   c3                      retq
        38:   80 b9 0c 0d 00 00 00    cmpb   $0x0,0xd0c(%rcx)
        3f:   75                      .byte 0x75
      
      Code starting with the faulting instruction
      ===========================================
         0:   39 06                   cmp    %eax,(%rsi)
         2:   7c 25                   jl     0x29
         4:   f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
         b:   75 b7                   jne    0xffffffffffffffc4
         d:   c3                      retq
         e:   80 b9 0c 0d 00 00 00    cmpb   $0x0,0xd0c(%rcx)
        15:   75                      .byte 0x75
      
      objdump -SF --disassemble=nouveau_connector_detect_depth
      [...]
              if (nv_connector->edid &&
         c85e1:       83 fa 41                cmp    $0x41,%edx
         c85e4:       75 05                   jne    c85eb <nouveau_connector_detect_depth+0x6b> (File Offset: 0xc866b)
         c85e6:       48 85 c0                test   %rax,%rax
         c85e9:       75 29                   jne    c8614 <nouveau_connector_detect_depth+0x94> (File Offset: 0xc8694)
                  nv_connector->type == DCB_CONNECTOR_LVDS_SPWG)
                      duallink = ((u8 *)nv_connector->edid)[121] == 2;
              else
                      duallink = mode->clock >= bios->fp.duallink_transition_clk;
      
              if ((!duallink && (bios->fp.strapless_is_24bit & 1)) ||
         c85eb:       8b 81 10 0d 00 00       mov    0xd10(%rcx),%eax
         c85f1:       39 06                   cmp    %eax,(%rsi)
         c85f3:       7c 25                   jl     c861a <nouveau_connector_detect_depth+0x9a> (File Offset: 0xc869a)
                  ( duallink && (bios->fp.strapless_is_24bit & 2)))
         c85f5:       f6 81 14 0d 00 00 02    testb  $0x2,0xd14(%rcx)
         c85fc:       75 b7                   jne    c85b5 <nouveau_connector_detect_depth+0x35> (File Offset: 0xc8635)
                      connector->display_info.bpc = 8;
      [...]
      
      % scripts/faddr2line /lib/modules/5.9.0-rc8-next-20201009/kernel/drivers/gpu/drm/nouveau/nouveau.ko nouveau_connector_detect_depth+0x71/0xc0
      nouveau_connector_detect_depth+0x71/0xc0:
      nouveau_connector_detect_depth at /home/sasha/linux-next/drivers/gpu/drm/nouveau/nouveau_connector.c:891
      
      It is actually line 889; see the disassembly above.
      889                     duallink = mode->clock >= bios->fp.duallink_transition_clk;
      
      The NULL pointer being dereferenced is mode.
      
      Git bisect has identified the following commit as bad:
      f28e32d3 drm/nouveau/kms: Don't change EDID when it hasn't actually changed
      
      Here is the chain of events that causes the oops.
      On entry to nouveau_connector_detect_lvds, edid is set to NULL.  The call
      to nouveau_connector_detect sets nv_connector->edid to valid memory,
      with status set to connector_status_connected and the flow of execution
      branching to the out label.
      
      The subsequent call to nouveau_connector_set_edid erroneously clears
      nv_connector->edid via the local edid pointer, which remains set to NULL.
      
      Fix this by setting edid to the value of the just-acquired
      nv_connector->edid, and by executing the body of
      nouveau_connector_set_edid only if nv_connector->edid and edid point
      to different memory addresses, thus preventing nv_connector->edid
      from being turned into a dangling pointer.
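
      The two parts of the fix can be modelled in C. struct connector_model,
      set_edid(), detect() and detect_lvds() are hypothetical, trimmed-down
      stand-ins for the nouveau connector, nouveau_connector_set_edid(),
      nouveau_connector_detect() and nouveau_connector_detect_lvds(). A
      128-byte zeroed buffer stands in for a real EDID.

      ```c
      #include <assert.h>
      #include <stdlib.h>

      /* Hypothetical connector holding only the cached EDID pointer. */
      struct connector_model {
          unsigned char *edid;
      };

      /* Fixed set_edid(): do nothing if the connector already owns this
       * pointer; otherwise free the old EDID and take the new one. */
      static void set_edid(struct connector_model *c, unsigned char *edid)
      {
          if (c->edid == edid)
              return;
          free(c->edid);
          c->edid = edid;
      }

      /* Models the detect step storing a freshly read EDID on the connector. */
      static int detect(struct connector_model *c)
      {
          set_edid(c, calloc(128, 1));
          return c->edid != NULL;
      }

      /* Fixed detect_lvds() flow: adopt the just-acquired EDID before
       * jumping to "out", so the final set_edid() does not clear it. */
      static int detect_lvds(struct connector_model *c)
      {
          unsigned char *edid = NULL; /* local pointer, NULL on entry */
          int connected = 0;

          if (detect(c)) {
              edid = c->edid; /* the fix: reuse the connector's EDID */
              connected = 1;
              goto out;
          }
          /* other LVDS probing paths would go here (omitted) */
      out:
          set_edid(c, edid);
          return connected;
      }
      ```

      Without the `edid = c->edid;` line, the local pointer would still be NULL at
      "out", and set_edid() would free the EDID and leave nv_connector->edid NULL.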
      
      Fixes: f28e32d3 ("drm/nouveau/kms: Don't change EDID when it hasn't actually changed")
      Signed-off-by: Alexander Kapshuk <alexander.kapshuk@gmail.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>