1. 04 8月, 2019 25 次提交
  2. 31 7月, 2019 15 次提交
    • G
      Linux 4.19.63 · 9a9de33a
      Greg Kroah-Hartman 提交于
      9a9de33a
    • L
      access: avoid the RCU grace period for the temporary subjective credentials · 408af823
      Linus Torvalds 提交于
      commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream.
      
      It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
      work because it installs a temporary credential that gets allocated and
      freed for each system call.
      
      The allocation and freeing overhead is mostly benign, but because
      credentials can be accessed under the RCU read lock, the freeing
      involves a RCU grace period.
      
      Which is not a huge deal normally, but if you have a lot of access()
      calls, this causes a fair amount of seconday damage: instead of having a
      nice alloc/free patterns that hits in hot per-CPU slab caches, you have
      all those delayed free's, and on big machines with hundreds of cores,
      the RCU overhead can end up being enormous.
      
      But it turns out that all of this is entirely unnecessary.  Exactly
      because access() only installs the credential as the thread-local
      subjective credential, the temporary cred pointer doesn't actually need
      to be RCU free'd at all.  Once we're done using it, we can just free it
      synchronously and avoid all the RCU overhead.
      
      So add a 'non_rcu' flag to 'struct cred', which can be set by users that
      know they only use it in non-RCU context (there are other potential
      users for this).  We can make it a union with the rcu freeing list head
      that we need for the RCU case, so this doesn't need any extra storage.
      
      Note that this also makes 'get_current_cred()' clear the new non_rcu
      flag, in case we have filesystems that take a long-term reference to the
      cred and then expect the RCU delayed freeing afterwards.  It's not
      entirely clear that this is required, but it makes for clear semantics:
      the subjective cred remains non-RCU as long as you only access it
      synchronously using the thread-local accessors, but you _can_ use it as
      a generic cred if you want to.
      
      It is possible that we should just remove the whole RCU markings for
      ->cred entirely.  Only ->real_cred is really supposed to be accessed
      through RCU, and the long-term cred copies that nfs uses might want to
      explicitly re-enable RCU freeing if required, rather than have
      get_current_cred() do it implicitly.
      
      But this is a "minimal semantic changes" change for the immediate
      problem.
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jan Glauber <jglauber@marvell.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      408af823
    • D
      libnvdimm/bus: Stop holding nvdimm_bus_list_mutex over __nd_ioctl() · 1a547d24
      Dan Williams 提交于
      commit b70d31d054ee3a6fc1034b9d7fc0ae1e481aa018 upstream.
      
      In preparation for fixing a deadlock between wait_for_bus_probe_idle()
      and the nvdimm_bus_list_mutex arrange for __nd_ioctl() without
      nvdimm_bus_list_mutex held. This also unifies the 'dimm' and 'bus' level
      ioctls into a common nd_ioctl() preamble implementation.
      
      Marked for -stable as it is a pre-requisite for a follow-on fix.
      
      Cc: <stable@vger.kernel.org>
      Fixes: bf9bccc1 ("libnvdimm: pmem label sets and namespace instantiation")
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Tested-by: NJane Chu <jane.chu@oracle.com>
      Link: https://lore.kernel.org/r/156341209518.292348.7183897251740665198.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1a547d24
    • M
      powerpc/tm: Fix oops on sigreturn on systems without TM · b993a66d
      Michael Neuling 提交于
      commit f16d80b75a096c52354c6e0a574993f3b0dfbdfe upstream.
      
      On systems like P9 powernv where we have no TM (or P8 booted with
      ppc_tm=off), userspace can construct a signal context which still has
      the MSR TS bits set. The kernel tries to restore this context which
      results in the following crash:
      
        Unexpected TM Bad Thing exception at c0000000000022fc (msr 0x8000000102a03031) tm_scratch=800000020280f033
        Oops: Unrecoverable exception, sig: 6 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        Modules linked in:
        CPU: 0 PID: 1636 Comm: sigfuz Not tainted 5.2.0-11043-g0a8ad0ffa4 #69
        NIP:  c0000000000022fc LR: 00007fffb2d67e48 CTR: 0000000000000000
        REGS: c00000003fffbd70 TRAP: 0700   Not tainted  (5.2.0-11045-g7142b497d8)
        MSR:  8000000102a03031 <SF,VEC,VSX,FP,ME,IR,DR,LE,TM[E]>  CR: 42004242  XER: 00000000
        CFAR: c0000000000022e0 IRQMASK: 0
        GPR00: 0000000000000072 00007fffb2b6e560 00007fffb2d87f00 0000000000000669
        GPR04: 00007fffb2b6e728 0000000000000000 0000000000000000 00007fffb2b6f2a8
        GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
        GPR12: 0000000000000000 00007fffb2b76900 0000000000000000 0000000000000000
        GPR16: 00007fffb2370000 00007fffb2d84390 00007fffea3a15ac 000001000a250420
        GPR20: 00007fffb2b6f260 0000000010001770 0000000000000000 0000000000000000
        GPR24: 00007fffb2d843a0 00007fffea3a14a0 0000000000010000 0000000000800000
        GPR28: 00007fffea3a14d8 00000000003d0f00 0000000000000000 00007fffb2b6e728
        NIP [c0000000000022fc] rfi_flush_fallback+0x7c/0x80
        LR [00007fffb2d67e48] 0x7fffb2d67e48
        Call Trace:
        Instruction dump:
        e96a0220 e96a02a8 e96a0330 e96a03b8 394a0400 4200ffdc 7d2903a6 e92d0c00
        e94d0c08 e96d0c10 e82d0c18 7db242a6 <4c000024> 7db243a6 7db142a6 f82d0c18
      
      The problem is the signal code assumes TM is enabled when
      CONFIG_PPC_TRANSACTIONAL_MEM is enabled. This may not be the case as
      with P9 powernv or if `ppc_tm=off` is used on P8.
      
      This means any local user can crash the system.
      
      Fix the problem by returning a bad stack frame to the user if they try
      to set the MSR TS bits with sigreturn() on systems where TM is not
      supported.
      
      Found with sigfuz kernel selftest on P9.
      
      This fixes CVE-2019-13648.
      
      Fixes: 2b0a576d ("powerpc: Add new transactional memory state to the signal context")
      Cc: stable@vger.kernel.org # v3.9
      Reported-by: NPraveen Pandey <Praveen.Pandey@in.ibm.com>
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20190719050502.405-1-mikey@neuling.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b993a66d
    • G
      powerpc/xive: Fix loop exit-condition in xive_find_target_in_mask() · b9310c56
      Gautham R. Shenoy 提交于
      commit 4d202c8c8ed3822327285747db1765967110b274 upstream.
      
      xive_find_target_in_mask() has the following for(;;) loop which has a
      bug when @first == cpumask_first(@mask) and condition 1 fails to hold
      for every CPU in @mask. In this case we loop forever in the for-loop.
      
        first = cpu;
        for (;;) {
        	  if (cpu_online(cpu) && xive_try_pick_target(cpu)) // condition 1
      		  return cpu;
      	  cpu = cpumask_next(cpu, mask);
      	  if (cpu == first) // condition 2
      		  break;
      
      	  if (cpu >= nr_cpu_ids) // condition 3
      		  cpu = cpumask_first(mask);
        }
      
      This is because, when @first == cpumask_first(@mask), we never hit the
      condition 2 (cpu == first) since prior to this check, we would have
      executed "cpu = cpumask_next(cpu, mask)" which will set the value of
      @cpu to a value greater than @first or to nr_cpus_ids. When this is
      coupled with the fact that condition 1 is not met, we will never exit
      this loop.
      
      This was discovered by the hard-lockup detector while running LTP test
      concurrently with SMT switch tests.
      
       watchdog: CPU 12 detected hard LOCKUP on other CPUs 68
       watchdog: CPU 12 TB:85587019220796, last SMP heartbeat TB:85578827223399 (15999ms ago)
       watchdog: CPU 68 Hard LOCKUP
       watchdog: CPU 68 TB:85587019361273, last heartbeat TB:85576815065016 (19930ms ago)
       CPU: 68 PID: 45050 Comm: hxediag Kdump: loaded Not tainted 4.18.0-100.el8.ppc64le #1
       NIP:  c0000000006f5578 LR: c000000000cba9ec CTR: 0000000000000000
       REGS: c000201fff3c7d80 TRAP: 0100   Not tainted  (4.18.0-100.el8.ppc64le)
       MSR:  9000000002883033 <SF,HV,VEC,VSX,FP,ME,IR,DR,RI,LE>  CR: 24028424  XER: 00000000
       CFAR: c0000000006f558c IRQMASK: 1
       GPR00: c0000000000afc58 c000201c01c43400 c0000000015ce500 c000201cae26ec18
       GPR04: 0000000000000800 0000000000000540 0000000000000800 00000000000000f8
       GPR08: 0000000000000020 00000000000000a8 0000000080000000 c00800001a1beed8
       GPR12: c0000000000b1410 c000201fff7f4c00 0000000000000000 0000000000000000
       GPR16: 0000000000000000 0000000000000000 0000000000000540 0000000000000001
       GPR20: 0000000000000048 0000000010110000 c00800001a1e3780 c000201cae26ed18
       GPR24: 0000000000000000 c000201cae26ed8c 0000000000000001 c000000001116bc0
       GPR28: c000000001601ee8 c000000001602494 c000201cae26ec18 000000000000001f
       NIP [c0000000006f5578] find_next_bit+0x38/0x90
       LR [c000000000cba9ec] cpumask_next+0x2c/0x50
       Call Trace:
       [c000201c01c43400] [c000201cae26ec18] 0xc000201cae26ec18 (unreliable)
       [c000201c01c43420] [c0000000000afc58] xive_find_target_in_mask+0x1b8/0x240
       [c000201c01c43470] [c0000000000b0228] xive_pick_irq_target.isra.3+0x168/0x1f0
       [c000201c01c435c0] [c0000000000b1470] xive_irq_startup+0x60/0x260
       [c000201c01c43640] [c0000000001d8328] __irq_startup+0x58/0xf0
       [c000201c01c43670] [c0000000001d844c] irq_startup+0x8c/0x1a0
       [c000201c01c436b0] [c0000000001d57b0] __setup_irq+0x9f0/0xa90
       [c000201c01c43760] [c0000000001d5aa0] request_threaded_irq+0x140/0x220
       [c000201c01c437d0] [c00800001a17b3d4] bnx2x_nic_load+0x188c/0x3040 [bnx2x]
       [c000201c01c43950] [c00800001a187c44] bnx2x_self_test+0x1fc/0x1f70 [bnx2x]
       [c000201c01c43a90] [c000000000adc748] dev_ethtool+0x11d8/0x2cb0
       [c000201c01c43b60] [c000000000b0b61c] dev_ioctl+0x5ac/0xa50
       [c000201c01c43bf0] [c000000000a8d4ec] sock_do_ioctl+0xbc/0x1b0
       [c000201c01c43c60] [c000000000a8dfb8] sock_ioctl+0x258/0x4f0
       [c000201c01c43d20] [c0000000004c9704] do_vfs_ioctl+0xd4/0xa70
       [c000201c01c43de0] [c0000000004ca274] sys_ioctl+0xc4/0x160
       [c000201c01c43e30] [c00000000000b388] system_call+0x5c/0x70
       Instruction dump:
       78aad182 54a806be 3920ffff 78a50664 794a1f24 7d294036 7d43502a 7d295039
       4182001c 48000034 78a9d182 79291f24 <7d23482a> 2fa90000 409e0020 38a50040
      
      To fix this, move the check for condition 2 after the check for
      condition 3, so that we are able to break out of the loop soon after
      iterating through all the CPUs in the @mask in the problem case. Use
      do..while() to achieve this.
      
      Fixes: 243e2511 ("powerpc/xive: Native exploitation of the XIVE interrupt controller")
      Cc: stable@vger.kernel.org # v4.12+
      Reported-by: NIndira P. Joga <indira.priya@in.ibm.com>
      Signed-off-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/1563359724-13931-1-git-send-email-ego@linux.vnet.ibm.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9310c56
    • H
      ALSA: hda - Add a conexant codec entry to let mute led work · c2194442
      Hui Wang 提交于
      commit 3f8809499bf02ef7874254c5e23fc764a47a21a0 upstream.
      
      This conexant codec isn't in the supported codec list yet, the hda
      generic driver can drive this codec well, but on a Lenovo machine
      with mute/mic-mute leds, we need to apply CXT_FIXUP_THINKPAD_ACPI
      to make the leds work. After adding this codec to the list, the
      driver patch_conexant.c will apply THINKPAD_ACPI to this machine.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHui Wang <hui.wang@canonical.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c2194442
    • K
      ALSA: line6: Fix wrong altsetting for LINE6_PODHD500_1 · 491483ed
      Kai-Heng Feng 提交于
      commit 70256b42caaf3e13c2932c2be7903a73fbe8bb8b upstream.
      
      Commit 7b9584fa ("staging: line6: Move altsetting to properties")
      set a wrong altsetting for LINE6_PODHD500_1 during refactoring.
      
      Set the correct altsetting number to fix the issue.
      
      BugLink: https://bugs.launchpad.net/bugs/1790595
      Fixes: 7b9584fa ("staging: line6: Move altsetting to properties")
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      491483ed
    • D
      ALSA: ac97: Fix double free of ac97_codec_device · 60274409
      Ding Xiang 提交于
      commit 607975b30db41aad6edc846ed567191aa6b7d893 upstream.
      
      put_device will call ac97_codec_release to free
      ac97_codec_device and other resources, so remove the kfree
      and other redundant code.
      
      Fixes: 74426fbf ("ALSA: ac97: add an ac97 bus")
      Signed-off-by: NDing Xiang <dingxiang@cmss.chinamobile.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60274409
    • K
      hpet: Fix division by zero in hpet_time_div() · 9845fb5a
      Kefeng Wang 提交于
      commit 0c7d37f4d9b8446956e97b7c5e61173cdb7c8522 upstream.
      
      The base value in do_div() called by hpet_time_div() is truncated from
      unsigned long to uint32_t, resulting in a divide-by-zero exception.
      
      UBSAN: Undefined behaviour in ../drivers/char/hpet.c:572:2
      division by zero
      CPU: 1 PID: 23682 Comm: syz-executor.3 Not tainted 4.4.184.x86_64+ #4
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
       0000000000000000 b573382df1853d00 ffff8800a3287b98 ffffffff81ad7561
       ffff8800a3287c00 ffffffff838b35b0 ffffffff838b3860 ffff8800a3287c20
       0000000000000000 ffff8800a3287bb0 ffffffff81b8f25e ffffffff838b35a0
      Call Trace:
       [<ffffffff81ad7561>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ad7561>] dump_stack+0xc1/0x120 lib/dump_stack.c:51
       [<ffffffff81b8f25e>] ubsan_epilogue+0x12/0x8d lib/ubsan.c:166
       [<ffffffff81b900cb>] __ubsan_handle_divrem_overflow+0x282/0x2c8 lib/ubsan.c:262
       [<ffffffff823560dd>] hpet_time_div drivers/char/hpet.c:572 [inline]
       [<ffffffff823560dd>] hpet_ioctl_common drivers/char/hpet.c:663 [inline]
       [<ffffffff823560dd>] hpet_ioctl_common.cold+0xa8/0xad drivers/char/hpet.c:577
       [<ffffffff81e63d56>] hpet_ioctl+0xc6/0x180 drivers/char/hpet.c:676
       [<ffffffff81711590>] vfs_ioctl fs/ioctl.c:43 [inline]
       [<ffffffff81711590>] file_ioctl fs/ioctl.c:470 [inline]
       [<ffffffff81711590>] do_vfs_ioctl+0x6e0/0xf70 fs/ioctl.c:605
       [<ffffffff81711eb4>] SYSC_ioctl fs/ioctl.c:622 [inline]
       [<ffffffff81711eb4>] SyS_ioctl+0x94/0xc0 fs/ioctl.c:613
       [<ffffffff82846003>] tracesys_phase2+0x90/0x95
      
      The main C reproducer autogenerated by syzkaller,
      
        syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
        memcpy((void*)0x20000100, "/dev/hpet\000", 10);
        syscall(__NR_openat, 0xffffffffffffff9c, 0x20000100, 0, 0);
        syscall(__NR_ioctl, r[0], 0x40086806, 0x40000000000000);
      
      Fix it by using div64_ul().
      Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NZhang HongJun <zhanghongjun2@huawei.com>
      Cc: stable <stable@vger.kernel.org>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Link: https://lore.kernel.org/r/20190711132757.130092-1-wangkefeng.wang@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9845fb5a
    • A
      mei: me: add mule creek canyon (EHL) device ids · e4c91583
      Alexander Usyskin 提交于
      commit 1be8624a0cbef720e8da39a15971e01abffc865b upstream.
      
      Add Mule Creek Canyon (PCH) MEI device ids for Elkhart Lake (EHL) Platform.
      Signed-off-by: NAlexander Usyskin <alexander.usyskin@intel.com>
      Signed-off-by: NTomas Winkler <tomas.winkler@intel.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20190712095814.20746-1-tomas.winkler@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4c91583
    • Y
      fpga-manager: altera-ps-spi: Fix build error · 3d0a6926
      YueHaibing 提交于
      commit 3d139703d397f6281368047ba7ad1c8bf95aa8ab upstream.
      
      If BITREVERSE is m and FPGA_MGR_ALTERA_PS_SPI is y,
      build fails:
      
      drivers/fpga/altera-ps-spi.o: In function `altera_ps_write':
      altera-ps-spi.c:(.text+0x4ec): undefined reference to `byte_rev_table'
      
      Select BITREVERSE to fix this.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: fcfe18f8 ("fpga-manager: altera-ps-spi: use bitrev8x4")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NMoritz Fischer <mdf@kernel.org>
      Link: https://lore.kernel.org/r/20190708071356.50928-1-yuehaibing@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d0a6926
    • H
      binder: prevent transactions to context manager from its own process. · e907b131
      Hridya Valsaraju 提交于
      commit 49ed96943a8e0c62cc5a9b0a6cfc88be87d1fcec upstream.
      
      Currently, a transaction to context manager from its own process
      is prevented by checking if its binder_proc struct is the same as
      that of the sender. However, this would not catch cases where the
      process opens the binder device again and uses the new fd to send
      a transaction to the context manager.
      
      Reported-by: syzbot+8b3c354d33c4ac78bfad@syzkaller.appspotmail.com
      Signed-off-by: NHridya Valsaraju <hridya@google.com>
      Acked-by: NTodd Kjos <tkjos@google.com>
      Cc: stable <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20190715191804.112933-1-hridya@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e907b131
    • Z
      x86/speculation/mds: Apply more accurate check on hypervisor platform · 7d20e3ba
      Zhenzhong Duan 提交于
      commit 517c3ba00916383af6411aec99442c307c23f684 upstream.
      
      X86_HYPER_NATIVE isn't accurate for checking if running on native platform,
      e.g. CONFIG_HYPERVISOR_GUEST isn't set or "nopv" is enabled.
      
      Checking the CPU feature bit X86_FEATURE_HYPERVISOR to determine if it's
      running on native platform is more accurate.
      
      This still doesn't cover the platforms on which X86_FEATURE_HYPERVISOR is
      unsupported, e.g. VMware, but there is nothing which can be done about this
      scenario.
      
      Fixes: 8a4b06d391b0 ("x86/speculation/mds: Add sysfs reporting for MDS")
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/1564022349-17338-1-git-send-email-zhenzhong.duan@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7d20e3ba
    • H
      x86/sysfb_efi: Add quirks for some devices with swapped width and height · 5e87e8b4
      Hans de Goede 提交于
      commit d02f1aa39189e0619c3525d5cd03254e61bf606a upstream.
      
      Some Lenovo 2-in-1s with a detachable keyboard have a portrait screen but
      advertise a landscape resolution and pitch, resulting in a messed up
      display if the kernel tries to show anything on the efifb (because of the
      wrong pitch).
      
      Fix this by adding a new DMI match table for devices which need to have
      their width and height swapped.
      
      At first it was tried to use the existing table for overriding some of the
      efifb parameters, but some of the affected devices have variants with
      different LCD resolutions which will not work with hardcoded override
      values.
      
      Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1730783Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190721152418.11644-1-hdegoede@redhat.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5e87e8b4
    • Q
      btrfs: inode: Don't compress if NODATASUM or NODATACOW set · e3dc9ea5
      Qu Wenruo 提交于
      commit 42c16da6d684391db83788eb680accd84f6c2083 upstream.
      
      As btrfs(5) specified:
      
      	Note
      	If nodatacow or nodatasum are enabled, compression is disabled.
      
      If NODATASUM or NODATACOW set, we should not compress the extent.
      
      Normally NODATACOW is detected properly in run_delalloc_range() so
      compression won't happen for NODATACOW.
      
      However for NODATASUM we don't have any check, and it can cause
      compressed extent without csum pretty easily, just by:
        mkfs.btrfs -f $dev
        mount $dev $mnt -o nodatasum
        touch $mnt/foobar
        mount -o remount,datasum,compress $mnt
        xfs_io -f -c "pwrite 0 128K" $mnt/foobar
      
      And in fact, we have a bug report about corrupted compressed extent
      without proper data checksum so even RAID1 can't recover the corruption.
      (https://bugzilla.kernel.org/show_bug.cgi?id=199707)
      
      Running compression without proper checksum could cause more damage when
      corruption happens, as compressed data could make the whole extent
      unreadable, so there is no need to allow compression for
      NODATACSUM.
      
      The fix will refactor the inode compression check into two parts:
      
      - inode_can_compress()
        As the hard requirement, checked at btrfs_run_delalloc_range(), so no
        compression will happen for NODATASUM inode at all.
      
      - inode_need_compress()
        As the soft requirement, checked at btrfs_run_delalloc_range() and
        compress_file_range().
      Reported-by: NJames Harvey <jamespharvey20@gmail.com>
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e3dc9ea5