1. 28 7月, 2018 7 次提交
    • L
      Merge branch 'akpm' (patches from Andrew) · 864af0d4
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "11 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        kvm, mm: account shadow page tables to kmemcg
        zswap: re-check zswap_is_full() after do zswap_shrink()
        include/linux/eventfd.h: include linux/errno.h
        mm: fix vma_is_anonymous() false-positives
        mm: use vma_init() to initialize VMAs on stack and data segments
        mm: introduce vma_init()
        mm: fix exports that inadvertently make put_page() EXPORT_SYMBOL_GPL
        ipc/sem.c: prevent queue.status tearing in semop
        mm: disallow mappings that conflict for devm_memremap_pages()
        kasan: only select SLUB_DEBUG with SYSFS=y
        delayacct: fix crash in delayacct_blkio_end() after delayacct init failure
      864af0d4
    • L
      Merge tag 'pci-v4.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 1a3d8691
      Linus Torvalds 提交于
      Pull PCI fix from Bjorn Helgaas:
       "Fix a use-after-free error in fatal error recovery (Thomas Tai)"
      
      * tag 'pci-v4.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI/AER: Work around use-after-free in pcie_do_fatal_recovery()
      1a3d8691
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 6284c99c
      Linus Torvalds 提交于
      Pull arm64 fixes from Will Deacon:
       "Inevitably, after saying that I hoped we would be done on the fixes
        front, a couple of issues have cropped up over the last week. Next
        time I'll stay schtum.
      
        We've fixed an over-eager BUILD_BUG_ON() which Arnd ran into with
        arndconfig, as well as ensuring that KPTI really is disabled on
        Thunder-X1, where the cure is worse than the disease (this regressed
        when we reworked the heterogeneous CPU feature checking).
      
        Summary:
      
         - Fix disabling of kpti on Thunder-X machines
      
         - Fix premature BUILD_BUG_ON() found with randconfig"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups
        arm64: Check for errata before evaluating cpu features
      6284c99c
    • L
      Merge tag 'drm-fixes-2018-07-27' of git://anongit.freedesktop.org/drm/drm · 3c9fdefe
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Not much happening this week which is good: two imx display fixes and
        one i915 quirk addition"
      
      * tag 'drm-fixes-2018-07-27' of git://anongit.freedesktop.org/drm/drm:
        drm/i915/glk: Add Quirk for GLK NUC HDMI port issues.
        gpu: ipu-csi: Check for field type alternate
        drm/imx: imx-ldb: check if channel is enabled before printing warning
        drm/imx: imx-ldb: disable LDB on driver bind
      3c9fdefe
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 49b1622b
      Linus Torvalds 提交于
      Pull input fixes from Dmitry Torokhov:
      
       - a couple of new device IDs added to Elan i2c touchpad controller
         driver
      
       - another entry in i8042 reset quirk list
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: i8042 - add Lenovo LaVie Z to the i8042 reset list
        Input: elan_i2c - add another ACPI ID for Lenovo Ideapad 330-15AST
        MAINTAINERS: Add file patterns for serio device tree bindings
        Input: elan_i2c - add ACPI ID for lenovo ideapad 330
      49b1622b
    • L
      Merge tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · 3ebb6fb0
      Linus Torvalds 提交于
      Pull tracing fixes from Steven Rostedt:
       "Various fixes to the tracing infrastructure:
      
         - Fix double free when the reg() call fails in
           event_trigger_callback()
      
         - Fix anomoly of snapshot causing tracing_on flag to change
      
         - Add selftest to test snapshot and tracing_on affecting each other
      
         - Fix setting of tracepoint flag on error that prevents probes from
           being deleted.
      
         - Fix another possible double free that is similar to
           event_trigger_callback()
      
         - Quiet a gcc warning of a false positive unused variable
      
         - Fix crash of partial exposed task->comm to trace events"
      
      * tag 'trace-v4.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        kthread, tracing: Don't expose half-written comm when creating kthreads
        tracing: Quiet gcc warning about maybe unused link variable
        tracing: Fix possible double free in event_enable_trigger_func()
        tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure
        selftests/ftrace: Add snapshot and tracing_on test case
        ring_buffer: tracing: Inherit the tracing setting to next ring buffer
        tracing: Fix double free of event_trigger_data
      3ebb6fb0
    • L
      Merge tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · f636d300
      Linus Torvalds 提交于
      Pull xfs fixes from Darrick Wong:
      
       - Fix some uninitialized variable errors
      
       - Fix an incorrect check in metadata verifiers
      
      * tag 'xfs-4.18-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: properly handle free inodes in extent hint validators
        xfs: Initialize variables in xfs_alloc_get_rec before using them
      f636d300
  2. 27 7月, 2018 19 次提交
  3. 26 7月, 2018 6 次提交
    • S
      kthread, tracing: Don't expose half-written comm when creating kthreads · 3e536e22
      Snild Dolkow 提交于
      There is a window for racing when printing directly to task->comm,
      allowing other threads to see a non-terminated string. The vsnprintf
      function fills the buffer, counts the truncated chars, then finally
      writes the \0 at the end.
      
      	creator                     other
      	vsnprintf:
      	  fill (not terminated)
      	  count the rest            trace_sched_waking(p):
      	  ...                         memcpy(comm, p->comm, TASK_COMM_LEN)
      	  write \0
      
      The consequences depend on how 'other' uses the string. In our case,
      it was copied into the tracing system's saved cmdlines, a buffer of
      adjacent TASK_COMM_LEN-byte buffers (note the 'n' where 0 should be):
      
      	crash-arm64> x/1024s savedcmd->saved_cmdlines | grep 'evenk'
      	0xffffffd5b3818640:     "irq/497-pwr_evenkworker/u16:12"
      
      ...and a strcpy out of there would cause stack corruption:
      
      	[224761.522292] Kernel panic - not syncing: stack-protector:
      	    Kernel stack is corrupted in: ffffff9bf9783c78
      
      	crash-arm64> kbt | grep 'comm\|trace_print_context'
      	#6  0xffffff9bf9783c78 in trace_print_context+0x18c(+396)
      	      comm (char [16]) =  "irq/497-pwr_even"
      
      	crash-arm64> rd 0xffffffd4d0e17d14 8
      	ffffffd4d0e17d14:  2f71726900000000 5f7277702d373934   ....irq/497-pwr_
      	ffffffd4d0e17d24:  726f776b6e657665 3a3631752f72656b   evenkworker/u16:
      	ffffffd4d0e17d34:  f9780248ff003231 cede60e0ffffff9b   12..H.x......`..
      	ffffffd4d0e17d44:  cede60c8ffffffd4 00000fffffffffd4   .....`..........
      
      The workaround in e09e2867 (use strlcpy in __trace_find_cmdline) was
      likely needed because of this same bug.
      
      Solved by vsnprintf:ing to a local buffer, then using set_task_comm().
      This way, there won't be a window where comm is not terminated.
      
      Link: http://lkml.kernel.org/r/20180726071539.188015-1-snild@sony.com
      
      Cc: stable@vger.kernel.org
      Fixes: bc0c38d1 ("ftrace: latency tracer infrastructure")
      Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NSnild Dolkow <snild@sony.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      3e536e22
    • S
      tracing: Quiet gcc warning about maybe unused link variable · 2519c1bb
      Steven Rostedt (VMware) 提交于
      Commit 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on
      enable_trace_kprobe() failure") added an if statement that depends on another
      if statement that gcc doesn't see will initialize the "link" variable and
      gives the warning:
      
       "warning: 'link' may be used uninitialized in this function"
      
      It is really a false positive, but to quiet the warning, and also to make
      sure that it never actually is used uninitialized, initialize the "link"
      variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
      thinks it could be used uninitialized.
      
      Cc: stable@vger.kernel.org
      Fixes: 57ea2a34 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      2519c1bb
    • S
      tracing: Fix possible double free in event_enable_trigger_func() · 15cc7864
      Steven Rostedt (VMware) 提交于
      There was a case that triggered a double free in event_trigger_callback()
      due to the called reg() function freeing the trigger_data and then it
      getting freed again by the error return by the caller. The solution there
      was to up the trigger_data ref count.
      
      Code inspection found that event_enable_trigger_func() has the same issue,
      but is not as easy to trigger (requires harder to trigger failures). It
      needs to be solved slightly different as it needs more to clean up when the
      reg() function fails.
      
      Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: 7862ad18 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
      Reivewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      15cc7864
    • C
      drm/i915/glk: Add Quirk for GLK NUC HDMI port issues. · 0ca94881
      Clint Taylor 提交于
      On GLK NUC platforms the HDMI retiming buffer needs additional disabled
      time to correctly sync to a faster incoming signal.
      
      When measured on a scope the highspeed lines of the HDMI clock turn off
       for ~400uS during a normal resolution change. The HDMI retimer on the
       GLK NUC appears to require at least a full frame of quiet time before a
      new faster clock can be correctly sync'd. Wait 100ms due to msleep
      inaccuracies while waiting for a completed frame. Add a quirk to the
      driver for GLK boards that use ITE66317 HDMI retimers.
      
      V2: Add more devices to the quirk list
      V3: Delay increased to 100ms, check to confirm crtc type is HDMI.
      V4: crtc type check extended to include _DDI and whitespace fixes
      v5: Fix white spaces, remove the macro for delay. Revert the crtc type
          check introduced in v4.
      
      Cc: Imre Deak <imre.deak@intel.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105887Signed-off-by: NClint Taylor <clinton.a.taylor@intel.com>
      Tested-by: NDaniel Scheller <d.scheller.oss@gmail.com>
      Signed-off-by: NRadhakrishna Sripada <radhakrishna.sripada@intel.com>
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Reviewed-by: NImre Deak <imre.deak@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20180710200205.1478-1-radhakrishna.sripada@intel.com
      (cherry picked from commit 90c3e219)
      Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com>
      0ca94881
    • L
      Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 6e77b267
      Linus Torvalds 提交于
      Pull clk fixes from Stephen Boyd:
       "One more round of updates for problems seen this -rc series. Drivers
        fixes are:
      
         - Amlogic Meson audio divider fix and CPU clk critical marking
      
         - Qualcomm multimedia GDSC marked as 'always on' to keep display
           working
      
         - Aspeed fixes for critical clks, resets causing clks to stay
           disabled, and an incorrect HPLL frequency calculation
      
         - Marvell Armada 3700 cpu clks would undervolt when switching from
           low frequencies to high frequencies because the voltage didn't
           stabilize in time so now we switch to an intermediate frequency
      
        Plus we have a core framework thinko that messed up the debugfs flag
        printing logic to make it not very useful"
      
      * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
        clk: aspeed: Support HPLL strapping on ast2400
        clk: mvebu: armada-37xx-periph: Fix switching CPU rate from 300Mhz to 1.2GHz
        clk: aspeed: Mark bclk (PCIe) and dclk (VGA) as critical
        clk/mmcc-msm8996: Make mmagic_bimc_gdsc ALWAYS_ON
        clk: aspeed: Treat a gate in reset as disabled
        clk: Really show symbolic clock flags in debugfs
        clk: qcom: gcc-msm8996: Disable halt check on UFS tx clock
        clk: meson: audio-divider is one based
        clk: meson-gxbb: set fclk_div2 as CLK_IS_CRITICAL
      6e77b267
    • L
      Merge tag 'fscache-fixes-20180725' of... · 5c61ef1b
      Linus Torvalds 提交于
      Merge tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
      
      Pull fscache/cachefiles fixes from David Howells:
      
       - Allow cancelled operations to be queued so they can be cleaned up.
      
       - Fix a refcounting bug in the monitoring of reads on backend files
         whereby a race can occur between monitor objects being listed for
         work, the work processing being queued and the work processor running
         and destroying the monitor objects.
      
       - Fix a ref overput in object attachment, whereby a tentatively
         considered object is put in error handling without first being 'got'.
      
       - Fix a missing clear of the CACHEFILES_OBJECT_ACTIVE flag whereby an
         assertion occurs when we retry because it seems the object is now
         active.
      
       - Wait rather BUG'ing on an object collision in the depths of
         cachefiles as the active object should be being cleaned up - also
         depends on the one above.
      
      * tag 'fscache-fixes-20180725' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        cachefiles: Wait rather than BUG'ing on "Unexpected object collision"
        cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag
        fscache: Fix reference overput in fscache_attach_object() error handling
        cachefiles: Fix refcounting bug in backing-file read monitoring
        fscache: Allow cancelled operations to be enqueued
      5c61ef1b
  4. 25 7月, 2018 8 次提交
    • A
      tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure · 57ea2a34
      Artem Savkov 提交于
      If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
      it returns an error, but does not unset the tp flags it set previously.
      This results in a probe being considered enabled and failures like being
      unable to remove the probe through kprobe_events file since probes_open()
      expects every probe to be disabled.
      
      Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
      Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 41a7dd42 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NArtem Savkov <asavkov@redhat.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      57ea2a34
    • M
      selftests/ftrace: Add snapshot and tracing_on test case · 82f4f3e6
      Masami Hiramatsu 提交于
      Add a testcase for checking snapshot and tracing_on
      relationship. This ensures that the snapshotting doesn't
      affect current tracing on/off settings.
      
      Link: http://lkml.kernel.org/r/153149932412.11274.15289227592627901488.stgit@devbox
      
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: linux-kselftest@vger.kernel.org
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      82f4f3e6
    • M
      ring_buffer: tracing: Inherit the tracing setting to next ring buffer · 73c8d894
      Masami Hiramatsu 提交于
      Maintain the tracing on/off setting of the ring_buffer when switching
      to the trace buffer snapshot.
      
      Taking a snapshot is done by swapping the backup ring buffer
      (max_tr_buffer). But since the tracing on/off setting is defined
      by the ring buffer, when swapping it, the tracing on/off setting
      can also be changed. This causes a strange result like below:
      
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 0 > tracing_on
        /sys/kernel/debug/tracing # cat tracing_on
        0
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        1
        /sys/kernel/debug/tracing # echo 1 > snapshot
        /sys/kernel/debug/tracing # cat tracing_on
        0
      
      We don't touch tracing_on, but snapshot changes tracing_on
      setting each time. This is an anomaly, because user doesn't know
      that each "ring_buffer" stores its own tracing-enable state and
      the snapshot is done by swapping ring buffers.
      
      Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
      Cc: stable@vger.kernel.org
      Fixes: debdd57f ("tracing: Make a snapshot feature available from userspace")
      Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org>
      [ Updated commit log and comment in the code ]
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      73c8d894
    • S
      tracing: Fix double free of event_trigger_data · 1863c387
      Steven Rostedt (VMware) 提交于
      Running the following:
      
       # cd /sys/kernel/debug/tracing
       # echo 500000 > buffer_size_kb
      [ Or some other number that takes up most of memory ]
       # echo snapshot > events/sched/sched_switch/trigger
      
      Triggers the following bug:
      
       ------------[ cut here ]------------
       kernel BUG at mm/slub.c:296!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
       Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
       RIP: 0010:kfree+0x16c/0x180
       Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
       RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
       RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
       RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
       RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
       R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
       R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
       FS:  00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
       Call Trace:
        event_trigger_callback+0xee/0x1d0
        event_trigger_write+0xfc/0x1a0
        __vfs_write+0x33/0x190
        ? handle_mm_fault+0x115/0x230
        ? _cond_resched+0x16/0x40
        vfs_write+0xb0/0x190
        ksys_write+0x52/0xc0
        do_syscall_64+0x5a/0x160
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
       RIP: 0033:0x7f363e16ab50
       Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
       RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
       RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
       RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
       R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
       R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
       Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
      86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
       ---[ end trace d301afa879ddfa25 ]---
      
      The cause is because the register_snapshot_trigger() call failed to
      allocate the snapshot buffer, and then called unregister_trigger()
      which freed the data that was passed to it. Then on return to the
      function that called register_snapshot_trigger(), as it sees it
      failed to register, it frees the trigger_data again and causes
      a double free.
      
      By calling event_trigger_init() on the trigger_data (which only ups
      the reference counter for it), and then event_trigger_free() afterward,
      the trigger_data would not get freed by the registering trigger function
      as it would only up and lower the ref count for it. If the register
      trigger function fails, then the event_trigger_free() called after it
      will free the trigger data normally.
      
      Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
      
      Cc: stable@vger.kerne.org
      Fixes: 93e31ffb ("tracing: Add 'snapshot' event trigger command")
      Reported-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      1863c387
    • K
      cachefiles: Wait rather than BUG'ing on "Unexpected object collision" · c2412ac4
      Kiran Kumar Modukuri 提交于
      If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
      the active object tree, we have been emitting a BUG after logging
      information about it and the new object.
      
      Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
      on the old object (or return an error).  The ACTIVE flag should be cleared
      after it has been removed from the active object tree.  A timeout of 60s is
      used in the wait, so we shouldn't be able to get stuck there.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: NKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      c2412ac4
    • K
      cachefiles: Fix missing clear of the CACHEFILES_OBJECT_ACTIVE flag · 5ce83d4b
      Kiran Kumar Modukuri 提交于
      In cachefiles_mark_object_active(), the new object is marked active and
      then we try to add it to the active object tree.  If a conflicting object
      is already present, we want to wait for that to go away.  After the wait,
      we go round again and try to re-mark the object as being active - but it's
      already marked active from the first time we went through and a BUG is
      issued.
      
      Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
      
      Analysis from Kiran Kumar Modukuri:
      
      [Impact]
      Oops during heavy NFS + FSCache + Cachefiles
      
      CacheFiles: Error: Overlong wait for old active object to go away.
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
      
      CacheFiles: Error: Object already active kernel BUG at
      fs/cachefiles/namei.c:163!
      
      [Cause]
      In a heavily loaded system with big files being read and truncated, an
      fscache object for a cookie is being dropped and a new object being
      looked. The new object being looked for has to wait for the old object
      to go away before the new object is moved to active state.
      
      [Fix]
      Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
      retrying the object lookup.
      
      [Testcase]
      Have run ~100 hours of NFS stress tests and have not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Signed-off-by: NKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5ce83d4b
    • K
      fscache: Fix reference overput in fscache_attach_object() error handling · f29507ce
      Kiran Kumar Modukuri 提交于
      When a cookie is allocated that causes fscache_object structs to be
      allocated, those objects are initialised with the cookie pointer, but
      aren't blessed with a ref on that cookie unless the attachment is
      successfully completed in fscache_attach_object().
      
      If attachment fails because the parent object was dying or there was a
      collision, fscache_attach_object() returns without incrementing the cookie
      counter - but upon failure of this function, the object is released which
      then puts the cookie, whether or not a ref was taken on the cookie.
      
      Fix this by taking a ref on the cookie when it is assigned in
      fscache_object_init(), even when we're creating a root object.
      
      
      Analysis from Kiran Kumar:
      
      This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel
      
      BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277
      
      fscache cookie ref count updated incorrectly during fscache object
      allocation resulting in following Oops.
      
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
      kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
      
      [Cause]
      Two threads are trying to do operate on a cookie and two objects.
      
      (1) One thread tries to unmount the filesystem and in process goes over a
          huge list of objects marking them dead and deleting the objects.
          cookie->usage is also decremented in following path:
      
            nfs_fscache_release_super_cookie
             -> __fscache_relinquish_cookie
              ->__fscache_cookie_put
              ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      (2) A second thread tries to lookup an object for reading data in following
          path:
      
          fscache_alloc_object
          1) cachefiles_alloc_object
              -> fscache_object_init
                 -> assign cookie, but usage not bumped.
          2) fscache_attach_object -> fails in cant_attach_object because the
               cookie's backing object or cookie's->parent object are going away
          3) fscache_put_object
              -> cachefiles_put_object
                ->fscache_object_destroy
                  ->fscache_cookie_put
                     ->BUG_ON(atomic_read(&cookie->usage) <= 0);
      
      [NOTE from dhowells] It's unclear as to the circumstances in which (2) can
      take place, given that thread (1) is in nfs_kill_super(), however a
      conflicting NFS mount with slightly different parameters that creates a
      different superblock would do it.  A backtrace from Kiran seems to show
      that this is a possibility:
      
          kernel BUG at/build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!
          ...
          RIP: __fscache_cookie_put+0x3a/0x40 [fscache]
          Call Trace:
           __fscache_relinquish_cookie+0x87/0x120 [fscache]
           nfs_fscache_release_super_cookie+0x2d/0xb0 [nfs]
           nfs_kill_super+0x29/0x40 [nfs]
           deactivate_locked_super+0x48/0x80
           deactivate_super+0x5c/0x60
           cleanup_mnt+0x3f/0x90
           __cleanup_mnt+0x12/0x20
           task_work_run+0x86/0xb0
           exit_to_usermode_loop+0xc2/0xd0
           syscall_return_slowpath+0x4e/0x60
           int_ret_from_sys_call+0x25/0x9f
      
      [Fix] Bump up the cookie usage in fscache_object_init, when it is first
      being assigned a cookie atomically such that the cookie is added and bumped
      up if its refcount is not zero.  Remove the assignment in
      fscache_attach_object().
      
      [Testcase]
      I have run ~100 hours of NFS stress tests and not seen this bug recur.
      
      [Regression Potential]
       - Limited to fscache/cachefiles.
      
      Fixes: ccc4fc3d ("FS-Cache: Implement the cookie management part of the netfs API")
      Signed-off-by: NKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f29507ce
    • K
      cachefiles: Fix refcounting bug in backing-file read monitoring · 934140ab
      Kiran Kumar Modukuri 提交于
      cachefiles_read_waiter() has the right to access a 'monitor' object by
      virtue of being called under the waitqueue lock for one of the pages in its
      purview.  However, it has no ref on that monitor object or on the
      associated operation.
      
      What it is allowed to do is to move the monitor object to the operation's
      to_do list, but once it drops the work_lock, it's actually no longer
      permitted to access that object.  However, it is trying to enqueue the
      retrieval operation for processing - but it can only do this via a pointer
      in the monitor object, something it shouldn't be doing.
      
      If it doesn't enqueue the operation, the operation may not get processed.
      If the order is flipped so that the enqueue is first, then it's possible
      for the work processor to look at the to_do list before the monitor is
      enqueued upon it.
      
      Fix this by getting a ref on the operation so that we can trust that it
      will still be there once we've added the monitor to the to_do list and
      dropped the work_lock.  The op can then be enqueued after the lock is
      dropped.
      
      The bug can manifest in one of a couple of ways.  The first manifestation
      looks like:
      
       FS-Cache:
       FS-Cache: Assertion failed
       FS-Cache: 6 == 5 is false
       ------------[ cut here ]------------
       kernel BUG at fs/fscache/operation.c:494!
       RIP: 0010:fscache_put_operation+0x1e3/0x1f0
       ...
       fscache_op_work_func+0x26/0x50
       process_one_work+0x131/0x290
       worker_thread+0x45/0x360
       kthread+0xf8/0x130
       ? create_worker+0x190/0x190
       ? kthread_cancel_work_sync+0x10/0x10
       ret_from_fork+0x1f/0x30
      
      This is due to the operation being in the DEAD state (6) rather than
      INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
      fscache_put_operation().
      
      The bug can also manifest like the following:
      
       kernel BUG at fs/fscache/operation.c:69!
       ...
          [exception RIP: fscache_enqueue_operation+246]
       ...
       #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
       #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
       #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028
      
      I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
      entirely clear which assertion failed.
      
      Fixes: 9ae326a6 ("CacheFiles: A cache that backs onto a mounted filesystem")
      Reported-by: NLei Xue <carmark.dlut@gmail.com>
      Reported-by: NVegard Nossum <vegard.nossum@gmail.com>
      Reported-by: NAnthony DeRobertis <aderobertis@metrics.net>
      Reported-by: NNeilBrown <neilb@suse.com>
      Reported-by: NDaniel Axtens <dja@axtens.net>
      Reported-by: NKiran Kumar Modukuri <kiran.modukuri@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NDaniel Axtens <dja@axtens.net>
      934140ab