1. 17 Mar, 2019 1 commit
  2. 16 Mar, 2019 23 commits
    • A
      fix sysfs_init_fs_context() in !CONFIG_NET_NS case · ab81dabd
      Al Viro committed
      Permission checks on current's netns should be done only when
      netns are enabled.
      Reported-by: Dominik Brodowski <linux@dominikbrodowski.net>
      Fixes: 23bf1b6b
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • L
      Merge tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 9c7dc824
      Linus Torvalds committed
      Pull more smb3 updates from Steve French:
       "Various tracing and debugging improvements, crediting fixes, some
        cleanup, and important fallocate fix (fixes three xfstests) and lock
        fix.
      
        Summary:
      
         - Various additional dynamic tracing tracepoints
      
         - Debugging improvements (including ability to query the server via
           SMB3 fsctl from userspace tools which can help with stats and
           debugging)
      
         - One minor performance improvement (root directory inode caching)
      
         - Crediting (SMB3 flow control) fixes
      
         - Some cleanup (docs and to mknod)
      
         - Important fixes: one to smb3 implementation of fallocate zero range
           (which fixes three xfstests) and a POSIX lock fix"
      
      * tag '5.1-rc-smb3' of git://git.samba.org/sfrench/cifs-2.6: (22 commits)
        CIFS: fix POSIX lock leak and invalid ptr deref
        SMB3: Allow SMB3 FSCTL queries to be sent to server from tools
        cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_falloc
        smb2: fix typo in definition of a few error flags
        CIFS: make mknod() an smb_version_op
        cifs: minor documentation updates
        cifs: remove unused value pointed out by Coverity
        SMB3: passthru query info doesn't check for SMB3 FSCTL passthru
        smb3: add dynamic tracepoints for simple fallocate and zero range
        cifs: fix smb3_zero_range so it can expand the file-size when required
        cifs: add SMB2_ioctl_init/free helpers to be used with compounding
        smb3: Add dynamic trace points for various compounded smb3 ops
        cifs: cache FILE_ALL_INFO for the shared root handle
        smb3: display volume serial number for shares in /proc/fs/cifs/DebugData
        cifs: simplify how we handle credits in compound_send_recv()
        smb3: add dynamic tracepoint for timeout waiting for credits
        smb3: display security information in /proc/fs/cifs/DebugData more accurately
        cifs: add a timeout argument to wait_for_free_credits
        cifs: prevent starvation in wait_for_free_credits for multi-credit requests
        cifs: wait_for_free_credits() make it possible to wait for >=1 credits
        ...
    • L
      Merge branch 'for-linus-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml · 6c83d0d5
      Linus Torvalds committed
      Pull UML updates from Richard Weinberger:
       "Bugfix for the UML block device driver"
      
      * 'for-linus-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
        um: Fix for a possible OOPS in ubd initialization
        um: Remove duplicated include from vector_user.c
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 636deed6
      Linus Torvalds committed
      Pull KVM updates from Paolo Bonzini:
       "ARM:
         - some cleanups
         - direct physical timer assignment
         - cache sanitization for 32-bit guests
      
        s390:
         - interrupt cleanup
         - introduction of the Guest Information Block
         - preparation for processor subfunctions in cpu models
      
        PPC:
         - bug fixes and improvements, especially related to machine checks
           and protection keys
      
        x86:
         - many, many cleanups, including removing a bunch of MMU code for
           unnecessary optimizations
         - AVIC fixes
      
        Generic:
         - memcg accounting"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (147 commits)
        kvm: vmx: fix formatting of a comment
        KVM: doc: Document the life cycle of a VM and its resources
        MAINTAINERS: Add KVM selftests to existing KVM entry
        Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()"
        KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()
        KVM: PPC: Fix compilation when KVM is not enabled
        KVM: Minor cleanups for kvm_main.c
        KVM: s390: add debug logging for cpu model subfunctions
        KVM: s390: implement subfunction processor calls
        arm64: KVM: Fix architecturally invalid reset value for FPEXC32_EL2
        KVM: arm/arm64: Remove unused timer variable
        KVM: PPC: Book3S: Improve KVM reference counting
        KVM: PPC: Book3S HV: Fix build failure without IOMMU support
        Revert "KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"
        x86: kvmguest: use TSC clocksource if invariant TSC is exposed
        KVM: Never start grow vCPU halt_poll_ns from value below halt_poll_ns_grow_start
        KVM: Expose the initial start value in grow_halt_poll_ns() as a module parameter
        KVM: grow_halt_poll_ns() should never shrink vCPU halt_poll_ns
        KVM: x86/mmu: Consolidate kvm_mmu_zap_all() and kvm_mmu_zap_mmio_sptes()
        KVM: x86/mmu: WARN if zapping a MMIO spte results in zapping children
        ...
    • L
      Merge tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · aa2e3ac6
      Linus Torvalds committed
      Pull tracing fixes and cleanups from Steven Rostedt:
       "This contains a series of last minute clean ups, small fixes and error
        checks"
      
      * tag 'trace-v5.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        tracing/probe: Verify alloc_trace_*probe() result
        tracing/probe: Check event/group naming rule at parsing
        tracing/probe: Check the size of argument name and body
        tracing/probe: Check event name length correctly
        tracing/probe: Check maxactive error cases
        tracing: kdb: Fix ftdump to not sleep
        trace/probes: Remove kernel doc style from non kernel doc comment
        tracing/probes: Make reserved_field_names static
    • L
      Merge tag 'iommu-fix-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 323ea40f
      Linus Torvalds committed
      Pull IOMMU fix from Joerg Roedel:
       "Fix a NULL-pointer dereference issue in the ACPI device matching code
        of the AMD IOMMU driver"
      
      * tag 'iommu-fix-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
        iommu/amd: Fix NULL dereference bug in match_hid_uid
    • L
      Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 0be28863
      Linus Torvalds committed
      Pull ARM updates from Russell King:
      
       - An improvement from Ard Biesheuvel, who noted that the identity map
         setup was taking a long time due to flush_cache_louis().
      
       - Update a comment about dma_ops from Wolfram Sang.
      
       - Remove use of "-p" with ld, where this flag has been a no-op since
         2004.
      
       - Remove the printing of the virtual memory layout, which is no longer
         useful since we hide pointers.
      
       - Correct SCU help text.
      
       - Remove legacy TWD registration method.
      
       - Add pgprot_device() implementation for mapping PCI sysfs resource
         files.
      
       - Initialise PFN limits earlier for kmemleak.
      
       - Fix argument count to match macro definition (affects clang builds)
      
       - Use unified assembler language almost everywhere for clang, and other
         clang improvements (from Stefan Agner, Nathan Chancellor).
      
       - Support security extension for noMMU and other noMMU cleanups (from
         Vladimir Murzin).
      
       - Remove unnecessary SMP bringup code (which was incorrectly copy'n'
         pasted from the ARM platform implementations) and remove it from the
         arch code to discourage further copies of it appearing.
      
       - Add Cortex A9 erratum preventing kexec working on some SoCs.
      
       - AMBA bus identification updates from Mike Leach.
      
       - More use of raw spinlocks to avoid -RT kernel issues (from Yang Shi
         and Sebastian Andrzej Siewior).
      
       - MCPM hyp/svc mode mismatch fixes from Marek Szyprowski.
      
      * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: (32 commits)
        ARM: 8849/1: NOMMU: Fix encodings for PMSAv8's PRBAR4/PRLAR4
        ARM: 8848/1: virt: Align GIC version check with arm64 counterpart
        ARM: 8847/1: pm: fix HYP/SVC mode mismatch when MCPM is used
        ARM: 8845/1: use unified assembler in c files
        ARM: 8844/1: use unified assembler in assembly files
        ARM: 8843/1: use unified assembler in headers
        ARM: 8841/1: use unified assembler in macros
        ARM: 8840/1: use a raw_spinlock_t in unwind
        ARM: 8839/1: kprobe: make patch_lock a raw_spinlock_t
        ARM: 8837/1: coresight: etmv4: Update ID register table to add UCI support
        ARM: 8836/1: drivers: amba: Update component matching to use the CoreSight UCI values.
        ARM: 8838/1: drivers: amba: Updates to component identification for driver matching.
        ARM: 8833/1: Ensure that NEON code always compiles with Clang
        ARM: avoid Cortex-A9 livelock on tight dmb loops
        ARM: smp: remove arch-provided "pen_release"
        ARM: actions: remove boot_lock and pen_release
        ARM: oxnas: remove CPU hotplug implementation
        ARM: qcom: remove unnecessary boot_lock
        ARM: 8832/1: NOMMU: Limit visibility for CONFIG_FLASH_{MEM_BASE,SIZE}
        ARM: 8831/1: NOMMU: pmsa-v8: remove unneeded semicolon
        ...
    • L
      Merge tag 'ntb-5.1' of git://github.com/jonmason/ntb · e8a71a38
      Linus Torvalds committed
      Pull NTB updates from Jon Mason:
      
       - fixes for switchtec debugability and mapping table entries
      
       - NTB transport improvements
      
       - a reworking of the peer_db_addr for better abstraction
      
      * tag 'ntb-5.1' of git://github.com/jonmason/ntb:
        NTB: add new parameter to peer_db_addr() db_bit and db_data
        NTB: ntb_transport: Ensure the destination buffer is mapped for TX DMA
        NTB: ntb_transport: Free MWs in ntb_transport_link_cleanup()
        ntb_hw_switchtec: Added support of >=4G memory windows
        ntb_hw_switchtec: NT req id mapping table register entry number should be 512
        ntb_hw_switchtec: debug print 64bit aligned crosslink BAR Numbers
    • L
      Merge tag 'fbdev-v5.1' of git://github.com/bzolnier/linux · 2b9c272c
      Linus Torvalds committed
      Pull fbdev updates from Bartlomiej Zolnierkiewicz:
       "Just a couple of small fixes and cleanups:
      
         - fix memory access if logo is bigger than the screen (Manfred
           Schlaegl)
      
         - silence fbcon logo on 'quiet' boots (Prarit Bhargava)
      
         - use kvmalloc() for scrollback buffer in fbcon (Konstantin Khorenko)
      
         - misc fixes (Colin Ian King, YueHaibing, Matteo Croce, Mathieu
           Malaterre, Anders Roxell, Arnd Bergmann)
      
         - misc cleanups (Rob Herring, Lubomir Rintel, Greg Kroah-Hartman,
           Jani Nikula, Michal Vokáč)"
      
      * tag 'fbdev-v5.1' of git://github.com/bzolnier/linux:
        fbdev: mbx: fix a misspelled variable name
        fbdev: omap2: fix warnings in dss core
        video: fbdev: Fix potential NULL pointer dereference
        fbcon: Silence fbcon logo on 'quiet' boots
        printk: Export console_printk
        ARM: dts: imx28-cfa10036: Fix the reset gpio signal polarity
        video: ssd1307fb: Do not hard code active-low reset sequence
        dt-bindings: display: ssd1307fb: Remove reset-active-low from examples
        fbdev: fbmem: fix memory access if logo is bigger than the screen
        video/fbdev: refactor video= cmdline parsing
        fbdev: mbx: fix up debugfs file creation
        fbdev: omap2: no need to check return value of debugfs_create functions
        video: fbdev: geode: remove ifdef OLPC noise
        video: offb: annotate implicit fall throughs
        omapfb: fix typo
        fbdev: Use of_node_name_eq for node name comparisons
        fbcon: use kvmalloc() for scrollback buffer
        fbdev: chipsfb: remove set but not used variable 'size'
        fbdev/via: fix spelling mistake "Expandsion" -> "Expansion"
    • L
      Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 51b1ac0f
      Linus Torvalds committed
      Pull i2c fixes from Wolfram Sang:
       "A set of driver bugfixes and an improvement for a core helper"
      
      * 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: i2c-designware-platdrv: Always use a dynamic adapter number
        i2c: i2c-designware-platdrv: Cleanup setting of the adapter number
        i2c: add extra check to safe DMA buffer helper
        i2c: i2c-stm32f7: Fix SDADEL minimum formula
        i2c: rcar: explain the lockless design
        i2c: rcar: fix concurrency issue related to ICDMAER
        i2c: sis630: correct format strings
        i2c: mediatek: modify threshold passed to i2c_get_dma_safe_msg_buf()
    • L
      Merge tag 'sound-fix-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 2dbb0e6c
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "Some cleaning after the first batch; mostly about HD-audio quirks but
        also some NULL dereference fixes in corner cases and a random build
        error fix, too"
      
      * tag 'sound-fix-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda/realtek - Add support headset mode for New DELL WYSE NB
        ALSA: hda/realtek - Add support headset mode for DELL WYSE AIO
        ALSA: hda/realtek: merge alc_fixup_headset_jack to alc295_fixup_chromebook
        ALSA: pcm: Fix function name in kernel-doc comment
        ALSA: hda: hdmi - add Icelake support
        ALSA: hda - add more quirks for HP Z2 G4 and HP Z240
        ALSA: hda/realtek - Fixed Headset Mic JD not stable
        ALSA: hda/realtek: Enable headset MIC of Acer TravelMate X514-51T with ALC255
        ALSA: hda/tegra: avoid build error without CONFIG_PM
        ALSA: usx2y: Fix potential NULL pointer dereference
        ALSA: hda: Avoid NULL pointer dereference at snd_hdac_stream_start()
    • L
      Merge tag 'drm-next-2019-03-15' of git://anongit.freedesktop.org/drm/drm · 8264fd04
      Linus Torvalds committed
      Pull drm fixes and updates from Dave Airlie:
       "A few various fixes pulls and one late etnaviv pull but it was nearly
        all fixes anyways.
      
        etnaviv:
         - late next pull
         - mmu mapping fix
         - build non-ARM arches
         - misc fixes
      
        i915:
         - HDCP state handling fix
         - shrinker interaction fix
         - atomic state leak fix
      
        qxl:
         - kick out framebuffers early fix
      
        amdgpu:
         - Powerplay fixes
         - DC fixes
         - BACO turned off for now on vega20
         - Locking fix
         - KFD MQD fix
         - gfx9 golden register updates"
      
      * tag 'drm-next-2019-03-15' of git://anongit.freedesktop.org/drm/drm: (43 commits)
        drm/amdgpu: Update gc golden setting for vega family
        drm/amd/powerplay: correct power reading on fiji
        drm/amd/powerplay: set max fan target temperature as 105C
        drm/i915: Relax mmap VMA check
        drm/i915: Fix atomic state leak when resetting HDMI link
        drm/i915: Acquire breadcrumb ref before cancelling
        drm/i915/selftests: Always free spinner on __sseu_prepare error
        drm/i915: Reacquire priolist cache after dropping the engine lock
        drm/i915: Protect i915_active iterators from the shrinker
        drm/i915: HDCP state handling in ddi_update_pipe
        drm/qxl: remove conflicting framebuffers earlier
        drm/fb-helper: call vga_remove_vgacon automatically.
        drm: move i915_kick_out_vgacon to vgaarb
        drm/amd/display: don't call dm_pp_ function from an fpu block
        drm: add __user attribute to ptr_to_compat()
        drm/amdgpu: clear PDs/PTs only after initializing them
        drm/amd/display: Pass app_tf by value rather than by reference
        Revert "drm/amdgpu: use BACO reset on vega20 if platform support"
        drm/amd/powerplay: show the right override pcie parameters
        drm/amd/powerplay: honor the OD settings
        ...
    • L
      Merge tag 'xfs-5.1-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · de578188
      Linus Torvalds committed
      Pull xfs cleanups from Darrick Wong:
       "Here's a few more cleanups that trickled in for the merge window.
      
        It's all fixes for static checker complaints and slowly unwinding
        typedef usage. The four patches here have gone through a few days
        worth of fstest runs with no new problems observed.
      
        Summary:
      
         - Fix some clang/smatch/sparse warnings about uninitialized
           variables.
      
         - Clean up some typedef usage"
      
      * tag 'xfs-5.1-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: clean up xfs_dir2_leaf_addname
        xfs: zero initialize highstale and lowstale in xfs_dir2_leaf_addname
        xfs: clean up xfs_dir2_leafn_add
        xfs: Zero initialize highstale and lowstale in xfs_dir2_leafn_add
    • L
      Merge tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 5160bcce
      Linus Torvalds committed
      Pull f2fs updates from Jaegeuk Kim:
       "We've continued mainly to fix bugs in this round, as f2fs has been
        shipped in more devices. Especially, we've focused on stabilizing
        checkpoint=disable feature, and provided some interfaces for QA.
      
        Enhancements:
         - expose FS_NOCOW_FL for pin_file
         - run discard jobs at unmount time with timeout
         - tune discarding thread to avoid idling which consumes power
         - some checking codes to address vulnerabilities
         - give random value to i_generation
         - shutdown with more flags for QA
      
        Bug fixes:
         - clean up stale objects when mount is failed along with
           checkpoint=disable
         - fix system being stuck due to wrong count by atomic writes
         - handle some corrupted disk cases
         - fix a deadlock in f2fs_read_inline_dir
      
        We've also added some minor build error fixes and clean-up patches"
      
      * tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (53 commits)
        f2fs: set pin_file under CAP_SYS_ADMIN
        f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
        f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
        f2fs: fix to do sanity check with inode.i_inline_xattr_size
        f2fs: give some messages for inline_xattr_size
        f2fs: don't trigger read IO for beyond EOF page
        f2fs: fix to add refcount once page is tagged PG_private
        f2fs: remove wrong comment in f2fs_invalidate_page()
        f2fs: fix to use kvfree instead of kzfree
        f2fs: print more parameters in trace_f2fs_map_blocks
        f2fs: trace f2fs_ioc_shutdown
        f2fs: fix to avoid deadlock of atomic file operations
        f2fs: fix to dirty inode for i_mode recovery
        f2fs: give random value to i_generation
        f2fs: no need to take page lock in readdir
        f2fs: fix to update iostat correctly in IPU path
        f2fs: fix encrypted page memory leak
        f2fs: make fault injection covering __submit_flush_wait()
        f2fs: fix to retry fill_super only if recovery failed
        f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
        ...
    • L
      Merge branch 'akpm' (rest of patches from Andrew) · f91f2ee5
      Linus Torvalds committed
      Merge the left-over patches from Andrew Morton.
      
      This merges the remaining two patches from Andrew's pile of "little bit
      more MM".  I mulled it over, and we emailed back and forth with Josef,
      and he pointed out where I was wrong.
      
      Rule #51 of kernel maintenance: when somebody makes it clear that they
      know the code better than you did, stop arguing and just apply the damn
      patch.
      
      Add a third patch by me to add a comment for the case that I had thought
      was buggy and Josef corrected me on.
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior
        filemap: drop the mmap_sem for all blocking operations
        filemap: kill page_cache_read usage in filemap_fault
    • L
      filemap: add a comment about FAULT_FLAG_RETRY_NOWAIT behavior · 8b0f9fa2
      Linus Torvalds committed
      I thought Josef Bacik's patch to drop the mmap_sem was buggy, because
      when looking at the error cases, there was one case where we returned
      VM_FAULT_RETRY without actually dropping the mmap_sem.
      
      Josef had to explain to me (using small words) that yes, that's actually
      what we're supposed to do, and his patch was correct.  Which not only
      convinced me he knew what he was doing and I should stop arguing with
      him, but also that I should add a comment to the case I was confused
      about.
      Patiently-pointed-out-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      kvm: vmx: fix formatting of a comment · 4a605bc0
      Paolo Bonzini committed
      Eliminate a gratuitous conflict with 5.0.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • S
      KVM: doc: Document the life cycle of a VM and its resources · eca6be56
      Sean Christopherson committed
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able to
      hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1] https://patchwork.kernel.org/patch/10806707/

      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
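The lifetime rule described in the "KVM: doc: Document the life cycle of a VM and its resources" message (an object lives until the last file descriptor referring to it is closed, not until its creator closes its own copy) can be observed from userspace with an ordinary pipe. This is only an analogy, not KVM code: the pipe stands in for struct kvm, and `os.dup()` stands in for the descriptor a fork()ed child would inherit. The helper names `eof_seen` and `demo` are made up for this sketch.

```python
import os


def eof_seen(read_fd):
    """Return True if a non-blocking read on read_fd reports end-of-file."""
    try:
        return os.read(read_fd, 1) == b""
    except BlockingIOError:
        # No data, but at least one write-end descriptor is still open,
        # so the kernel does not report EOF.
        return False


def demo():
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)

    # A second descriptor for the same open file, standing in for the
    # copy a fork()ed child would hold.
    inherited_fd = os.dup(write_fd)

    os.close(write_fd)  # the "creator" closes its descriptor
    alive_after_creator_close = not eof_seen(read_fd)

    os.close(inherited_fd)  # the last reference goes away
    released_after_last_close = eof_seen(read_fd)

    os.close(read_fd)
    return alive_after_creator_close, released_after_last_close
```

Calling `demo()` shows the write end staying alive after the creator closes its descriptor and being released only once the duplicated descriptor is closed as well, which mirrors the "rogue child" case in the commit message.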
    • J
      filemap: drop the mmap_sem for all blocking operations · 6b4c9f44
      Josef Bacik committed
      Currently we only drop the mmap_sem if there is contention on the page
      lock.  The idea is that we issue readahead and then go to lock the page
      while it is under IO and we want to not hold the mmap_sem during the IO.
      
      The problem with this is the assumption that the readahead does anything.
      In the case that the box is under extreme memory or IO pressure we may end
      up not reading anything at all for readahead, which means we will end up
      reading in the page under the mmap_sem.
      
      Even if the readahead does something, it could get throttled because of io
      pressure on the system and the process is in a lower priority cgroup.
      
      Holding the mmap_sem while doing IO is problematic because it can cause
      system-wide priority inversions.  Consider some large company that does a
      lot of web traffic.  This large company has load balancing logic in its
      core web server, because some engineer thought this was a brilliant plan.
      This load balancing logic gets statistics from /proc about the system,
      which trip over processes mmap_sem for various reasons.  Now the web
      server application is in a protected cgroup, but these other processes may
      not be, and if they are being throttled while their mmap_sem is held we'll
      stall, and cause this nice death spiral.
      
      Instead rework filemap fault path to drop the mmap sem at any point that
      we may do IO or block for an extended period of time.  This includes while
      issuing readahead, locking the page, or needing to call ->readpage because
      readahead did not occur.  Then once we have a fully uptodate page we can
      return with VM_FAULT_RETRY and come back again to find our nicely in-cache
      page that was gotten outside of the mmap_sem.
      
      This patch also adds a new helper for locking the page with the mmap_sem
      dropped.  This doesn't make sense currently as generally speaking if the
      page is already locked it'll have been read in (unless there was an error)
      before it was unlocked.  However a forthcoming patchset will change this
      with the ability to abort read-ahead bio's if necessary, making it more
      likely that we could contend for a page lock and still have a not uptodate
      page.  This allows us to deal with this case by grabbing the lock and
      issuing the IO without the mmap_sem held, and then returning
      VM_FAULT_RETRY to come back around.
      
      [josef@toxicpanda.com: v6]
        Link: http://lkml.kernel.org/r/20181212152757.10017-1-josef@toxicpanda.com
      [kirill@shutemov.name: fix race in filemap_fault()]
        Link: http://lkml.kernel.org/r/20181228235106.okk3oastsnpxusxs@kshutemo-mobl1
      [akpm@linux-foundation.org: coding style fixes]
      Link: http://lkml.kernel.org/r/20181211173801.29535-4-josef@toxicpanda.com
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Tested-by: syzbot+b437b5a429d680cf2217@syzkaller.appspotmail.com
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
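The retry scheme described in the "filemap: drop the mmap_sem for all blocking operations" message can be modelled in a few lines. This is a toy, single-threaded sketch and not the kernel implementation: `ToyAddressSpace`, `handle_fault`, and the counter are hypothetical names, a `threading.Lock` stands in for the mmap_sem, and a dictionary stands in for the page cache. The point it illustrates is the control flow: the fault path drops the lock before any blocking read, returns a retry status, and the caller comes back to find the page already cached.

```python
import threading

# Status names borrowed from the commit message; the values are arbitrary.
VM_FAULT_RETRY = "retry"
VM_FAULT_OK = "ok"


class ToyAddressSpace:
    def __init__(self):
        self.mmap_sem = threading.Lock()  # stand-in for the mmap_sem
        self.page_cache = {}              # page index -> contents
        self.io_done_with_lock_held = 0   # counts reads done under the lock

    def _read_page_from_disk(self, index):
        # Pretend this blocks; record whether the lock was still held.
        if self.mmap_sem.locked():
            self.io_done_with_lock_held += 1
        return f"data-{index}"

    def fault(self, index):
        """Called with mmap_sem held. Returns (status, lock_still_held)."""
        if index in self.page_cache:
            return VM_FAULT_OK, True
        # Drop the lock before doing anything that may block on I/O.
        self.mmap_sem.release()
        self.page_cache[index] = self._read_page_from_disk(index)
        return VM_FAULT_RETRY, False


def handle_fault(space, index):
    """Retry loop in the caller; returns how many attempts were needed."""
    attempts = 0
    while True:
        space.mmap_sem.acquire()
        attempts += 1
        status, lock_held = space.fault(index)
        if lock_held:
            space.mmap_sem.release()
        if status == VM_FAULT_OK:
            return attempts
```

In this model a first fault on an uncached page takes two attempts (one that performs the read without the lock, one that finds the page), while a later fault on the same page takes one.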
    • J
      filemap: kill page_cache_read usage in filemap_fault · a75d4c33
      Josef Bacik committed
      Patch series "drop the mmap_sem when doing IO in the fault path", v6.
      
      Now that we have proper isolation in place with cgroups2 we have started
      going through and fixing the various priority inversions.  Most are all
      gone now, but this one is sort of weird since it's not necessarily a
      priority inversion that happens within the kernel, but rather because of
      something userspace does.
      
      We have giant applications that we want to protect, and parts of these
      giant applications do things like watch the system state to determine how
      healthy the box is for load balancing and such.  This involves running
      'ps' or other such utilities.  These utilities will often walk
      /proc/<pid>/whatever, and these files can sometimes need to
      down_read(&task->mmap_sem).  Not usually a big deal, but we noticed when
      we are stress testing that sometimes our protected application has latency
      spikes trying to get the mmap_sem for tasks that are in lower priority
      cgroups.
      
      This is because any down_write() on a semaphore essentially turns it into
      a mutex, so even if we currently have it held for reading, any new readers
      will not be allowed on to keep from starving the writer.  This is fine,
      except a lower priority task could be stuck doing IO because it has been
      throttled to the point that its IO is taking much longer than normal.  But
      because a higher priority group depends on this completing it is now stuck
      behind lower priority work.
      
      In order to avoid this particular priority inversion we want to use the
      existing retry mechanism to stop from holding the mmap_sem at all if we
      are going to do IO.  This already exists in the read case sort of, but
      needed to be extended for more than just grabbing the page lock.  With
      io.latency we throttle at submit_bio() time, so the readahead stuff can
      block and even page_cache_read can block, so all these paths need to have
      the mmap_sem dropped.
      
      The other big thing is ->page_mkwrite.  btrfs is particularly shitty here
      because we have to reserve space for the dirty page, which can be a very
      expensive operation.  We use the same retry method as the read path, and
      simply cache the page and verify the page is still setup properly the next
      pass through ->page_mkwrite().
      
      I've tested these patches with xfstests and there are no regressions.
      
      This patch (of 3):
      
      If we do not have a page at filemap_fault time we'll do this weird forced
      page_cache_read thing to populate the page, and then drop it again and
      loop around and find it.  This makes for 2 ways we can read a page in
      filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
      flag so that pagecache_get_page() will return an unlocked page that's in
      pagecache.  Then use the normal page locking and readpage logic already in
      filemap_fault.  This simplifies the no page in page cache case
      significantly.
      
      [akpm@linux-foundation.org: fix comment text]
      [josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
      Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
      Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.com
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a75d4c33
    • P
      Merge tag 'kvm-ppc-next-5.1-3' of... · c7a0e83c
      Paolo Bonzini committed
      Merge tag 'kvm-ppc-next-5.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
      
      Third PPC KVM update for 5.1
      
      - Tell userspace about whether a particular hardware workaround for
        one of the Spectre vulnerabilities is available, so that userspace
        can inform the guest.
      c7a0e83c
    • S
      MAINTAINERS: Add KVM selftests to existing KVM entry · 46333236
      Sean Christopherson committed
      It's safe to assume Paolo and Radim are maintaining the KVM selftests
      given that the vast majority of commits have their SOBs.  Play nice
      with get_maintainers and make it official.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      46333236
    • B
      Revert "KVM/MMU: Flush tlb directly in the kvm_zap_gfn_range()" · 92da008f
      Ben Gardon committed
      This reverts commit 71883a62.
      
      The above commit contains an optimization to kvm_zap_gfn_range which
      uses gfn-limited TLB flushes, if enabled. If using these limited flushes,
      kvm_zap_gfn_range passes lock_flush_tlb=false to slot_handle_level_range
      which creates a race when the function unlocks to call cond_resched.
      See an example of this race below:
      
      CPU 0                   CPU 1                           CPU 3
      // zap_direct_gfn_range
      mmu_lock()
      // *ptep == pte_1
      *ptep = 0
      if (lock_flush_tlb)
              flush_tlbs()
      mmu_unlock()
                              // In invalidate range
                              // MMU notifier
                              mmu_lock()
                              if (pte != 0)
                                      *ptep = 0
                                      flush = true
                              if (flush)
                                      flush_remote_tlbs()
                              mmu_unlock()
                              return
                              // Host MM reallocates
                              // page previously
                              // backing guest memory.
                                                              // Guest accesses
                                                              // invalid page
                                                              // through pte_1
                                                              // in its TLB!!
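
The interleaving above can be replayed as a small single-threaded model. This is only a sketch: the struct, the function names and the values are hypothetical stand-ins, not KVM code, and the CPUs and mmu_lock are replaced by sequential calls. The deferred flush that would eventually run on CPU 0 is left out on purpose, since the point is what the guest can still reach inside the window.

```c
#include <stdbool.h>

/* Hypothetical model: one page-table entry plus the copy a guest CPU caches. */
struct sketch_mmu {
	unsigned long pte;        /* current page-table entry, 0 == not present */
	unsigned long guest_tlb;  /* translation cached by the guest, 0 == none */
};

/* CPU 0: zap the entry; flush only if the caller asked for it. */
void sketch_zap(struct sketch_mmu *m, bool lock_flush_tlb)
{
	m->pte = 0;
	if (lock_flush_tlb)
		m->guest_tlb = 0;
}

/* CPU 1: MMU notifier; it flushes only if it cleared something itself. */
void sketch_invalidate(struct sketch_mmu *m)
{
	bool flush = false;

	if (m->pte != 0) {
		m->pte = 0;
		flush = true;
	}
	if (flush)
		m->guest_tlb = 0;
}

/* Returns true if the guest can still reach the old page after both steps. */
bool sketch_stale_after_race(bool lock_flush_tlb)
{
	struct sketch_mmu m = { .pte = 0x1000, .guest_tlb = 0x1000 };

	sketch_zap(&m, lock_flush_tlb);  /* the lock is dropped after this step */
	sketch_invalidate(&m);           /* sees pte == 0 and skips the flush */
	return m.guest_tlb != 0;
}
```

In this model sketch_stale_after_race(false) reports a stale translation, while sketch_stale_after_race(true) does not, which matches the window the diagram describes.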
      
      Tested: Ran all kvm-unit-tests on an Intel Haswell machine with and
      	without this patch. The patch introduced no new failures.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      92da008f
  3. 15 Mar, 2019 16 commits
    • A
      iommu/amd: Fix NULL dereference bug in match_hid_uid · bb6bccba
      Aaron Ma committed
      Add a non-NULL check to fix a potential NULL pointer dereference.
      Clean up the code to call the function once.
      Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
      Fixes: 2bf9a0a1 ('iommu/amd: Add iommu support for ACPI HID devices')
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      bb6bccba
    • R
      4c2741ac
    • A
      CIFS: fix POSIX lock leak and invalid ptr deref · bc31d0cd
      Aurelien Aptel committed
      We have a customer reporting crashes in lock_get_status() with many
      "Leaked POSIX lock" messages preceding the crash.
      
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x56 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       Leaked POSIX lock on dev=0x0:0x53 ...
       POSIX: fl_owner=ffff8900e7b79380 fl_flags=0x1 fl_type=0x1 fl_pid=20709
       Leaked POSIX lock on dev=0x0:0x4b ino...
       Leaked locks on dev=0x0:0x4b ino=0xf911400000029:
       POSIX: fl_owner=ffff89f41c870e00 fl_flags=0x1 fl_type=0x1 fl_pid=19592
       stack segment: 0000 [#1] SMP
       Modules linked in: binfmt_misc msr tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 arc4 ecb auth_rpcgss nfsv4 md4 nfs nls_utf8 lockd grace cifs sunrpc ccm dns_resolver fscache af_packet iscsi_ibft iscsi_boot_sysfs vmw_vsock_vmci_transport vsock xfs libcrc32c sb_edac edac_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng vmw_balloon aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd joydev pcspkr vmxnet3 i2c_piix4 vmw_vmci shpchp fjes processor button ac btrfs xor raid6_pq sr_mod cdrom ata_generic sd_mod ata_piix vmwgfx crc32c_intel drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm serio_raw ahci libahci drm libata vmw_pvscsi sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
      
       Supported: Yes
       CPU: 6 PID: 28250 Comm: lsof Not tainted 4.4.156-94.64-default #1
       Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
       task: ffff88a345f28740 ti: ffff88c74005c000 task.ti: ffff88c74005c000
       RIP: 0010:[<ffffffff8125dcab>]  [<ffffffff8125dcab>] lock_get_status+0x9b/0x3b0
       RSP: 0018:ffff88c74005fd90  EFLAGS: 00010202
       RAX: ffff89bde83e20ae RBX: ffff89e870003d18 RCX: 0000000049534f50
       RDX: ffffffff81a3541f RSI: ffffffff81a3544e RDI: ffff89bde83e20ae
       RBP: 0026252423222120 R08: 0000000020584953 R09: 000000000000ffff
       R10: 0000000000000000 R11: ffff88c74005fc70 R12: ffff89e5ca7b1340
       R13: 00000000000050e5 R14: ffff89e870003d30 R15: ffff89e5ca7b1340
       FS:  00007fafd64be800(0000) GS:ffff89f41fd00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 0000000001c80018 CR3: 000000a522048000 CR4: 0000000000360670
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Stack:
        0000000000000208 ffffffff81a3d6b6 ffff89e870003d30 ffff89e870003d18
        ffff89e5ca7b1340 ffff89f41738d7c0 ffff89e870003d30 ffff89e5ca7b1340
        ffffffff8125e08f 0000000000000000 ffff89bc22b67d00 ffff88c74005ff28
       Call Trace:
        [<ffffffff8125e08f>] locks_show+0x2f/0x70
        [<ffffffff81230ad1>] seq_read+0x251/0x3a0
        [<ffffffff81275bbc>] proc_reg_read+0x3c/0x70
        [<ffffffff8120e456>] __vfs_read+0x26/0x140
        [<ffffffff8120e9da>] vfs_read+0x7a/0x120
        [<ffffffff8120faf2>] SyS_read+0x42/0xa0
        [<ffffffff8161cbc3>] entry_SYSCALL_64_fastpath+0x1e/0xb7
      
      When Linux closes a FD (close(), close-on-exec, dup2(), ...) it calls
      filp_close() which also removes all posix locks.
      
      The lock struct is initialized like so in filp_close() and passed
      down to cifs
      
      	...
              lock.fl_type = F_UNLCK;
              lock.fl_flags = FL_POSIX | FL_CLOSE;
              lock.fl_start = 0;
              lock.fl_end = OFFSET_MAX;
      	...
      
      Note the FL_CLOSE flag, which hints to the VFS code that this unlocking
      is done for closing the fd.
      
      filp_close()
        locks_remove_posix(filp, id);
          vfs_lock_file(filp, F_SETLK, &lock, NULL);
            return filp->f_op->lock(filp, cmd, fl) => cifs_lock()
              rc = cifs_setlk(file, flock, type, wait_flag, posix_lck, lock, unlock, xid);
                rc = server->ops->mand_unlock_range(cfile, flock, xid);
                if (flock->fl_flags & FL_POSIX && !rc)
                        rc = locks_lock_file_wait(file, flock)
      
      Notice that if rc != 0 we don't call locks_lock_file_wait(), which does
      the generic VFS lock/unlock/wait work on the inode.
      
      If we are closing the handle, the SMB server is supposed to remove any
      locks associated with it. Similarly, cifs.ko frees and wakes up any
      lock and lock waiter when closing the file:
      
      cifs_close()
        cifsFileInfo_put(file->private_data)
      	/*
      	 * Delete any outstanding lock records. We'll lose them when the file
      	 * is closed anyway.
      	 */
      	down_write(&cifsi->lock_sem);
      	list_for_each_entry_safe(li, tmp, &cifs_file->llist->locks, llist) {
      		list_del(&li->llist);
      		cifs_del_lock_waiters(li);
      		kfree(li);
      	}
      	list_del(&cifs_file->llist->llist);
      	kfree(cifs_file->llist);
      	up_write(&cifsi->lock_sem);
      
      So we can safely ignore unlocking failures in cifs_lock() if they
      happen with the FL_CLOSE flag hint set as both the server and the
      client take care of it during the actual closing.
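
The decision described above can be sketched as a small predicate. This is a userspace illustration under stated assumptions, not the cifs.ko change itself: the SKETCH_ flag values and the function name are hypothetical stand-ins for FL_POSIX, FL_CLOSE and the check inside cifs_setlk().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's FL_POSIX and FL_CLOSE bits;
 * the numeric values are illustrative only. */
#define SKETCH_FL_POSIX 0x1u
#define SKETCH_FL_CLOSE 0x2u

/*
 * Decide whether the generic VFS unlock step should still run after the
 * server-side unlock returned rc (0 on success, negative on failure).
 */
bool sketch_should_update_vfs_locks(unsigned int fl_flags, int rc)
{
	if (!(fl_flags & SKETCH_FL_POSIX))
		return false;  /* not a POSIX lock: nothing to do in the VFS */
	if (rc != 0 && !(fl_flags & SKETCH_FL_CLOSE))
		return false;  /* ordinary unlock failure: report the error */
	return true;           /* success, or a failure while closing the fd */
}
```

With this shape, a failed unlock that carries the close hint still lets the VFS drop its lock record, so nothing is left behind to be reported as a leaked POSIX lock.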
      
      This is not a proper fix for the unlocking failure but it's safe and
      it seems to prevent the lock leakages and crashes the customer
      experiences.
      Signed-off-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: NeilBrown <neil@brown.name>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      bc31d0cd
    • R
      SMB3: Allow SMB3 FSCTL queries to be sent to server from tools · f5778c39
      Ronnie Sahlberg committed
      For debugging purposes, user space tools (e.g. cifs-utils's
      smbinfo) often need to query the server for additional
      information that is only available via SMB3 FSCTL.
      See the MS-FSCC and MS-SMB2 protocol specifications for
      more details.
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      f5778c39
    • R
      cifs: fix incorrect handling of smb2_set_sparse() return in smb3_simple_falloc · f1699479
      Ronnie Sahlberg committed
      smb2_set_sparse does not return -errno, it returns a boolean where
      true means success.
      Change this to just ignore the return value just like the other callsites.
      
      Additionally add code to handle the case where we must set the file sparse
      and possibly also extend it.
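
The return-convention mismatch can be illustrated with a small userspace sketch. All names here are hypothetical; sketch_set_sparse() only imitates a helper where true means success, and the "buggy" caller shows one way such a value goes wrong when it is read as an errno-style code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper that follows the "true means success" convention. */
bool sketch_set_sparse(bool server_accepts)
{
	return server_accepts;
}

/* Mismatched caller: reads the boolean as an errno-style code, so a
 * successful call (true == 1) is mistaken for a failure. */
int sketch_falloc_buggy(bool server_accepts)
{
	int rc = sketch_set_sparse(server_accepts);

	if (rc)
		return -EOPNOTSUPP;
	return 0;
}

/* Caller that ignores the return value, as the commit message describes
 * for the other call sites. */
int sketch_falloc_fixed(bool server_accepts)
{
	(void)sketch_set_sparse(server_accepts);
	return 0;
}
```

In the mismatched version the result is inverted: success is reported as -EOPNOTSUPP and a real failure is reported as 0.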
      
      Fixes xfstests: generic/236 generic/350 generic/420
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      f1699479
    • S
      smb2: fix typo in definition of a few error flags · dd0ac2d2
      Steve French committed
      As Sergey Senozhatsky pointed out, __constant_cpu_to_le32()
      is misspelled in a few definitions in the list of status
      codes in smb2status.h as __constanst_cpu_to_le32().
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      dd0ac2d2
    • A
      CIFS: make mknod() an smb_version_op · c847dccf
      Aurelien Aptel committed
      This cleanup removes cifs-specific code from the SMB2/SMB3 code paths,
      which is cleaner and easier to maintain as the code to handle
      special files is improved.  Below is an example of creating special files
      using the 'sfu' mount option over SMB3 to Windows (with this patch).
      (Note that with a Samba server, support for saving DOS attributes
      has to be enabled for the SFU mount option to work.)
      
      In the future this will also make implementation of creating
      special files as reparse points easier (as Windows NFS server does
      for example).
      
         root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/char
         character special file
      
         root@smf-Thinkpad-P51:~# stat -c "%F" /mnt2/block
         block special file
      Signed-off-by: Aurelien Aptel <aaptel@suse.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      c847dccf
    • S
      cifs: minor documentation updates · 65525802
      Steve French committed
      Also updated a comment describing use of the GlobalMid_Lock.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      65525802
    • S
      cifs: remove unused value pointed out by Coverity · d44d1372
      Steve French committed
      Detected by CoverityScan CID#1438719 ("Unused Value")
      
      buf is reset again before being used so these two lines of code
      are useless.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      d44d1372
    • S
      SMB3: passthru query info doesn't check for SMB3 FSCTL passthru · 31ba4331
      Steve French committed
      The passthrough queries from user space tools like smbinfo can be either
      SMB3 QUERY_INFO or SMB3 FSCTL, but we are not checking for the latter.
      Temporarily we return EOPNOTSUPP for SMB3 FSCTL passthrough requests,
      but this can be enabled once compounding of fsctls is fixed.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      31ba4331
    • S
      smb3: add dynamic tracepoints for simple fallocate and zero range · 779ede04
      Steve French committed
      Can be helpful in debugging various xfstests that are currently
      skipped or failing due to missing features in our current
      implementation of fallocate.
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      779ede04
    • R
      cifs: fix smb3_zero_range so it can expand the file-size when required · 72c419d9
      Ronnie Sahlberg committed
      This allows fallocate -z to work against a Windows2016 share.
      
      This is because the SMB3 ZERO_RANGE command does not modify the file size.
      To address this we will now append a compounded SET-INFO to update the
      end-of-file information.
      
      This brings xfstests generic/469 closer to working against a windows share.
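
The size rule implied above can be sketched as a pure function. This is an illustration under assumptions, not the smb3_zero_range() code: the function name is hypothetical, and the keep_size parameter assumes the caller may pass the equivalent of FALLOC_FL_KEEP_SIZE, which the text above does not mention.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns the end-of-file value a follow-up SET-INFO would carry, or -1 when
 * the zeroed range stays inside the current file (or the caller asked to
 * keep the size) and no size update is needed.
 */
int64_t sketch_eof_after_zero_range(int64_t cur_size, int64_t offset,
				    int64_t len, bool keep_size)
{
	int64_t end = offset + len;

	if (keep_size || end <= cur_size)
		return -1;
	return end;
}
```

For a 4096-byte file, zeroing 8192 bytes from offset 0 would ask for a new end-of-file of 8192, while zeroing only the first 1024 bytes would need no size update.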
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      72c419d9
    • R
      cifs: add SMB2_ioctl_init/free helpers to be used with compounding · ccdc77a3
      Ronnie Sahlberg committed
      Define an _init() and a _free() function for SMB2_ioctl so that we will
      be able to use it with compounds.
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      ccdc77a3
    • S
      smb3: Add dynamic trace points for various compounded smb3 ops · 8191576a
      Steve French committed
      Adds trace points for enter and exit (done vs. error) for:
      
      	compounded query and setinfo, hardlink, rename,
      	mkdir, rmdir, set_eof, delete (unlink)
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      8191576a
    • R
      cifs: cache FILE_ALL_INFO for the shared root handle · b0f6df73
      Ronnie Sahlberg committed
      When we open the shared root handle also ask for FILE_ALL_INFORMATION since
      we can do this at zero cost as part of a compound.
      Cache this information as long as the lease is held and serve any
      future requests from cache.
      
      This allows us to serve "stat /<mountpoint>" directly from cache and avoid
      a network roundtrip.  Since clients often want to do this quite a lot
      this improves performance slightly.
      
      As an example: xfstest generic/533 performs 43 stat operations on the root
      of the share while it is run, which are eliminated with this patch.
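
The caching rule can be modelled in a few lines. This is a sketch only: the struct and function names are hypothetical, the server is reduced to a size value passed in by the caller, and a lease break is modelled as a plain function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the shared root handle and its cached attributes. */
struct sketch_root_handle {
	bool lease_held;
	bool info_valid;
	uint64_t cached_size;
	unsigned int roundtrips;  /* requests that had to go to the server */
};

/* Open: one compound round trip that also fetches the attributes. */
void sketch_open_root(struct sketch_root_handle *h, uint64_t server_size)
{
	h->lease_held = true;
	h->info_valid = true;
	h->cached_size = server_size;
	h->roundtrips = 1;
}

/* Lease break: the cached attributes can no longer be trusted. */
void sketch_lease_break(struct sketch_root_handle *h)
{
	h->lease_held = false;
	h->info_valid = false;
}

/* stat on the share root: served from cache while the lease is held. */
uint64_t sketch_stat_root(struct sketch_root_handle *h, uint64_t server_size)
{
	if (h->lease_held && h->info_valid)
		return h->cached_size;
	h->roundtrips++;
	return server_size;
}
```

In this model, 43 calls to sketch_stat_root() after one sketch_open_root() leave the roundtrips counter at 1; only after sketch_lease_break() does a stat go back to the server.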
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
      b0f6df73
    • S
      smb3: display volume serial number for shares in /proc/fs/cifs/DebugData · ab7b10cf
      Steve French committed
      It can be helpful for debugging.  According to MS-FSCC:
      
      "A 32-bit unsigned integer that contains the serial number of the
      volume. The serial number is an opaque value generated by the file
      system at format time"
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
      ab7b10cf