1. 08 Feb, 2018 1 commit
    • T
      bcache: add journal statistic · a728eacb
      Tang Junhui committed
      Sometimes the journal takes up a lot of CPU, and we need statistics
      to know what the journal is doing. So this patch provides
      some journal statistics:
      1) reclaim: how many times the journal tries to reclaim resources,
         usually because the journal buckets and/or the pins are exhausted.
      2) flush_write: how many times the journal tries to flush a btree node
         to the cache device, usually because the journal buckets are exhausted.
      3) retry_flush_write: how many times the journal retries flushing
         the next btree node, usually because the previous btree node has
         already been flushed by another thread.
      We show these statistics through the sysfs interface. With them
      we can see the full status of the journal module when CPU usage is
      too high.
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a728eacb
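The three counters described above can be pictured with a small user-space sketch. This is not the bcache code: the struct, the function names, and the output format are all hypothetical, and C11 atomics stand in for whatever counter type the patch actually uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the three statistics named in the commit. */
struct journal_stats {
	atomic_ulong reclaim;           /* attempts to reclaim journal resources */
	atomic_ulong flush_write;       /* attempts to flush a btree node */
	atomic_ulong retry_flush_write; /* retries on the next btree node */
};

/* Bump one counter; relaxed ordering is enough for a pure statistic. */
static void journal_stat_inc(atomic_ulong *counter)
{
	atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
}

/* Format the counters one per line, roughly as a sysfs-style show routine might. */
static int journal_stats_show(struct journal_stats *s, char *buf, size_t len)
{
	assert(s != NULL && buf != NULL);
	return snprintf(buf, len,
			"reclaim: %lu\nflush_write: %lu\nretry_flush_write: %lu\n",
			atomic_load(&s->reclaim),
			atomic_load(&s->flush_write),
			atomic_load(&s->retry_flush_write));
}
```

A caller would invoke journal_stat_inc() at each of the three events and read the formatted text when CPU usage looks suspicious.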
  2. 07 Feb, 2018 6 commits
    • H
      block: Add should_fail_bio() for bpf error injection · 30abb3a6
      Howard McLauchlan committed
      The classic error injection mechanism, should_fail_request() does not
      support use cases where more information is required (from the entire
      struct bio, for example).
      
      To that end, this patch introduces should_fail_bio(), which calls
      should_fail_request() under the hood but provides a convenient
      place for kprobes to hook into if they require the entire struct bio.
      This patch also replaces some existing calls to should_fail_request()
      with should_fail_bio() with no degradation in performance.
      Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      30abb3a6
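The wrapper pattern this commit describes can be sketched in user space as follows. The struct and both function names are hypothetical stand-ins, not the kernel's definitions; the point is only that the outer function receives the whole object and is kept out of line, so a probe attached to it can inspect every field. The noinline attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct bio. */
struct fake_bio {
	unsigned int size; /* bytes covered by this I/O */
	bool inject;       /* pretend fault-injection decision for the sketch */
};

/* Stand-in for the classic check, which only sees a few scalar values. */
static bool should_fail_request_sketch(unsigned int bytes, bool inject)
{
	return inject && bytes > 0;
}

/* Thin wrapper that takes the whole object; kept out of line so that a
 * probe has a stable function entry to hook. */
__attribute__((noinline))
static int should_fail_bio_sketch(struct fake_bio *bio)
{
	assert(bio != NULL);
	if (should_fail_request_sketch(bio->size, bio->inject))
		return -EIO;
	return 0;
}
```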
    • J
      blk-wbt: account flush requests correctly · 5235553d
      Jens Axboe committed
      Mikulas reported a workload that saw bad performance, and figured
      out that it was due to various other types of requests being
      accounted as reads. Flush requests, for instance. Due to the
      high latency of those, we heavily throttle the writes to keep
      the latencies in balance. But they really should be accounted
      as writes.
      
      Fix this by checking the exact type of the request. If it's a
      read, account as a read, if it's a write or a flush, account
      as a write. Any other request we disregard. Previously everything
      would have been mistakenly accounted as reads.
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org # v4.12+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      5235553d
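The accounting rule in this fix (reads count as reads, writes and flushes count as writes, everything else is ignored) can be written as a small classifier. This is a sketch with hypothetical enum and function names, not the blk-wbt code.

```c
/* Hypothetical request kinds; the real block layer has many more. */
enum req_kind {
	REQ_KIND_READ,
	REQ_KIND_WRITE,
	REQ_KIND_FLUSH,
	REQ_KIND_DISCARD,
};

/* Which latency bucket a request is accounted in, if any. */
enum wbt_bucket {
	WBT_BUCKET_NONE = -1, /* not accounted at all */
	WBT_BUCKET_READ = 0,
	WBT_BUCKET_WRITE = 1,
};

/* Check the exact request type instead of treating "not a write" as a read. */
static enum wbt_bucket wbt_bucket_for(enum req_kind kind)
{
	switch (kind) {
	case REQ_KIND_READ:
		return WBT_BUCKET_READ;
	case REQ_KIND_WRITE:
	case REQ_KIND_FLUSH:
		return WBT_BUCKET_WRITE;
	default:
		return WBT_BUCKET_NONE;
	}
}
```

With the earlier behavior described in the message, the flush and discard cases would both have landed in the read bucket.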
    • L
      Merge tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 68c5735e
      Linus Torvalds committed
      Pull media updates from Mauro Carvalho Chehab:
      
       - videobuf2 was moved to a media/common dir, as it is now used by the
         DVB subsystem too
      
       - Digital TV core memory mapped support interface
      
       - new sensor driver: ov7740
      
       - several improvements at ddbridge driver
      
       - new V4L2 driver: IPU3 CIO2 CSI-2 receiver unit, found on some Intel
         SoCs
      
       - new tuner driver: tda18250
      
       - finally got rid of all LIRC staging drivers
      
       - as we don't have old lirc drivers anymore, restructure the lirc
         device code
      
       - add support for UVC metadata
      
       - add a new staging driver for NVIDIA Tegra Video Decoder Engine
      
       - DVB kAPI headers moved to include/media
      
       - synchronize the kAPI and uAPI for the DVB subsystem, removing the gap
         for non-legacy APIs
      
       - reduce the kAPI gap for V4L2
      
       - lots of other driver enhancements, cleanups, etc.
      
      * tag 'media/v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (407 commits)
        media: v4l2-compat-ioctl32.c: make ctrl_is_pointer work for subdevs
        media: v4l2-compat-ioctl32.c: refactor compat ioctl32 logic
        media: v4l2-compat-ioctl32.c: don't copy back the result for certain errors
        media: v4l2-compat-ioctl32.c: drop pr_info for unknown buffer type
        media: v4l2-compat-ioctl32.c: copy clip list in put_v4l2_window32
        media: v4l2-compat-ioctl32.c: fix ctrl_is_pointer
        media: v4l2-compat-ioctl32.c: copy m.userptr in put_v4l2_plane32
        media: v4l2-compat-ioctl32.c: avoid sizeof(type)
        media: v4l2-compat-ioctl32.c: move 'helper' functions to __get/put_v4l2_format32
        media: v4l2-compat-ioctl32.c: fix the indentation
        media: v4l2-compat-ioctl32.c: add missing VIDIOC_PREPARE_BUF
        media: v4l2-ioctl.c: don't copy back the result for -ENOTTY
        media: v4l2-ioctl.c: use check_fmt for enum/g/s/try_fmt
        media: vivid: fix module load error when enabling fb and no_error_inj=1
        media: dvb_demux: improve debug messages
        media: dvb_demux: Better handle discontinuity errors
        media: cxusb, dib0700: ignore XC2028_I2C_FLUSH
        media: ts2020: avoid integer overflows on 32 bit machines
        media: i2c: ov7740: use gpio/consumer.h instead of gpio.h
        media: entity: Add a nop variant of media_entity_cleanup
        ...
      68c5735e
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma · 2246edfa
      Linus Torvalds committed
      Pull more rdma updates from Doug Ledford:
       "Items of note:
      
         - two patches fix a regression in the 4.15 kernel. The 4.14 kernel
           worked fine with NVMe over Fabrics and mlx5 adapters. That broke in
           4.15. The fix is here.
      
         - one of the patches (the endian notation patch from Lijun) looks
           like a lot of lines of change, but it's mostly mechanical in
           nature. It amounts to the biggest chunk of change in it (it's about
           2/3rds of the overall pull request).
      
        Summary:
      
         - Clean up some function signatures in rxe for clarity
      
         - Tidy the RDMA netlink header to remove unimplemented constants
      
         - bnxt_re driver fixes, one is a regression this window.
      
         - Minor hns driver fixes
      
         - Various fixes from Dan Carpenter and his tool
      
         - Fix IRQ cleanup race in HFI1
      
         - HFI1 performance optimizations and a fix to report counters in the right units
      
         - Fix for an IPoIB startup sequence race with the external manager
      
         - Oops fix for the new kabi path
      
         - Endian cleanups for hns
      
         - Fix for mlx5 related to the new automatic affinity support"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (38 commits)
        net/mlx5: increase async EQ to avoid EQ overrun
        mlx5: fix mlx5_get_vector_affinity to start from completion vector 0
        RDMA/hns: Fix the endian problem for hns
        IB/uverbs: Use the standard kConfig format for experimental
        IB: Update references to libibverbs
        IB/hfi1: Add 16B rcvhdr trace support
        IB/hfi1: Convert kzalloc_node and kcalloc to use kcalloc_node
        IB/core: Avoid a potential OOPs for an unused optional parameter
        IB/core: Map iWarp AH type to undefined in rdma_ah_find_type
        IB/ipoib: Fix for potential no-carrier state
        IB/hfi1: Show fault stats in both TX and RX directions
        IB/hfi1: Remove blind constants from 16B update
        IB/hfi1: Convert PortXmitWait/PortVLXmitWait counters to flit times
        IB/hfi1: Do not override given pcie_pset value
        IB/hfi1: Optimize process_receive_ib()
        IB/hfi1: Remove unnecessary fecn and becn fields
        IB/hfi1: Look up ibport using a pointer in receive path
        IB/hfi1: Optimize packet type comparison using 9B and bypass code paths
        IB/hfi1: Compute BTH only for RDMA_WRITE_LAST/SEND_LAST packet
        IB/hfi1: Remove dependence on qp->s_hdrwords
        ...
      2246edfa
    • L
      Merge tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 3ff1b28c
      Linus Torvalds committed
      Pull libnvdimm updates from Ross Zwisler:
      
       - Require struct page by default for filesystem DAX to remove a number
         of surprising failure cases. This includes failures with direct I/O,
         gdb and fork(2).
      
       - Add support for the new Platform Capabilities Structure added to the
         NFIT in ACPI 6.2a. This new table tells us whether the platform
         supports flushing of CPU and memory controller caches on unexpected
         power loss events.
      
       - Revamp vmem_altmap and dev_pagemap handling to clean up code and
         better support future PCI P2P uses.
      
       - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has
         become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL
         spec, and instead rely on the generic ND_CMD_CALL approach used by
         the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}.
      
       - Enhance nfit_test so we can test some of the new things added in
         version 1.6 of the DSM specification. This includes testing firmware
         download and simulating the Last Shutdown State (LSS) status.
      
      * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits)
        libnvdimm, namespace: remove redundant initialization of 'nd_mapping'
        acpi, nfit: fix register dimm error handling
        libnvdimm, namespace: make min namespace size 4K
        tools/testing/nvdimm: force nfit_test to depend on instrumented modules
        libnvdimm/nfit_test: adding support for unit testing enable LSS status
        libnvdimm/nfit_test: add firmware download emulation
        nfit-test: Add platform cap support from ACPI 6.2a to test
        libnvdimm: expose platform persistence attribute for nd_region
        acpi: nfit: add persistent memory control flag for nd_region
        acpi: nfit: Add support for detect platform CPU cache flush on power loss
        device-dax: Fix trailing semicolon
        libnvdimm, btt: fix uninitialized err_lock
        dax: require 'struct page' by default for filesystem dax
        ext2: auto disable dax instead of failing mount
        ext4: auto disable dax instead of failing mount
        mm, dax: introduce pfn_t_special()
        mm: Fix devm_memremap_pages() collision handling
        mm: Fix memory size alignment in devm_memremap_pages_release()
        memremap: merge find_dev_pagemap into get_dev_pagemap
        memremap: change devm_memremap_pages interface to use struct dev_pagemap
        ...
      3ff1b28c
    • L
      Merge tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 105cf3c8
      Linus Torvalds committed
      Pull PCI updates from Bjorn Helgaas:
      
       - skip AER driver error recovery callbacks for correctable errors
         reported via ACPI APEI, as we already do for errors reported via the
         native path (Tyler Baicar)
      
       - fix DPC shared interrupt handling (Alex Williamson)
      
       - print full DPC interrupt number (Keith Busch)
      
       - enable DPC only if AER is available (Keith Busch)
      
       - simplify DPC code (Bjorn Helgaas)
      
       - calculate ASPM L1 substate parameter instead of hardcoding it (Bjorn
         Helgaas)
      
       - enable Latency Tolerance Reporting for ASPM L1 substates (Bjorn
         Helgaas)
      
       - move ASPM internal interfaces out of public header (Bjorn Helgaas)
      
       - allow hot-removal of VGA devices (Mika Westerberg)
      
       - speed up unplug and shutdown by assuming Thunderbolt controllers
         don't support Command Completed events (Lukas Wunner)
      
       - add AtomicOps support for GPU and Infiniband drivers (Felix Kuehling,
         Jay Cornwall)
      
       - expose "ari_enabled" in sysfs to help NIC naming (Stuart Hayes)
      
       - clean up PCI DMA interface usage (Christoph Hellwig)
      
       - remove PCI pool API (replaced with DMA pool) (Romain Perier)
      
       - deprecate pci_get_bus_and_slot(), which assumed PCI domain 0 (Sinan
         Kaya)
      
       - move DT PCI code from drivers/of/ to drivers/pci/ (Rob Herring)
      
       - add PCI-specific wrappers for dev_info(), etc (Frederick Lawler)
      
       - remove warnings on sysfs mmap failure (Bjorn Helgaas)
      
       - quiet ROM validation messages (Alex Deucher)
      
       - remove redundant memory alloc failure messages (Markus Elfring)
      
       - fill in types for compile-time VGA and other I/O port resources
         (Bjorn Helgaas)
      
       - make "pci=pcie_scan_all" work for Root Ports as well as Downstream
         Ports to help AmigaOne X1000 (Bjorn Helgaas)
      
       - add SPDX tags to all PCI files (Bjorn Helgaas)
      
       - quirk Marvell 9128 DMA aliases (Alex Williamson)
      
       - quirk broken INTx disable on Ceton InfiniTV4 (Bjorn Helgaas)
      
       - fix CONFIG_PCI=n build by adding dummy pci_irqd_intx_xlate() (Niklas
         Cassel)
      
       - use DMA API to get MSI address for DesignWare IP (Niklas Cassel)
      
       - fix endpoint-mode DMA mask configuration (Kishon Vijay Abraham I)
      
       - fix ARTPEC-6 incorrect IS_ERR() usage (Wei Yongjun)
      
       - add support for ARTPEC-7 SoC (Niklas Cassel)
      
       - add endpoint-mode support for ARTPEC (Niklas Cassel)
      
       - add Cadence PCIe host and endpoint controller driver (Cyrille
         Pitchen)
      
       - handle multiple INTx status bits being set in dra7xx (Vignesh R)
      
       - translate dra7xx hwirq range to fix INTD handling (Vignesh R)
      
       - remove deprecated Exynos PHY initialization code (Jaehoon Chung)
      
       - fix MSI erratum workaround for HiSilicon Hip06/Hip07 (Dongdong Liu)
      
       - fix NULL pointer dereference in iProc BCMA driver (Ray Jui)
      
       - fix Keystone interrupt-controller-node lookup (Johan Hovold)
      
       - constify qcom driver structures (Julia Lawall)
      
       - rework Tegra config space mapping to increase space available for
         endpoints (Vidya Sagar)
      
       - simplify Tegra driver by using bus->sysdata (Manikanta Maddireddy)
      
       - remove PCI_REASSIGN_ALL_BUS usage on Tegra (Manikanta Maddireddy)
      
       - add support for Global Fabric Manager Server (GFMS) event to
         Microsemi Switchtec switch driver (Logan Gunthorpe)
      
       - add IDs for Switchtec PSX 24xG3 and PSX 48xG3 (Kelvin Cao)
      
      * tag 'pci-v4.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
        PCI: cadence: Add EndPoint Controller driver for Cadence PCIe controller
        dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe endpoint controller
        PCI: endpoint: Fix EPF device name to support multi-function devices
        PCI: endpoint: Add the function number as argument to EPC ops
        PCI: cadence: Add host driver for Cadence PCIe controller
        dt-bindings: PCI: cadence: Add DT bindings for Cadence PCIe host controller
        PCI: Add vendor ID for Cadence
        PCI: Add generic function to probe PCI host controllers
        PCI: generic: fix missing call of pci_free_resource_list()
        PCI: OF: Add generic function to parse and allocate PCI resources
        PCI: Regroup all PCI related entries into drivers/pci/Makefile
        PCI/DPC: Reformat DPC register definitions
        PCI/DPC: Add and use DPC Status register field definitions
        PCI/DPC: Squash dpc_rp_pio_get_info() into dpc_process_rp_pio_error()
        PCI/DPC: Remove unnecessary RP PIO register structs
        PCI/DPC: Push dpc->rp_pio_status assignment into dpc_rp_pio_get_info()
        PCI/DPC: Squash dpc_rp_pio_print_error() into dpc_rp_pio_get_info()
        PCI/DPC: Make RP PIO log size check more generic
        PCI/DPC: Rename local "status" to "dpc_status"
        PCI/DPC: Squash dpc_rp_pio_print_tlp_header() into dpc_rp_pio_print_error()
        ...
      105cf3c8
  3. 06 Feb, 2018 10 commits
    • L
      Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e237f98a
      Linus Torvalds committed
      Pull more xfs updates from Darrick Wong:
       "As promised, here's a (much smaller) second pull request for the
        second week of the merge cycle. This time around we have a couple
        patches shutting off unsupported fs configurations, and a couple of
        cleanups.
      
        Last, we turn off EXPERIMENTAL for the reverse mapping btree, since
        the primary downstream user of that information (online fsck) is now
        upstream and I haven't seen any major failures in a few kernel
        releases.
      
        Summary:
      
         - Print scrub build status in the xfs build info.
      
         - Explicitly call out the remaining two scenarios where we don't
           support reflink and never have.
      
         - Remove EXPERIMENTAL tag from reverse mapping btree!"
      
      * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: remove experimental tag for reverse mapping
        xfs: don't allow reflink + realtime filesystems
        xfs: don't allow DAX on reflink filesystems
        xfs: add scrub to XFS_BUILD_OPTIONS
        xfs: fix u32 type usage in sb validation function
      e237f98a
    • L
      Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs · 139351f1
      Linus Torvalds committed
      Pull overlayfs updates from Miklos Szeredi:
       "This work from Amir adds NFS export capability to overlayfs. NFS
        exporting an overlay filesystem is a challenge because we want to keep
        track of any copy-up of a file or directory between encoding the file
        handle and decoding it.
      
        This is achieved by indexing copied up objects by lower layer file
        handle. The index is already used for hard links, this patchset
        extends the use to NFS file handle decoding"
      
      * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits)
        ovl: check ERR_PTR() return value from ovl_encode_fh()
        ovl: fix regression in fsnotify of overlay merge dir
        ovl: wire up NFS export operations
        ovl: lookup indexed ancestor of lower dir
        ovl: lookup connected ancestor of dir in inode cache
        ovl: hash non-indexed dir by upper inode for NFS export
        ovl: decode pure lower dir file handles
        ovl: decode indexed dir file handles
        ovl: decode lower file handles of unlinked but open files
        ovl: decode indexed non-dir file handles
        ovl: decode lower non-dir file handles
        ovl: encode lower file handles
        ovl: copy up before encoding non-connectable dir file handle
        ovl: encode non-indexed upper file handles
        ovl: decode connected upper dir file handles
        ovl: decode pure upper file handles
        ovl: encode pure upper file handles
        ovl: document NFS export
        vfs: factor out helpers d_instantiate_anon() and d_alloc_anon()
        ovl: store 'has_upper' and 'opaque' as bit flags
        ...
      139351f1
    • L
      Merge tag 'rproc-v4.16' of git://github.com/andersson/remoteproc · 2deb41b2
      Linus Torvalds committed
      Pull remoteproc updates from Bjorn Andersson:
       "This contains a few bug fixes and a cleanup of the resource-table
        handling in the framework, which removes the need for drivers with no
        resource table to provide a fake one"
      
      * tag 'rproc-v4.16' of git://github.com/andersson/remoteproc:
        remoteproc: Reset table_ptr on stop
        remoteproc: Drop dangling find_rsc_table dummies
        remoteproc: Move resource table load logic to find
        remoteproc: Don't handle empty resource table
        remoteproc: Merge rproc_ops and rproc_fw_ops
        remoteproc: Clone rproc_ops in rproc_alloc()
        remoteproc: Cache resource table size
        remoteproc: Remove depricated crash completion
        virtio_remoteproc: correct put_device virtio_device.dev
      2deb41b2
    • L
      Merge tag 'rpmsg-v4.16' of git://github.com/andersson/remoteproc · 67fb3b92
      Linus Torvalds committed
      Pull rpmsg updates from Bjorn Andersson:
       "This fixes a few issues found in the SMD and GLINK drivers and
        corrects the handling of SMD channels that are found in a
        (previously) unexpected state"
      
      * tag 'rpmsg-v4.16' of git://github.com/andersson/remoteproc:
        rpmsg: smd: Fix double unlock in __qcom_smd_send()
        rpmsg: glink: Fix missing mutex_init() in qcom_glink_alloc_channel()
        rpmsg: smd: Don't hold the tx lock during wait
        rpmsg: smd: Fail send on a closed channel
        rpmsg: smd: Wake up all waiters
        rpmsg: smd: Create device for all channels
        rpmsg: smd: Perform handshake during open
        rpmsg: glink: smem: Ensure ordering during tx
        drivers: rpmsg: remove duplicate includes
        remoteproc: qcom: Use PTR_ERR_OR_ZERO() in glink prob
      67fb3b92
    • L
      Merge tag 'mmc-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · ae77c958
      Linus Torvalds committed
      Pull MMC host fixes from Ulf Hansson:
      
       - renesas_sdhi: Fix build error in case NO_DMA=y
      
       - sdhci: Implement a bounce buffer to address throughput regressions
      
      * tag 'mmc-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: MMC_SDHI_{SYS,INTERNAL}_DMAC should depend on HAS_DMA
        mmc: sdhci: Implement an SDHCI-specific bounce buffer
      ae77c958
    • L
      Merge tag 'pwm/for-4.16-rc1' of... · 20f9aa22
      Linus Torvalds committed
      Merge tag 'pwm/for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "The Meson PWM controller driver gains support for the AXG series and a
        minor bug is fixed for the STMPE driver.
      
        To round things off, the class is now set for PWM channels exported
        via sysfs which allows non-root access, provided that the system has
        been configured accordingly"
      
      * tag 'pwm/for-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: meson: Add clock source configuration for Meson-AXG
        dt-bindings: pwm: Update bindings for the Meson-AXG
        pwm: stmpe: Fix wrong register offset for hwpwm=2 case
        pwm: Set class for exported channels in sysfs
      20f9aa22
    • T
      net: mediatek: Explicitly include pinctrl headers · 140995c9
      Thierry Reding committed
      The Mediatek ethernet driver fails to build after commit 23c35f48
      ("pinctrl: remove include file from <linux/device.h>") because it relies
      on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the
      device.h header implicitly.
      
      Include these headers explicitly to avoid the build failure.
      
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      140995c9
    • T
      mmc: meson-gx-mmc: Explicitly include pinctr/consumer.h · 8fb572ac
      Thierry Reding committed
      The Meson GX MMC driver fails to build after commit 23c35f48
      ("pinctrl: remove include file from <linux/device.h>") because it relies
      on the pinctrl/consumer.h being pulled in by the device.h header
      implicitly.
      
      Include the header explicitly to avoid the build failure.
      
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8fb572ac
    • T
      drm/rockchip: lvds: Explicitly include pinctrl headers · 1c16a9ce
      Thierry Reding committed
      The Rockchip LVDS driver fails to build after commit 23c35f48
      ("pinctrl: remove include file from <linux/device.h>") because it relies
      on the pinctrl/consumer.h and pinctrl/devinfo.h being pulled in by the
      device.h header implicitly.
      
      Include these headers explicitly to avoid the build failure.
      
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Thierry Reding <treding@nvidia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c16a9ce
    • S
      pinctrl: files should directly include apis they use · 567af7fc
      Stephen Rothwell committed
      Fixes: 23c35f48 ("pinctrl: remove include file from <linux/device.h>")
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      567af7fc
  4. 05 Feb, 2018 15 commits
    • M
      net/mlx5: increase async EQ to avoid EQ overrun · 03ecdd2d
      Max Gurtovoy committed
      Currently the async EQ has only 256 entries. That might not be big enough
      for the SW to handle all the pending events. For example, if many QPs
      (say 1024) are connected to an SRQ created by an NVMeOF target and the
      target goes down, the FW will raise 1024 "last WQE reached" events, which
      may cause an EQ overrun. Increase the EQ to a more reasonable size; beyond
      it, the FW should be able to delay the event and raise it later using its
      internal backpressure mechanism.
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      03ecdd2d
    • S
      mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 · 2572cf57
      Sagi Grimberg committed
      The consumers of this routine expect the affinity map of a vector
      index relative to the first completion vector. The upper layers are
      not aware of internal/private completion vectors that mlx5 allocates
      for its own usage.
      
      Hence, return the affinity map of vector index relative to the first
      completion vector.
      
      Fixes: 05e0cc84 ("net/mlx5: Fix get vector affinity helper function")
      Reported-by: Logan Gunthorpe <logang@deltatee.com>
      Tested-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Cc: <stable@vger.kernel.org> # v4.15
      Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      2572cf57
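The indexing issue in this fix comes down to an offset: callers count completion vectors from 0, while the driver's full vector table also holds vectors it keeps for itself. A minimal sketch, assuming a made-up count of private vectors and hypothetical names throughout:

```c
#include <assert.h>

/* Hypothetical number of vectors the driver reserves for its own use;
 * the real value is driver-specific and not stated in the commit. */
#define PRIVATE_VECTORS 3

/* Translate a caller-visible completion vector index (0-based) into a
 * slot in the driver's full vector table, or -1 if it is out of range. */
static int irq_slot_for_comp_vector(int comp_vector, int total_vectors)
{
	assert(comp_vector >= 0);
	int slot = PRIVATE_VECTORS + comp_vector;
	return slot < total_vectors ? slot : -1;
}
```

An affinity lookup that skipped this translation would hand back the mask of one of the private vectors for completion vector 0.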
    • O
      RDMA/hns: Fix the endian problem for hns · 8b9b8d14
      oulijun committed
      The hip06 and hip08 run on a little-endian ARM, so the annotations
      need to be revised to indicate that the HW uses little-endian
      data in the various DMA buffers, and the necessary swaps need to
      flow throughout.
      
      The imm_data uses big-endian mode. The cpu_to_le32/le32_to_cpu
      swaps are no-ops here, which makes the only substantive
      change the handling of imm_data, which is now swapped unconditionally.
      
      This also keeps the kernel driver matched with the userspace hns
      driver and resolves the sparse warnings.
      Signed-off-by: Lijun Ou <oulijun@huawei.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      8b9b8d14
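To make the byte-order point concrete, here is a portable user-space sketch of storing a 32-bit value in little-endian and in big-endian layout. The helper names are hypothetical; the kernel uses its own cpu_to_le32/cpu_to_be32 family with sparse annotations, which this sketch does not reproduce.

```c
#include <assert.h>
#include <stdint.h>

/* Store v with the least significant byte first (little-endian layout),
 * independent of the host CPU's byte order. */
static void put_le32(uint8_t *dst, uint32_t v)
{
	assert(dst != 0);
	dst[0] = (uint8_t)(v);
	dst[1] = (uint8_t)(v >> 8);
	dst[2] = (uint8_t)(v >> 16);
	dst[3] = (uint8_t)(v >> 24);
}

/* Store v with the most significant byte first (big-endian layout). */
static void put_be32(uint8_t *dst, uint32_t v)
{
	assert(dst != 0);
	dst[0] = (uint8_t)(v >> 24);
	dst[1] = (uint8_t)(v >> 16);
	dst[2] = (uint8_t)(v >> 8);
	dst[3] = (uint8_t)(v);
}
```

On a little-endian host the first helper writes the bytes in their native order, which is why the message calls those conversions no-ops, while the big-endian field is the one whose bytes actually move.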
    • A
      ovl: check ERR_PTR() return value from ovl_encode_fh() · 9b6faee0
      Amir Goldstein committed
      Another fix for an issue reported by 0-day robot.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 8ed5eec9 ("ovl: encode pure upper file handles")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      9b6faee0
    • A
      ovl: fix regression in fsnotify of overlay merge dir · 2aed489d
      Amir Goldstein committed
      A re-factoring patch in NFS export series has passed the wrong argument
      to ovl_get_inode() causing a regression in the very recent fix to
      fsnotify of overlay merge dir.
      
      The regression has caused merge directory inodes to be hashed by upper
      instead of lower real inode, when NFS export and directory indexing is
      disabled. That caused an inotify watch to become obsolete after directory
      copy up and drop caches.
      
      LTP test inotify07 was improved to catch this regression.
      The regression also caused multiple redirect dirs to same origin not to
      be detected on lookup with NFS export disabled. An xfstest was added to
      cover this case.
      
      Fixes: 0aceb53e ("ovl: do not pass overlay dentry to ovl_get_inode()")
      Signed-off-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      2aed489d
    • L
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 35277995
      Linus Torvalds committed
      Pull spectre/meltdown updates from Thomas Gleixner:
       "The next round of updates related to melted spectrum:
      
         - The initial set of spectre V1 mitigations:
      
             - Array index speculation blocker and its usage for syscall,
               fdtable and the n180211 driver.
      
             - Speculation barrier and its usage in user access functions
      
         - Make indirect calls in KVM speculation safe
      
         - Blacklisting of known to be broken microcodes so IBPB/IBRS are not
           touched.
      
         - The initial IBPB support and its usage in context switch
      
         - The exposure of the new speculation MSRs to KVM guests.
      
         - A fix for a regression in x86/32 related to the cpu entry area
      
         - Proper whitelisting for known to be safe CPUs from the mitigations.
      
         - objtool fixes to deal properly with retpolines and alternatives
      
         - Exclude __init functions from retpolines which speeds up the boot
           process.
      
         - Removal of the syscall64 fast path and related cleanups and
           simplifications
      
         - Removal of the unpatched paravirt mode which is yet another source
           of indirect unprotected calls.
      
         - A new and undisputed version of the module mismatch warning
      
         - A couple of cleanup and correctness fixes all over the place
      
        Yet another step towards full mitigation. There are a few things still
        missing like the RSB underflow mitigation for Skylake and other small
        details, but that's being worked on.
      
        That said, I'm taking a belated Christmas vacation for a week and hope
        that everything is magically solved when I'm back on Feb 12th"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
        KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
        KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
        KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
        KVM/x86: Add IBPB support
        KVM/x86: Update the reverse_cpuid list to include CPUID_7_EDX
        x86/speculation: Fix typo IBRS_ATT, which should be IBRS_ALL
        x86/pti: Mark constant arrays as __initconst
        x86/spectre: Simplify spectre_v2 command line parsing
        x86/retpoline: Avoid retpolines for built-in __init functions
        x86/kvm: Update spectre-v1 mitigation
        KVM: VMX: make MSR bitmaps per-VCPU
        x86/paravirt: Remove 'noreplace-paravirt' cmdline option
        x86/speculation: Use Indirect Branch Prediction Barrier in context switch
        x86/cpuid: Fix up "virtual" IBRS/IBPB/STIBP feature bits on Intel
        x86/spectre: Fix spelling mistake: "vunerable"-> "vulnerable"
        x86/spectre: Report get_user mitigation for spectre_v1
        nl80211: Sanitize array index in parse_txq_params
        vfs, fdtable: Prevent bounds-check bypass via speculative execution
        x86/syscall: Sanitize syscall table de-references under speculation
        x86/get_user: Use pointer masking to limit speculation
        ...
      35277995
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0a646e9c
      Linus Torvalds committed
      Pull x86 fixes from Thomas Gleixner:
       "A small set of changes:
      
         - a fixup for kexec related to 5-level paging mode. That covers most
           of the cases except kexec from a 5-level kernel to a 4-level
           kernel. The latter needs more work and is going to come in 4.17
      
         - two trivial fixes for build warnings triggered by LTO and gcc-8"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/power: Fix swsusp_arch_resume prototype
        x86/dumpstack: Avoid uninitlized variable
        x86/kexec: Make kexec (mostly) work in 5-level paging mode
      0a646e9c
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f74a127f
      Linus Torvalds committed
      Pull irq fixes from Thomas Gleixner:
       "Two small changes:
      
         - a fix for an interrupt regression caused by the vector management
           changes in 4.15 affecting museum pieces which rely on interrupt
           probing for legacy (e.g. parallel port) devices.
      
           One of the startup calls in the autoprobe code was not changed to
           the new activate_and_startup() function resulting in a warning and
           as a consequence failing to discover the device interrupt.
      
         - a trivial update to the copyright/license header of the STM32 irq
           chip driver"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq: Make legacy autoprobing work again
        irqchip/stm32: Fix copyright
      f74a127f
    • L
      Merge tag 'for-linus-20180204' of git://git.kernel.dk/linux-block · 64b28683
      Linus Torvalds committed
      Pull more block updates from Jens Axboe:
       "Most of this is fixes and not new code/features:
      
         - skd fix from Arnd, fixing a build error dependent on slab allocator
           type.
      
         - blk-mq scheduler discard merging fixes, one from me and one from
           Keith. This fixes a segment miscalculation for blk-mq-sched, where
           we mistakenly think two segments are physically contiguous even
           though the request isn't carrying real data. Also fixes a bio-to-rq
           merge case.
      
         - Don't re-set a bit on the buffer_head flags, if it's already set.
           This can cause scalability concerns on bigger machines and
           workloads. From Kemi Wang.
      
         - Add BLK_STS_DEV_RESOURCE return value to blk-mq, allowing us to
           distinguish between a local (device related) resource starvation
           and a global one. The latter might happen without IO being in
           flight, so it has to be handled a bit differently. From Ming"
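The "don't re-set a bit that is already set" item above can be illustrated with a small userspace sketch. This is not the kernel's buffer_head code: `struct flags_word` and `set_flag_if_clear()` are hypothetical names, and C11 atomics stand in for the kernel's bit operations. The idea is that a plain read lets the cache line stay shared when the bit is already set, so the more expensive atomic read-modify-write runs only when something has to change.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag word standing in for a buffer's state bits. */
struct flags_word {
    atomic_ulong state;
};

/*
 * Set bit `nr` only if it currently reads as clear.
 * The relaxed load does not need exclusive ownership of the cache line,
 * so callers that find the bit already set avoid the atomic OR entirely.
 * Returns true if this call issued the atomic OR, false if it was skipped.
 */
static bool set_flag_if_clear(struct flags_word *w, unsigned int nr)
{
    unsigned long mask = 1UL << nr;

    if (atomic_load_explicit(&w->state, memory_order_relaxed) & mask)
        return false;   /* already set: skip the read-modify-write */

    atomic_fetch_or(&w->state, mask);
    return true;
}
```

Two racing callers may both see the bit clear and both issue the OR; that is harmless here because setting an already-set bit is idempotent.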
      
      * tag 'for-linus-20180204' of git://git.kernel.dk/linux-block:
        block: skd: fix incorrect linux/slab_def.h inclusion
        buffer: Avoid setting buffer bits that are already set
        blk-mq-sched: Enable merging discard bio into request
        blk-mq: fix discard merge with scheduler attached
        blk-mq: introduce BLK_STS_DEV_RESOURCE
      64b28683
    • L
      Merge tag 'ntb-4.16' of git://github.com/jonmason/ntb · d3658c22
      Linus Torvalds committed
      Pull NTB updates from Jon Mason:
       "Bug fixes galore, removal of the ntb atom driver, and updates to the
        ntb tools and tests to support the multi-port interface"
      
      * tag 'ntb-4.16' of git://github.com/jonmason/ntb: (37 commits)
        NTB: ntb_perf: fix cast to restricted __le32
        ntb_perf: Fix an error code in perf_copy_chunk()
        ntb_hw_switchtec: Make function switchtec_ntb_remove() static
        NTB: ntb_tool: fix memory leak on 'buf' on error exit path
        NTB: ntb_perf: fix printing of resource_size_t
        NTB: ntb_hw_idt: Set NTB_TOPO_SWITCH topology
        NTB: ntb_test: Update ntb_perf tests
        NTB: ntb_test: Update ntb_tool MW tests
        NTB: ntb_test: Add ntb_tool Message tests
        NTB: ntb_test: Update ntb_tool Scratchpad tests
        NTB: ntb_test: Update ntb_tool DB tests
        NTB: ntb_test: Update ntb_tool link tests
        NTB: ntb_test: Add ntb_tool port tests
        NTB: ntb_test: Safely use paths with whitespace
        NTB: ntb_perf: Add full multi-port NTB API support
        NTB: ntb_tool: Add full multi-port NTB API support
        NTB: ntb_pp: Add full multi-port NTB API support
        NTB: Fix UB/bug in ntb_mw_get_align()
        NTB: Set dma mask and dma coherent mask to NTB devices
        NTB: Rename NTB messaging API methods
        ...
      d3658c22
    • L
      Merge tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration · 8ac4840a
      Linus Torvalds committed
      Pull mailbox updates from Jassi Brar:
       "Misc driver changes only:
      
         - TI-MsgMgr: Fix print format for a printk
      
         - TI-MsgMgr: SPDX license switch for the driver
      
         - QCOM-IPC: Convert driver to use regmap
      
         - QCOM-IPC: Spawn sibling clock device from mailbox driver"
      
      * tag 'mailbox-v4.16' of git://git.linaro.org/landing-teams/working/fujitsu/integration:
        dt-bindings: mailbox: qcom: Document the APCS clock binding
        mailbox: qcom: Create APCS child device for clock controller
        mailbox: qcom: Convert APCS IPC driver to use regmap
        mailbox: ti-msgmgr: Use %zu for size_t print format
        mailbox: ti-msgmgr: Switch to SPDX Licensing
      8ac4840a
    • L
      Merge branch 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 4141cf67
      Linus Torvalds committed
      Pull i2c updates from Wolfram Sang:
       "I2C has the following changes for you:
      
         - new flag to mark DMA safe buffers in i2c_msg. Also, some
           infrastructure around it. And docs.
      
         - huge refactoring of the at24 driver led by the new maintainer
           Bartosz
      
         - update I2C bus recovery to send STOP after recovery
      
         - conversion from gpio to gpiod for I2C bus recovery
      
         - adding a fault-injector to the i2c-gpio driver
      
         - lots of small driver improvements, and bigger ones to
           i2c-sh_mobile"
      
      * 'i2c/for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (99 commits)
        i2c: mv64xxx: Add myself as maintainer for this driver
        i2c: mv64xxx: Fix clock resource by adding an optional bus clock
        i2c: mv64xxx: Remove useless test before clk_disable_unprepare
        i2c: mxs: use true and false for boolean values
        i2c: meson: update doc description to fix build warnings
        i2c: meson: add configurable divider factors
        dt-bindings: i2c: update documentation for the Meson-AXG
        i2c: imx-lpi2c: add runtime pm support
        i2c: rcar: fix some trivial typos in comments
        i2c: davinci: fix the cpufreq transition
        i2c: rk3x: add proper kerneldoc header
        i2c: rk3x: account for const type of of_device_id.data
        i2c: acorn: remove outdated path from file header
        i2c: acorn: add MODULE_LICENSE tag
        i2c: rcar: implement bus recovery
        i2c: send STOP after successful bus recovery
        i2c: ensure SDA is released in recovery if SDA is controllable
        i2c: add 'set_sda' to bus_recovery_info
        i2c: add identifier in declarations for i2c_bus_recovery
        i2c: make kerneldoc about bus recovery more precise
        ...
      4141cf67
    • L
      Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt · 3462ac57
      Linus Torvalds committed
      Pull fscrypt updates from Ted Ts'o:
       "Refactor support for encrypted symlinks to move common code to fscrypt"
      
      Ted also points out about the merge:
       "This makes the f2fs symlink code use the fscrypt_encrypt_symlink()
        from the fscrypt tree. This will end up dropping the kzalloc() ->
        f2fs_kzalloc() change, which means the fscrypt-specific allocation
        won't get tested by f2fs's kmalloc error injection system; which is
        fine"
      
      * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits)
        fscrypt: fix build with pre-4.6 gcc versions
        fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info()
        fscrypt: document symlink length restriction
        fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
        fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
        fscrypt: calculate NUL-padding length in one place only
        fscrypt: move fscrypt_symlink_data to fscrypt_private.h
        fscrypt: remove fscrypt_fname_usr_to_disk()
        ubifs: switch to fscrypt_get_symlink()
        ubifs: switch to fscrypt ->symlink() helper functions
        ubifs: free the encrypted symlink target
        f2fs: switch to fscrypt_get_symlink()
        f2fs: switch to fscrypt ->symlink() helper functions
        ext4: switch to fscrypt_get_symlink()
        ext4: switch to fscrypt ->symlink() helper functions
        fscrypt: new helper function - fscrypt_get_symlink()
        fscrypt: new helper functions for ->symlink()
        fscrypt: trim down fscrypt.h includes
        fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
        fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
        ...
      3462ac57
    • J
      IB/uverbs: Use the standard kConfig format for experimental · e9d1e389
      Jason Gunthorpe committed
      We really don't want people turning this on just yet, make it very
      clear with capital letters.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      e9d1e389
    • J
      IB: Update references to libibverbs · 46adb179
      Jason Gunthorpe committed
      These days the userspace comes from rdma-core, revise references
      in the kernel to point to the current repository.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      46adb179
  5. 04 Feb, 2018 8 commits
    • G
      dt-bindings: mailbox: qcom: Document the APCS clock binding · 0ae7d327
      Georgi Djakov committed
      Update the binding documentation for APCS to mention that the APCS
      hardware block also exposes a clock controller functionality.
      
      The APCS clock controller is a mux and half-integer divider. It has the
      main CPU PLL as an input and provides the clock for the application CPU.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Reviewed-by: Rob Herring <robh@kernel.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      0ae7d327
    • G
      mailbox: qcom: Create APCS child device for clock controller · c815d769
      Georgi Djakov committed
      There is a clock controller functionality provided by the APCS hardware
      block of msm8916 devices. The device-tree would represent an APCS node
      with both mailbox and clock provider properties.
      Create a platform child device for the clock controller functionality so
      the driver can probe and use APCS as parent.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      c815d769
    • G
      mailbox: qcom: Convert APCS IPC driver to use regmap · c6a8b171
      Georgi Djakov committed
      This hardware block provides more functionalities than just IPC. Convert
      it to regmap to allow other child platform devices to use the same regmap.
      Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
      Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
      Signed-off-by: Jassi Brar <jaswinder.singh@linaro.org>
      c6a8b171
    • L
      Merge tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 617aebe6
      Linus Torvalds committed
      Pull hardened usercopy whitelisting from Kees Cook:
       "Currently, hardened usercopy performs dynamic bounds checking on slab
        cache objects. This is good, but still leaves a lot of kernel memory
        available to be copied to/from userspace in the face of bugs.
      
        To further restrict what memory is available for copying, this creates
        a way to whitelist specific areas of a given slab cache object for
        copying to/from userspace, allowing much finer granularity of access
        control.
      
        Slab caches that are never exposed to userspace can declare no
        whitelist for their objects, thereby keeping them unavailable to
        userspace via dynamic copy operations. (Note, an implicit form of
        whitelisting is the use of constant sizes in usercopy operations and
        get_user()/put_user(); these bypass all hardened usercopy checks since
        these sizes cannot change at runtime.)
      
        This new check is WARN-by-default, so any mistakes can be found over
        the next several releases without breaking anyone's system.
      
        The series has roughly the following sections:
         - remove %p and improve reporting with offset
         - prepare infrastructure and whitelist kmalloc
         - update VFS subsystem with whitelists
         - update SCSI subsystem with whitelists
         - update network subsystem with whitelists
         - update process memory with whitelists
         - update per-architecture thread_struct with whitelists
         - update KVM with whitelists and fix ioctl bug
         - mark all other allocations as not whitelisted
         - update lkdtm for more sensible test coverage"
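The whitelisting idea described in the pull message can be sketched as a bounds check against a per-cache window. This is a simplified userspace model, not the kernel's implementation: `struct cache_model` and `usercopy_allowed()` are hypothetical names, and the fields only mirror the notion of a whitelisted offset and size within each slab object.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a slab cache and its usercopy window. */
struct cache_model {
    size_t object_size;  /* size of each object in the cache */
    size_t useroffset;   /* start of the whitelisted region in an object */
    size_t usersize;     /* length of the whitelisted region; 0 means none */
};

/*
 * Return true only if the copy [offset, offset + len) lies entirely
 * inside the whitelisted window. The subtractions are ordered so that
 * no intermediate value can wrap around.
 */
static bool usercopy_allowed(const struct cache_model *c,
                             size_t offset, size_t len)
{
    if (offset < c->useroffset)
        return false;                    /* starts before the window */

    size_t rel = offset - c->useroffset; /* position inside the window */
    if (rel > c->usersize)
        return false;                    /* starts past the window */

    return len <= c->usersize - rel;     /* must also end inside it */
}
```

A cache that declares no whitelist has `usersize == 0`, so every non-empty dynamic copy is rejected, which matches the "never exposed to userspace" case in the message.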
      
      * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits)
        lkdtm: Update usercopy tests for whitelisting
        usercopy: Restrict non-usercopy caches to size 0
        kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
        kvm: whitelist struct kvm_vcpu_arch
        arm: Implement thread_struct whitelist for hardened usercopy
        arm64: Implement thread_struct whitelist for hardened usercopy
        x86: Implement thread_struct whitelist for hardened usercopy
        fork: Provide usercopy whitelisting for task_struct
        fork: Define usercopy region in thread_stack slab caches
        fork: Define usercopy region in mm_struct slab caches
        net: Restrict unwhitelisted proto caches to size 0
        sctp: Copy struct sctp_sock.autoclose to userspace using put_user()
        sctp: Define usercopy region in SCTP proto slab cache
        caif: Define usercopy region in caif proto slab cache
        ip: Define usercopy region in IP proto slab cache
        net: Define usercopy region in struct proto slab cache
        scsi: Define usercopy region in scsi_sense_cache slab cache
        cifs: Define usercopy region in cifs_request slab cache
        vxfs: Define usercopy region in vxfs_inode slab cache
        ufs: Define usercopy region in ufs_inode_cache slab cache
        ...
      617aebe6
    • K
      KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL · b2ac58f9
      KarimAllah Ahmed committed
      [ Based on a patch from Paolo Bonzini <pbonzini@redhat.com> ]
      
      ... basically doing exactly what we do for VMX:
      
      - Passthrough SPEC_CTRL to guests (if enabled in guest CPUID)
      - Save and restore SPEC_CTRL around VMExit and VMEntry only if the guest
        actually used it.
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517669783-20732-1-git-send-email-karahmed@amazon.de
      b2ac58f9
    • K
      KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL · d28b387f
      KarimAllah Ahmed committed
      [ Based on a patch from Ashok Raj <ashok.raj@intel.com> ]
      
      Add direct access to MSR_IA32_SPEC_CTRL for guests. This is needed for
      guests that will only mitigate Spectre V2 through IBRS+IBPB and will not
      be using a retpoline+IBPB based approach.
      
      To avoid the overhead of saving and restoring the MSR_IA32_SPEC_CTRL for
      guests that do not actually use the MSR, only start saving and restoring
      when a non-zero value is written to it.
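The lazy save/restore policy in the paragraph above can be modelled in a few lines of userspace C. This is only a sketch of the policy, not the VMX code: `struct vcpu_model`, `hw_spec_ctrl`, and the three functions are hypothetical names, and a plain variable stands in for the physical MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state; field names are illustrative only. */
struct vcpu_model {
    uint64_t guest_spec_ctrl;  /* last known guest value of the MSR */
    bool     track_spec_ctrl;  /* set once the guest writes non-zero */
};

/* Hypothetical stand-in for the physical MSR. */
static uint64_t hw_spec_ctrl;

/* Intercepted guest write: start tracking only on a non-zero value. */
static void guest_write_spec_ctrl(struct vcpu_model *v, uint64_t val)
{
    v->guest_spec_ctrl = val;
    if (val != 0)
        v->track_spec_ctrl = true;
}

/* On entry, load the guest value only for guests that use the MSR. */
static void vm_entry(const struct vcpu_model *v)
{
    if (v->track_spec_ctrl)
        hw_spec_ctrl = v->guest_spec_ctrl;
}

/* On exit, save the guest value and return the host to 0, if tracked. */
static void vm_exit(struct vcpu_model *v)
{
    if (v->track_spec_ctrl) {
        v->guest_spec_ctrl = hw_spec_ctrl;
        hw_spec_ctrl = 0;
    }
}
```

A guest that never writes a non-zero value never sets `track_spec_ctrl`, so both `vm_entry()` and `vm_exit()` skip the MSR accesses, which is the overhead the commit message aims to avoid.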
      
      No attempt is made to handle STIBP here, intentionally. Filtering STIBP
      may be added in a future patch, which may require trapping all writes
      if we don't want to pass it through directly to the guest.
      
      [dwmw2: Clean up CPUID bits, save/restore manually, handle reset]
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517522386-18410-5-git-send-email-karahmed@amazon.de
      d28b387f
    • K
      KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES · 28c1c9fa
      KarimAllah Ahmed committed
      Intel processors use MSR_IA32_ARCH_CAPABILITIES MSR to indicate RDCL_NO
      (bit 0) and IBRS_ALL (bit 1). This is a read-only MSR. By default the
      contents will come directly from the hardware, but user-space can still
      override it.
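As a rough illustration of the bit layout described above, the following userspace sketch decodes the two bits and models the "hardware default, user-space override" choice. The macro, struct, and function names are chosen for this example and are not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as stated in the commit message. */
#define ARCH_CAP_RDCL_NO   (1ULL << 0)  /* bit 0: RDCL_NO */
#define ARCH_CAP_IBRS_ALL  (1ULL << 1)  /* bit 1: IBRS_ALL */

/* Hypothetical decoded view of the MSR value. */
struct arch_caps {
    bool rdcl_no;
    bool ibrs_all;
};

/* Decode a raw 64-bit MSR value into its two documented flags. */
static struct arch_caps decode_arch_caps(uint64_t msr)
{
    struct arch_caps caps = {
        .rdcl_no  = (msr & ARCH_CAP_RDCL_NO) != 0,
        .ibrs_all = (msr & ARCH_CAP_IBRS_ALL) != 0,
    };
    return caps;
}

/*
 * Model of the value a guest would read: the host value by default,
 * or the value user space supplied if it chose to override it.
 */
static uint64_t guest_arch_caps(uint64_t host_value,
                                bool has_override, uint64_t override_value)
{
    return has_override ? override_value : host_value;
}
```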
      
      [dwmw2: The bit in kvm_cpuid_7_0_edx_x86_features can be unconditional]
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Link: https://lkml.kernel.org/r/1517522386-18410-4-git-send-email-karahmed@amazon.de
      28c1c9fa
    • A
      KVM/x86: Add IBPB support · 15d45071
      Ashok Raj committed
      The Indirect Branch Predictor Barrier (IBPB) is an indirect branch
      control mechanism. It keeps earlier branches from influencing
      later ones.
      
      Unlike IBRS and STIBP, IBPB does not define a new mode of operation.
      It's a command that ensures predicted branch targets aren't used after
      the barrier. Although IBRS and IBPB are enumerated by the same CPUID
      enumeration, IBPB is very different.
      
      IBPB helps mitigate against three potential attacks:
      
      * Mitigate guests from being attacked by other guests.
        - This is addressed by issuing IBPB when we do a guest switch.
      
      * Mitigate attacks from guest/ring3->host/ring3.
        These would require an IBPB during context switch in host, or after
        VMEXIT. The host process has two ways to mitigate
        - Either it can be compiled with retpoline
        - If it's going through context switch, and has set !dumpable then
          there is an IBPB in that path.
          (Tim's patch: https://patchwork.kernel.org/patch/10192871)
        - The case where after a VMEXIT you return back to Qemu might make
          Qemu attackable from guest when Qemu isn't compiled with retpoline.
        There are issues reported when doing IBPB on every VMEXIT that resulted
        in some tsc calibration woes in guest.
      
      * Mitigate guest/ring0->host/ring0 attacks.
        When host kernel is using retpoline it is safe against these attacks.
        If host kernel isn't using retpoline we might need to do an IBPB flush on
        every VMEXIT.
      
      Even when using retpoline for indirect calls, in certain conditions 'ret'
      can use the BTB on Skylake-era CPUs. There are other mitigations
      available like RSB stuffing/clearing.
      
      * IBPB is issued only for SVM during svm_free_vcpu().
        VMX has a vmclear and SVM doesn't.  Follow discussion here:
        https://lkml.org/lkml/2018/1/15/146
      
      Please refer to the following spec for more details on the enumeration
      and control.
      
      Refer here to get documentation about mitigations.
      
      https://software.intel.com/en-us/side-channel-security-support
      
      [peterz: rebase and changelog rewrite]
      [karahmed: - rebase
                 - vmx: expose PRED_CMD if guest has it in CPUID
                 - svm: only pass through IBPB if guest has it in CPUID
                 - vmx: support !cpu_has_vmx_msr_bitmap()
                 - vmx: support nested]
      [dwmw2: Expose CPUID bit too (AMD IBPB only for now as we lack IBRS)
              PRED_CMD is a write-only MSR]
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: KarimAllah Ahmed <karahmed@amazon.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Arjan Van De Ven <arjan.van.de.ven@intel.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Jun Nakajima <jun.nakajima@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1515720739-43819-6-git-send-email-ashok.raj@intel.com
      Link: https://lkml.kernel.org/r/1517522386-18410-3-git-send-email-karahmed@amazon.de
      15d45071