1. 07 Jan 2020, 5 commits
    • iommu/vt-d: debugfs: Add support to show page table internals · e2726dae
      Committed by Lu Baolu
      Export page table internals of the domain attached to each device.
      Example of such a dump on a Skylake machine:
      
      $ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct
      [ ... ]
      Device 0000:00:14.0 with pasid 0 @0x15f3d9000
      IOVA_PFN                PML5E                   PML4E
      0x000000008ced0 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced1 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced2 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced3 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced4 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced5 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced6 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced7 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced8 |       0x0000000000000000      0x000000015f3da003
      0x000000008ced9 |       0x0000000000000000      0x000000015f3da003
      
      PDPE                    PDE                     PTE
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced0003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced1003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced2003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced3003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced4003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced5003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced6003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced7003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced8003
      0x000000015f3db003      0x000000015f3dc003      0x000000008ced9003
      [ ... ]
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
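      Below is a minimal C sketch of how one row of the dump above can be read. It assumes
      4 KiB pages, 9 index bits per paging level, and an address field in
      bits 51:12 of each entry. The helper names are hypothetical. Under
      these assumptions, IOVA PFN 0x8ced0 selects index 0xd0 at the PTE level
      and index 0x67 at the PDE level. The low bits (0x3) of every entry are flags, not
      part of the next-level address.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 9 index bits per level; level 1 = PTE ... level 5 = PML5E. */
#define HYP_LEVEL_BITS 9
#define HYP_LEVEL_MASK 0x1ffULL

/* Assumption: bits 51:12 hold the address; bits 11:0 hold flags. */
#define HYP_ADDR_MASK 0x000ffffffffff000ULL

/* Index into the table at @level that translates @iova_pfn. */
static unsigned int level_index(uint64_t iova_pfn, unsigned int level)
{
    assert(level >= 1 && level <= 5);
    return (unsigned int)((iova_pfn >> ((level - 1) * HYP_LEVEL_BITS)) &
                          HYP_LEVEL_MASK);
}

/* Address of the next-level table (or of the page, at the PTE level). */
static uint64_t entry_addr(uint64_t entry)
{
    return entry & HYP_ADDR_MASK;
}

/* Low two flag bits; assumed to be read/write permission bits. */
static unsigned int entry_rw_bits(uint64_t entry)
{
    return (unsigned int)(entry & 0x3);
}
```

      For example, the PML4E value 0x000000015f3da003 decodes to a next-level
      table at 0x15f3da000 with both low flag bits set.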
    • iommu/vt-d: Flush PASID-based iotlb for iova over first level · 33cd6e64
      Committed by Lu Baolu
      When software has changed first-level tables, it should invalidate
      the affected IOTLB entries and paging-structure caches using the
      PASID-based-IOTLB Invalidate Descriptor defined in section 6.5.2.4
      of the VT-d specification.
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
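      Below is a hedged C sketch of packing such a descriptor. The field positions are assumptions to
      verify against section 6.5.2.4 of the spec:
      - First qword: type 6 in bits 3:0, granularity in 5:4, domain ID in
        31:16, PASID in 51:32.
      - Second qword: address mask in 5:0, invalidation hint in bit 6, page
        address in 63:12.
      All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-qword invalidation descriptor (layout assumed). */
struct hyp_qi_desc {
    uint64_t qw0;
    uint64_t qw1;
};

#define HYP_PIOTLB_TYPE        0x6ULL
#define HYP_GRAN_ALL_IN_PASID  2ULL /* assumed: non-global entries of PASID */
#define HYP_GRAN_PAGE_IN_PASID 3ULL /* assumed: page-selective within PASID */

/* Build a PASID-based-IOTLB invalidation for 2^am pages at @addr. */
static struct hyp_qi_desc make_piotlb_desc(uint16_t did, uint32_t pasid,
                                           uint64_t addr, unsigned int am,
                                           int ih, uint64_t gran)
{
    struct hyp_qi_desc d;

    d.qw0 = HYP_PIOTLB_TYPE |
            ((gran & 0x3) << 4) |                 /* granularity */
            ((uint64_t)did << 16) |               /* domain ID */
            ((uint64_t)(pasid & 0xfffff) << 32);  /* 20-bit PASID */
    d.qw1 = (addr & ~0xfffULL) |                  /* page-aligned address */
            ((uint64_t)(ih ? 1 : 0) << 6) |       /* invalidation hint */
            (am & 0x3f);                          /* address mask */
    return d;
}
```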
    • iommu/vt-d: Setup pasid entries for iova over first level · ddf09b6d
      Committed by Lu Baolu
      Intel VT-d in scalable mode supports two types of page tables for
      IOVA translation: first level and second level. The IOMMU driver
      can choose either of them for IOVA translation according to the use
      case. This sets up the PASID entry if a domain is selected to use
      the first-level page table for IOVA translation.
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
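      Below is a minimal sketch of the per-domain choice made when filling a PASID entry. It
      assumes the scalable-mode encoding in which the PASID-granular
      translation type (PGTT) occupies bits 8:6 of the entry's first qword.
      Under that encoding, 1 means first-level only and 2 means second-level only. The
      structure and helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PGTT encodings for a scalable-mode PASID entry. */
enum hyp_pgtt {
    HYP_PGTT_FL_ONLY = 1, /* first-level translation only */
    HYP_PGTT_SL_ONLY = 2, /* second-level translation only */
    HYP_PGTT_NESTED  = 3,
    HYP_PGTT_PT      = 4, /* pass-through */
};

#define HYP_PGTT_SHIFT 6
#define HYP_PGTT_MASK  (0x7ULL << HYP_PGTT_SHIFT)

/* Hypothetical per-domain flag chosen by the driver for its use case. */
struct hyp_domain {
    bool use_first_level;
};

/* Replace the PGTT field of qword 0, leaving all other bits intact. */
static uint64_t set_translation_type(uint64_t qw0, enum hyp_pgtt type)
{
    return (qw0 & ~HYP_PGTT_MASK) |
           (((uint64_t)type << HYP_PGTT_SHIFT) & HYP_PGTT_MASK);
}

/* Pick first- or second-level translation for IOVA based on the domain. */
static uint64_t setup_iova_pasid_qw0(const struct hyp_domain *dom,
                                     uint64_t qw0)
{
    return set_translation_type(qw0, dom->use_first_level
                                         ? HYP_PGTT_FL_ONLY
                                         : HYP_PGTT_SL_ONLY);
}
```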
    • iommu/vt-d: trace: Extend map_sg trace event · 984d03ad
      Committed by Lu Baolu
      The current map_sg trace event records messages at a coarse
      granularity. This extends it so that more detailed, per-segment
      messages can be traced.
      
      The map_sg trace message looks like:
      
      map_sg: dev=0000:00:17.0 [1/9] dev_addr=0xf8f90000 phys_addr=0x158051000 size=4096
      map_sg: dev=0000:00:17.0 [2/9] dev_addr=0xf8f91000 phys_addr=0x15a858000 size=4096
      map_sg: dev=0000:00:17.0 [3/9] dev_addr=0xf8f92000 phys_addr=0x15aa13000 size=4096
      map_sg: dev=0000:00:17.0 [4/9] dev_addr=0xf8f93000 phys_addr=0x1570f1000 size=8192
      map_sg: dev=0000:00:17.0 [5/9] dev_addr=0xf8f95000 phys_addr=0x15c6d0000 size=4096
      map_sg: dev=0000:00:17.0 [6/9] dev_addr=0xf8f96000 phys_addr=0x157194000 size=4096
      map_sg: dev=0000:00:17.0 [7/9] dev_addr=0xf8f97000 phys_addr=0x169552000 size=4096
      map_sg: dev=0000:00:17.0 [8/9] dev_addr=0xf8f98000 phys_addr=0x169dde000 size=4096
      map_sg: dev=0000:00:17.0 [9/9] dev_addr=0xf8f99000 phys_addr=0x148351000 size=4096
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
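      Below is a small C sketch that reproduces the per-segment line layout shown above with
      snprintf(), numbering segments as [i/n]. The helper names and the
      segment structure are hypothetical. The kernel defines the real event
      with its own tracepoint machinery.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mapped scatter-gather segment. */
struct hyp_sg_seg {
    uint64_t dev_addr;
    uint64_t phys_addr;
    unsigned int size;
};

/* Format one trace line in the layout of the example output. */
static int format_map_sg(char *buf, size_t len, const char *dev,
                         int index, int total, uint64_t dev_addr,
                         uint64_t phys_addr, unsigned int size)
{
    return snprintf(buf, len,
                    "map_sg: dev=%s [%d/%d] dev_addr=0x%llx phys_addr=0x%llx size=%u",
                    dev, index, total, (unsigned long long)dev_addr,
                    (unsigned long long)phys_addr, size);
}

/* Emit one line per segment instead of a single coarse summary. */
static void print_map_sg(FILE *out, const char *dev,
                         const struct hyp_sg_seg *segs, int nents)
{
    char line[160];

    for (int i = 0; i < nents; i++) {
        format_map_sg(line, sizeof line, dev, i + 1, nents,
                      segs[i].dev_addr, segs[i].phys_addr, segs[i].size);
        fprintf(out, "%s\n", line);
    }
}
```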
    • iommu/vt-d: Fix CPU and IOMMU SVM feature matching checks · ff3dc652
      Committed by Jacob Pan
      Shared Virtual Memory (SVM) is based on a collective set of hardware
      features detected at runtime, and it requires the CPU and IOMMU
      capabilities to match.
      
      The current code checks the CPU and IOMMU feature sets for SVM support,
      but the result is neither stored nor used. Therefore, SVM can still be
      used even when these checks fail. The consequences can be:
      1. The CPU uses 5-level paging with 57-bit virtual addresses, but the
      IOMMU only supports 4-level paging with 48-bit addresses for DMA.
      2. The CPU uses 1GB pages but the IOMMU does not support them, so
      unrecoverable VT-d faults may be generated.
      
      The best solution to fix these problems is to prevent them in the first
      place.
      
      This patch consolidates the code for checking PASID support and CPU
      vs. IOMMU paging mode compatibility, and provides a specific error
      message for each failed check. On sane hardware configurations, these
      error messages should never appear in the kernel log.
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
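      The consolidated check can be sketched as one predicate whose result is
      kept and used to gate SVM. Each failure reports its own reason. The
      capability structures and field names are hypothetical stand-ins for
      the CPU feature flags and IOMMU capability bits the driver reads. Only
      the PASID requirement and the two mismatches listed above are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the relevant CPU features. */
struct hyp_cpu_caps {
    bool la57;    /* 5-level paging, 57-bit virtual addresses */
    bool gbpages; /* 1GB pages */
};

/* Hypothetical summary of the relevant IOMMU capabilities. */
struct hyp_iommu_caps {
    bool pasid;   /* PASID support */
    bool fl5lp;   /* 5-level first-level paging */
    bool fl1gp;   /* 1GB first-level pages */
};

/* Returns true if SVM may be enabled; otherwise sets *why to a reason. */
static bool svm_caps_match(const struct hyp_cpu_caps *cpu,
                           const struct hyp_iommu_caps *iommu,
                           const char **why)
{
    *why = NULL;
    if (!iommu->pasid) {
        *why = "IOMMU has no PASID support";
        return false;
    }
    if (cpu->la57 && !iommu->fl5lp) {
        *why = "CPU uses 5-level paging but IOMMU only supports 4-level";
        return false;
    }
    if (cpu->gbpages && !iommu->fl1gp) {
        *why = "CPU uses 1GB pages but IOMMU does not support them";
        return false;
    }
    return true;
}
```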
  2. 21 Dec 2019, 3 commits
    • net: dst: Force 4-byte alignment of dst_metrics · 258a980d
      Committed by Geert Uytterhoeven
      When storing a pointer to a dst_metrics structure in dst_entry._metrics,
      two flags are added in the least significant bits of the pointer value.
      Hence this assumes all pointers to dst_metrics structures have at least
      4-byte alignment.
      
      However, on m68k, the minimum alignment of 32-bit values is 2 bytes, not
      4 bytes.  Hence in some kernel builds, dst_default_metrics may be only
      2-byte aligned, leading to obscure boot warnings like:
      
          WARNING: CPU: 0 PID: 7 at lib/refcount.c:28 refcount_warn_saturate+0x44/0x9a
          refcount_t: underflow; use-after-free.
          Modules linked in:
          CPU: 0 PID: 7 Comm: ksoftirqd/0 Tainted: G        W         5.5.0-rc2-atari-01448-g114a1a1038af891d-dirty #261
          Stack from 10835e6c:
      	    10835e6c 0038134f 00023fa6 00394b0f 0000001c 00000009 00321560 00023fea
      	    00394b0f 0000001c 001a70f8 00000009 00000000 10835eb4 00000001 00000000
      	    04208040 0000000a 00394b4a 10835ed4 00043aa8 001a70f8 00394b0f 0000001c
      	    00000009 00394b4a 0026aba8 003215a4 00000003 00000000 0026d5a8 00000001
      	    003215a4 003a4361 003238d6 000001f0 00000000 003215a4 10aa3b00 00025e84
      	    003ddb00 10834000 002416a8 10aa3b00 00000000 00000080 000aa038 0004854a
          Call Trace: [<00023fa6>] __warn+0xb2/0xb4
           [<00023fea>] warn_slowpath_fmt+0x42/0x64
           [<001a70f8>] refcount_warn_saturate+0x44/0x9a
           [<00043aa8>] printk+0x0/0x18
           [<001a70f8>] refcount_warn_saturate+0x44/0x9a
           [<0026aba8>] refcount_sub_and_test.constprop.73+0x38/0x3e
           [<0026d5a8>] ipv4_dst_destroy+0x5e/0x7e
           [<00025e84>] __local_bh_enable_ip+0x0/0x8e
           [<002416a8>] dst_destroy+0x40/0xae
      
      Fix this by forcing 4-byte alignment of all dst_metrics structures.
      
      Fixes: e5fd387a ("ipv6: do not overwrite inetpeer metrics prematurely")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
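      Below is a C11 sketch of the pointer-tagging scheme described above. Two flags live in
      the low bits of a pointer, so the pointee must be at least 4-byte
      aligned. The commit forces that alignment on the structure. This sketch
      uses the standard alignas spelling plus a static_assert that fails the
      build if the guarantee is lost. All names are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Two flags kept in the low bits of the pointer (names hypothetical). */
#define HYP_METRICS_READ_ONLY  0x1UL
#define HYP_METRICS_REFCOUNTED 0x2UL
#define HYP_METRICS_FLAGS      0x3UL

/* alignas(4) on the first member raises the whole struct's alignment,
 * even on targets (such as m68k) where 32-bit values are 2-byte aligned. */
struct hyp_metrics {
    alignas(4) uint32_t metrics[4];
    uint32_t refcnt;
};

static_assert(alignof(struct hyp_metrics) >= 4,
              "low two pointer bits must be free for flags");

static uintptr_t hyp_tag(const struct hyp_metrics *m, unsigned long flags)
{
    return (uintptr_t)m | (flags & HYP_METRICS_FLAGS);
}

static struct hyp_metrics *hyp_untag(uintptr_t v)
{
    return (struct hyp_metrics *)(v & ~(uintptr_t)HYP_METRICS_FLAGS);
}

static unsigned long hyp_flags(uintptr_t v)
{
    return (unsigned long)(v & HYP_METRICS_FLAGS);
}
```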
    • net: phy: ensure that phy IDs are correctly typed · 7d49a32a
      Committed by Russell King
      PHY IDs are 32-bit unsigned quantities. Ensure that they are always
      treated as such, and not passed around as "int"s.
      
      Fixes: 13d0ab67 ("net: phy: check return code when requesting PHY driver module")
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mod_devicetable: fix PHY module format · d2ed49cf
      Committed by Russell King
      When a PHY is probed, if the top bit is set, we end up requesting a
      module with the string "mdio:-10101110000000100101000101010001":
      the top bit is printed as a signed -1 value. This prevents the module
      from being loaded.
      
      Fix the module format string and the macro generating the values for
      it to ensure that we only print unsigned types and the top bit is
      always 0/1. We correctly end up with
      "mdio:10101110000000100101000101010001".
      
      Fixes: 8626d3b4 ("phylib: Support phy module autoloading")
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
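      Below is a standalone C sketch of building the module alias from an unsigned PHY ID.
      Each of the 32 bits is printed as '0' or '1', most significant first.
      The function name is hypothetical, and it does not reproduce the
      kernel's actual format-string macros. It only shows that working on
      uint32_t yields the correct alias for ID 0xAE025151, whose top bit is
      set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write "mdio:" plus 32 binary digits of @phy_id into @out (38 bytes).
 * Testing bits with an unsigned mask means the top bit can only ever
 * produce '1', never a "-1". */
static void hyp_mdio_modalias(char out[5 + 32 + 1], uint32_t phy_id)
{
    memcpy(out, "mdio:", 5);
    for (int i = 0; i < 32; i++) {
        uint32_t bit = UINT32_C(1) << (31 - i);
        out[5 + i] = (phy_id & bit) ? '1' : '0';
    }
    out[5 + 32] = '\0';
}
```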
  3. 20 Dec 2019, 3 commits
    • xen/interface: re-define FRONT/BACK_RING_ATTACH() · 1ee54195
      Committed by Paul Durrant
      Currently these macros are defined to re-initialize a front/back ring
      (respectively) to values read from the shared ring in such a way that any
      requests/responses that are added to the shared ring whilst the front/back
      is detached will be skipped over. This, in general, is not a desirable
      semantic since most frontend implementations will eventually block waiting
      for a response which would either never appear or never be processed.
      
      Since the macros are currently unused, take this opportunity to re-define
      them to re-initialize a front/back ring using specified values. This also
      allows FRONT/BACK_RING_INIT() to be re-defined in terms of
      FRONT/BACK_RING_ATTACH() using a specified value of 0.
      
      NOTE: BACK_RING_ATTACH() will be used directly in a subsequent patch.
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
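      Below is a simplified C sketch of the re-defined semantics: ATTACH takes an explicit
      starting index, and INIT is ATTACH with an index of 0. The ring
      structures are hypothetical cut-down versions. The real macros also
      derive the number of entries from the shared ring's size, which is
      passed in directly here.

```c
#include <assert.h>

typedef unsigned int hyp_ring_idx;

/* Hypothetical shared ring and private front/back views. */
struct hyp_sring {
    hyp_ring_idx req_prod, rsp_prod;
};

struct hyp_front_ring {
    hyp_ring_idx req_prod_pvt; /* next request slot to fill */
    hyp_ring_idx rsp_cons;     /* next response to consume */
    unsigned int nr_ents;
    struct hyp_sring *sring;
};

struct hyp_back_ring {
    hyp_ring_idx rsp_prod_pvt; /* next response slot to fill */
    hyp_ring_idx req_cons;     /* next request to consume */
    unsigned int nr_ents;
    struct hyp_sring *sring;
};

/* Attach at caller-specified index i, not at values read from the
 * shared ring, so nothing queued while detached is skipped over. */
#define HYP_FRONT_RING_ATTACH(r, s, i, n) do { \
    (r)->req_prod_pvt = (i);                   \
    (r)->rsp_cons = (i);                       \
    (r)->nr_ents = (n);                        \
    (r)->sring = (s);                          \
} while (0)

#define HYP_BACK_RING_ATTACH(r, s, i, n) do {  \
    (r)->rsp_prod_pvt = (i);                   \
    (r)->req_cons = (i);                       \
    (r)->nr_ents = (n);                        \
    (r)->sring = (s);                          \
} while (0)

/* INIT is now expressed as ATTACH with an index of 0. */
#define HYP_FRONT_RING_INIT(r, s, n) HYP_FRONT_RING_ATTACH(r, s, 0, n)
#define HYP_BACK_RING_INIT(r, s, n)  HYP_BACK_RING_ATTACH(r, s, 0, n)
```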
    • xenbus: limit when state is forced to closed · 672b7763
      Committed by Paul Durrant
      If a driver probe() fails then leave the xenstore state alone. There is no
      reason to modify it as the failure may be due to transient resource
      allocation issues and hence a subsequent probe() may succeed.
      
      If the driver supports re-binding, then only force the state to closed
      during remove() when the toolstack may need to clean up. This can be
      detected by checking whether the state in xenstore has been set to
      closing prior to device removal.
      
      NOTE: Re-bind support is indicated by a new boolean in struct xenbus_driver,
            which defaults to false. Subsequent patches will add support to
            some backend drivers.
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Juergen Gross <jgross@suse.com>
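      The remove() decision can be sketched as a small predicate. The enum
      values are assumed to follow Xen's XenbusState numbering. allow_rebind
      is a hypothetical name for the new boolean in struct xenbus_driver.
      After a failed probe(), the sketch simply leaves the state alone, so
      no predicate is needed for that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of XenbusState values (numbering assumed from Xen). */
enum hyp_xenbus_state {
    HYP_XenbusStateConnected = 4,
    HYP_XenbusStateClosing   = 5,
    HYP_XenbusStateClosed    = 6,
};

/* Hypothetical driver descriptor carrying the new re-bind flag. */
struct hyp_xenbus_driver {
    bool allow_rebind;
};

/* Should remove() force the xenstore state to Closed? */
static bool should_force_closed_on_remove(const struct hyp_xenbus_driver *drv,
                                          enum hyp_xenbus_state state)
{
    if (!drv->allow_rebind)
        return true; /* drivers without re-bind support: always force */
    /* Re-bindable drivers: only if the toolstack began tearing down. */
    return state == HYP_XenbusStateClosing;
}
```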
    • of: mdio: export of_mdiobus_child_is_phy · 0aa4d016
      Committed by Antoine Tenart
      This patch exports of_mdiobus_child_is_phy, allowing callers to check
      whether a child node is a network PHY.
      Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 18 Dec 2019, 5 commits
  5. 17 Dec 2019, 2 commits
  6. 16 Dec 2019, 1 commit
  7. 14 Dec 2019, 2 commits
  8. 13 Dec 2019, 6 commits
    • mac80211: Turn AQL into an NL80211_EXT_FEATURE · 911bde0f
      Committed by Toke Høiland-Jørgensen
      Instead of just having an airtime flag in debugfs, turn AQL into a proper
      NL80211_EXT_FEATURE, so drivers can turn it on when they are ready, and so
      we also expose the presence of the feature to userspace.
      
      This also has the effect of flipping the default, so drivers have to opt in
      to using AQL instead of getting it by default with TXQs. To keep
      functionality the same as pre-patch, we set this feature for ath10k (which
      is where it is needed the most).
      
      While we're at it, split out the debugfs interface so AQL gets its own
      per-station debugfs file instead of using the 'airtime' file.
      
      [Johannes:]
      This effectively disables AQL for iwlwifi, where it fixes a number of
      issues:
       * TSO in iwlwifi is causing underflows and associated warnings in AQL
       * HE (802.11ax) rates aren't reported properly so at HE rates, AQL could
         never have a valid estimate (it'd use 6 Mbps instead of up to 2400!)
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/r/20191212111437.224294-1-toke@redhat.com
      Fixes: 3ace10f5 ("mac80211: Implement Airtime-based Queue Limit (AQL)")
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
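      Below is a back-of-the-envelope C sketch of why the missing HE rate matters. Airtime is
      roughly bits divided by PHY rate, so charging a frame at 6 Mbps instead
      of 2400 Mbps inflates its estimate 400-fold. For a 1500-byte frame,
      that is 2000 µs instead of 5 µs. The helper is hypothetical and ignores
      preamble, interframe spacing and aggregation overheads.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate airtime in microseconds: bits / (Mbit/s) = microseconds. */
static uint32_t airtime_us(uint32_t bytes, uint32_t rate_mbps)
{
    return (bytes * 8u) / rate_mbps;
}
```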
    • IB/core: Introduce rdma_user_mmap_entry_insert_range() API · 7a763d18
      Committed by Yishai Hadas
      Introduce the rdma_user_mmap_entry_insert_range() API, to be used when
      the key for a given entry is required to be within a given range.
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Link: https://lore.kernel.org/r/20191212100237.330654-2-leon@kernel.org
      Signed-off-by: Doug Ledford <dledford@redhat.com>
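      Below is a hedged sketch of range-constrained key allocation: find the first free key
      within [min, max]. A hypothetical fixed-size table stands in for the
      kernel's own bookkeeping. The unrestricted insert is modeled as the
      full-range case. None of these names are the real API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYP_MAX_KEYS 64

/* Hypothetical table of mmap keys (page offsets) already in use. */
struct hyp_key_table {
    bool used[HYP_MAX_KEYS];
};

/* Allocate the lowest free key in [min_pgoff, max_pgoff]; -1 if none. */
static int hyp_insert_range(struct hyp_key_table *t,
                            uint32_t min_pgoff, uint32_t max_pgoff)
{
    if (min_pgoff > max_pgoff || max_pgoff >= HYP_MAX_KEYS)
        return -1;
    for (uint32_t k = min_pgoff; k <= max_pgoff; k++) {
        if (!t->used[k]) {
            t->used[k] = true;
            return (int)k;
        }
    }
    return -1;
}

/* The unconstrained insert is just the full key range. */
static int hyp_insert(struct hyp_key_table *t)
{
    return hyp_insert_range(t, 0, HYP_MAX_KEYS - 1);
}
```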
    • fs: remove ksys_dup() · 8243186f
      Committed by Dominik Brodowski
      ksys_dup() is used only at one place in the kernel, namely to duplicate
      fd 0 of /dev/console to stdout and stderr. The same functionality can be
      achieved by using functions already available within the kernel namespace.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
    • init: unify opening /dev/console as stdin/stdout/stderr · b49a733d
      Committed by Dominik Brodowski
      Merge the two instances where /dev/console is opened as
      stdin/stdout/stderr.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
    • cpufreq: Avoid leaving stale IRQ work items during CPU offline · 85572c2c
      Committed by Rafael J. Wysocki
      The scheduler code calling cpufreq_update_util() may run during CPU
      offline on the target CPU after the IRQ work lists have been flushed
      for it, so the target CPU should be prevented from running code that
      may queue up an IRQ work item on it at that point.
      
      Unfortunately, that may not be the case if dvfs_possible_from_any_cpu
      is set for at least one cpufreq policy in the system, because that
      allows the CPU going offline to run the utilization update callback
      of the cpufreq governor on behalf of another (online) CPU in some
      cases.
      
      If that happens, the cpufreq governor callback may queue up an IRQ
      work item on the CPU running it, which is going offline, and that IRQ
      work may not be flushed after that point.  Moreover, that IRQ work
      cannot be flushed until the "offlining" CPU goes back online, so if
      any other CPU calls irq_work_sync() to wait for the completion of
      that IRQ work, it will have to wait until the "offlining" CPU is back
      online, which may never happen.  In particular, a system-wide
      deadlock may occur during CPU online as a result.
      
      The failing scenario is as follows.  CPU0 is the boot CPU, so it
      creates a cpufreq policy and becomes the "leader" of it
      (policy->cpu).  It cannot go offline, because it is the boot CPU.
      Next, other CPUs join the cpufreq policy as they go online and they
      leave it when they go offline.  The last CPU to go offline, say CPU3,
      may queue up an IRQ work while running the governor callback on
      behalf of CPU0 after leaving the cpufreq policy because of the
      dvfs_possible_from_any_cpu effect described above.  Then, CPU0 is
      the only online CPU in the system and the stale IRQ work is still
      queued on CPU3.  When, say, CPU1 goes back online, it will run
      irq_work_sync() to wait for that IRQ work to complete and so it
      will wait for CPU3 to go back online (which may never happen even
      in principle), but (worse yet) CPU0 is waiting for CPU1 at that
      point too and a system-wide deadlock occurs.
      
      To address this problem, note that CPUs which cannot run cpufreq
      utilization update code for themselves (for example, because they
      have left the cpufreq policies they belonged to) should also be
      prevented from running that code on behalf of other CPUs that
      belong to a cpufreq policy with dvfs_possible_from_any_cpu set.
      In that case, the cpufreq_update_util_data pointer must be non-NULL
      both for the CPU running the code and for the CPU that is the target
      of the cpufreq utilization update in progress.
      
      Accordingly, change cpufreq_this_cpu_can_update() into a regular
      function in kernel/sched/cpufreq.c (instead of a static inline in a
      header file) and make it check the cpufreq_update_util_data pointer
      of the local CPU if dvfs_possible_from_any_cpu is set for the target
      cpufreq policy.
      
      Also update the schedutil governor to do the
      cpufreq_this_cpu_can_update() check in the non-fast-switch
      case too to avoid the stale IRQ work issues.
      
      Fixes: 99d14d0e ("cpufreq: Process remote callbacks from any CPU if the platform permits")
      Link: https://lore.kernel.org/linux-pm/20191121093557.bycvdo4xyinbc5cb@vireshk-i7/
      Reported-by: Anson Huang <anson.huang@nxp.com>
      Tested-by: Anson Huang <anson.huang@nxp.com>
      Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Tested-by: Peng Fan <peng.fan@nxp.com> (i.MX8QXP-MEK)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
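      Below is a model of the strengthened check. A hypothetical per-CPU pointer array and
      policy structure stand in for the kernel's. The sketch assumes the
      check passes in two cases: the local CPU belongs to the policy, or
      dvfs_possible_from_any_cpu is set and the local CPU's own update hook
      pointer is still non-NULL. In the scenario above, CPU3 has already
      left its policy (hook cleared), so it is refused instead of queuing
      stale IRQ work.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HYP_NR_CPUS 4

/* Models per-CPU cpufreq_update_util_data: non-NULL while the CPU's
 * update hook is installed, NULL once it has left its policy. */
static void *hyp_update_util_data[HYP_NR_CPUS];

/* Hypothetical cut-down cpufreq policy. */
struct hyp_policy {
    bool cpus[HYP_NR_CPUS]; /* CPUs that belong to the policy */
    bool dvfs_possible_from_any_cpu;
};

/* May @this_cpu run the utilization update for @p? */
static bool hyp_this_cpu_can_update(int this_cpu, const struct hyp_policy *p)
{
    return p->cpus[this_cpu] ||
           (p->dvfs_possible_from_any_cpu &&
            hyp_update_util_data[this_cpu] != NULL);
}
```

      Without the hook-pointer test, the second operand would reduce to
      dvfs_possible_from_any_cpu alone, and CPU3 would be allowed through.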
    • blk-cgroup: remove blkcg_drain_queue · 5addeae1
      Committed by Guoqing Jiang
      Since blk_drain_queue has already been removed, this function is
      no longer needed.
      Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 12 Dec 2019, 4 commits
    • init: use do_mount() instead of ksys_mount() · cccaa5e3
      Committed by Dominik Brodowski
      In prepare_namespace(), do_mount() can be used instead of ksys_mount()
      because the first and third arguments are const strings in the kernel,
      the second and fourth arguments are passed through anyway, and the
      fifth argument is NULL.

      In do_mount_root(), ksys_mount() is called with the first and third
      arguments already being kernelspace strings, which do not need to be
      copied over from userspace to kernelspace (again). The second and
      fourth arguments are passed through to do_mount() anyway. The fifth
      argument, while already residing in kernelspace, needs to be put into
      a page of its own. Then, do_mount() can be used instead of
      ksys_mount().

      Once this is done, there are no in-kernel users of ksys_mount() left,
      so it can be removed.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
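      Below is a userspace sketch of putting the fifth argument into a page of its own. The
      option string is copied into a zeroed, page-sized buffer, so a consumer
      that reads up to a page stays in bounds and always finds a terminator.
      The helper name and the 4096-byte page size are assumptions. The
      kernel would allocate an actual page instead of using calloc().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HYP_PAGE_SIZE 4096

/* Return a zeroed HYP_PAGE_SIZE buffer holding a copy of @data
 * (truncated if needed; the last byte always stays '\0'). */
static char *hyp_copy_to_page(const char *data)
{
    char *page = calloc(1, HYP_PAGE_SIZE);

    if (!page)
        return NULL;
    if (data)
        strncpy(page, data, HYP_PAGE_SIZE - 1);
    return page;
}
```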
    • devtmpfs: use do_mount() instead of ksys_mount() · 5e787dbf
      Committed by Dominik Brodowski
      In devtmpfs, do_mount() can be called directly instead of going through
      the more complex ksys_mount() wrapper:
      - the first and third arguments are const strings in the kernel,
        and do not need to be copied over from userspace;
      - the fifth argument is NULL, and therefore no page needs to be
        copied over from userspace;
      - the second and fourth arguments are passed through anyway.
      Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
    • bpf: Make BPF trampoline use register_ftrace_direct() API · b91e014f
      Committed by Alexei Starovoitov
      Make the BPF trampoline attach its generated assembly code to kernel functions
      via the register_ftrace_direct() API. This helps ftrace-based tracers co-exist
      with the BPF trampoline on the same kernel function. It also switches the
      attaching logic from arch-specific text_poke to generic ftrace, which is
      available on many architectures. text_poke is still necessary for bpf-to-bpf
      attach and for the bpf_tail_call optimization.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20191209000114.1876138-3-ast@kernel.org
    • io_uring: ensure we return -EINVAL on unknown opcode · 9e3aa61a
      Committed by Jens Axboe
      If we submit an unknown opcode and have fd == -1, io_op_needs_file()
      will return true as we default to needing a file. Then when we go and
      assign the file, we find the 'fd' invalid and return -EBADF. We really
      should be returning -EINVAL for that case, as we normally do for
      unsupported opcodes.
      
      Change io_op_needs_file() to have the following return values:
      
      0   - does not need a file
      1   - does need a file
      < 0 - error value
      
      and use this to pass back the right value for this invalid case.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
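      Below is a C sketch of the tri-state contract and of a caller that honors it. With this
      contract, an unknown opcode with fd == -1 reports -EINVAL rather than
      -EBADF. The opcode enum and helper names are hypothetical; the real
      io_uring has many more opcodes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical opcode set; HYP_OP_LAST is the first unknown value. */
enum hyp_op {
    HYP_OP_NOP,
    HYP_OP_READV,
    HYP_OP_WRITEV,
    HYP_OP_LAST,
};

/* 0: no file needed, 1: file needed, < 0: error (unknown opcode). */
static int hyp_op_needs_file(int opcode)
{
    switch (opcode) {
    case HYP_OP_NOP:
        return 0;
    case HYP_OP_READV:
    case HYP_OP_WRITEV:
        return 1;
    default:
        return -EINVAL;
    }
}

/* Propagate the opcode error before the fd is ever examined. */
static int hyp_prep_file(int opcode, int fd)
{
    int ret = hyp_op_needs_file(opcode);

    if (ret <= 0)
        return ret;      /* 0 (no file) or -EINVAL */
    if (fd < 0)
        return -EBADF;   /* known opcode that needs a valid fd */
    return 0;            /* a real implementation would look up the file */
}
```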
  10. 11 Dec 2019, 5 commits
  11. 10 Dec 2019, 4 commits