1. 06 11月, 2017 1 次提交
    • R
      PM / core: Add NEVER_SKIP and SMART_PREPARE driver flags · 08810a41
      Rafael J. Wysocki 提交于
      The motivation for this change is to provide a way to work around
      a problem with the direct-complete mechanism used for avoiding
      system suspend/resume handling for devices in runtime suspend.
      
      The problem is that some middle layer code (the PCI bus type and
      the ACPI PM domain in particular) returns positive values from its
      system suspend ->prepare callbacks regardless of whether the driver's
      ->prepare returns a positive value or 0, which effectively prevents
      drivers from being able to control the direct-complete feature.
      Some drivers need that control, however, and the PCI bus type has
      grown its own flag to deal with this issue, but since it is not
      limited to PCI, it is better to address it by adding driver flags at
      the core level.
      
      To that end, add a driver_flags field to struct dev_pm_info for flags
      that can be set by device drivers at the probe time to inform the PM
      core and/or bus types, PM domains and so on on the capabilities and/or
      preferences of device drivers.  Also add two static inline helpers
      for setting that field and testing it against a given set of flags
      and make the driver core clear it automatically on driver remove
      and probe failures.
      
      Define and document two PM driver flags related to the direct-
      complete feature: NEVER_SKIP and SMART_PREPARE that can be used,
      respectively, to indicate to the PM core that the direct-complete
      mechanism should never be used for the device and to inform the
      middle layer code (bus types, PM domains etc) that it can only
      request the PM core to use the direct-complete mechanism for
      the device (by returning a positive value from its ->prepare
      callback) if it also has been requested by the driver.
      
      While at it, make the core check pm_runtime_suspended() when
      setting power.direct_complete so that it doesn't need to be
      checked by ->prepare callbacks.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      08810a41
  2. 17 10月, 2017 1 次提交
  3. 14 10月, 2017 1 次提交
  4. 11 10月, 2017 3 次提交
    • U
      PM / sleep: Remove pm_complete_with_resume_check() · 0e708fc6
      Ulf Hansson 提交于
      According to recent changes for ACPI, the are longer any users of
      pm_complete_with_resume_check(), thus let's drop it.
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0e708fc6
    • S
      ACPI / LPIT: Add Low Power Idle Table (LPIT) support · eeb2d80d
      Srinivas Pandruvada 提交于
      Add functionality to read LPIT table, which provides:
      
       - Sysfs interface to read residency counters via
         /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
         /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
      
      Here the count "low_power_idle_cpu_residency_us" shows the time spent
      by CPU package in low power state.  This is read via MSR interface,
      which points to MSR for PKG C10.
      
      Here the count "low_power_idle_system_residency_us" show the count the
      system was in low power state. This is read via MMIO interface. This
      is mapped to SLP_S0 residency on modern Intel systems. This residency
      is achieved only when CPU is in PKG C10 and all functional blocks are
      in low power state.
      
      It is possible that none of the above counters present or anyone of the
      counter present or all counters present.
      
      For example: On my Kabylake system both of the above counters present.
      After suspend to idle these counts updated and prints:
      
       6916179
       6998564
      
      This counter can be read by tools like turbostat to display. Or it can
      be used to debug, if modern systems are reaching desired low power state.
      
       - Provides an interface to read residency counter memory address
      
         This address can be used to get the base address of PMC memory
         mapped IO.  This is utilized by intel_pmc_core driver to print
         more debug information.
      
      In addition, to avoid code duplication to read iomem, removed the read of
      iomem from acpi_os_read_memory() in osl.c and made a common function
      acpi_os_read_iomem(). This new function is used for reading iomem in
      in both osl.c and acpi_lpit.c.
      
      Link: http://www.uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdfSigned-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      eeb2d80d
    • R
      ACPI / PM: Combine two identical device resume routines · 63705c40
      Rafael J. Wysocki 提交于
      Notice that acpi_dev_runtime_resume() and acpi_dev_resume_early() are
      actually literally identical after some more-or-less recent changes,
      so rename acpi_dev_runtime_resume() to acpi_dev_resume(), use it
      everywhere instead of acpi_dev_resume_early() and drop the latter.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: NUlf Hansson <ulf.hansson@linaro.org>
      63705c40
  5. 10 10月, 2017 1 次提交
  6. 04 10月, 2017 11 次提交
  7. 03 10月, 2017 2 次提交
  8. 01 10月, 2017 2 次提交
    • P
      udp: perform source validation for mcast early demux · bc044e8d
      Paolo Abeni 提交于
      The UDP early demux can leverate the rx dst cache even for
      multicast unconnected sockets.
      
      In such scenario the ipv4 source address is validated only on
      the first packet in the given flow. After that, when we fetch
      the dst entry  from the socket rx cache, we stop enforcing
      the rp_filter and we even start accepting any kind of martian
      addresses.
      
      Disabling the dst cache for unconnected multicast socket will
      cause large performace regression, nearly reducing by half the
      max ingress tput.
      
      Instead we factor out a route helper to completely validate an
      skb source address for multicast packets and we call it from
      the UDP early demux for mcast packets landing on unconnected
      sockets, after successful fetching the related cached dst entry.
      
      This still gives a measurable, but limited performance
      regression:
      
      		rp_filter = 0		rp_filter = 1
      edmux disabled:	1182 Kpps		1127 Kpps
      edmux before:	2238 Kpps		2238 Kpps
      edmux after:	2037 Kpps		2019 Kpps
      
      The above figures are on top of current net tree.
      Applying the net-next commit 6e617de8 ("net: avoid a full
      fib lookup when rp_filter is disabled.") the delta with
      rp_filter == 0 will decrease even more.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc044e8d
    • P
      IPv4: early demux can return an error code · 7487449c
      Paolo Abeni 提交于
      Currently no error is emitted, but this infrastructure will
      used by the next patch to allow source address validation
      for mcast sockets.
      Since early demux can do a route lookup and an ipv4 route
      lookup can return an error code this is consistent with the
      current ipv4 route infrastructure.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7487449c
  9. 29 9月, 2017 6 次提交
  10. 28 9月, 2017 3 次提交
    • K
      timer: Prepare to change timer callback argument type · 686fef92
      Kees Cook 提交于
      Modern kernel callback systems pass the structure associated with a
      given callback to the callback function. The timer callback remains one
      of the legacy cases where an arbitrary unsigned long argument continues
      to be passed as the callback argument. This has several problems:
      
      - This bloats the timer_list structure with a normally redundant
        .data field.
      
      - No type checking is being performed, forcing callbacks to do
        explicit type casts of the unsigned long argument into the object
        that was passed, rather than using container_of(), as done in most
        of the other callback infrastructure.
      
      - Neighboring buffer overflows can overwrite both the .function and
        the .data field, providing attackers with a way to elevate from a buffer
        overflow into a simplistic ROP-like mechanism that allows calling
        arbitrary functions with a controlled first argument.
      
      - For future Control Flow Integrity work, this creates a unique function
        prototype for timer callbacks, instead of allowing them to continue to
        be clustered with other void functions that take a single unsigned long
        argument.
      
      This adds a new timer initialization API, which will ultimately replace
      the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family,
      named timer_setup() (to mirror hrtimer_setup(), making instances of its
      use much easier to grep for).
      
      In order to support the migration of existing timers into the new
      callback arguments, timer_setup() casts its arguments to the existing
      legacy types, and explicitly passes the timer pointer as the legacy
      data argument. Once all setup_*timer() callers have been replaced with
      timer_setup(), the casts can be removed, and the data argument can be
      dropped with the timer expiration code changed to just pass the timer
      to the callback directly.
      
      Since the regular pattern of using container_of() during local variable
      declaration repeats the need for the variable type declaration
      to be included, this adds a helper modeled after other from_*()
      helpers that wrap container_of(), named from_timer(). This helper uses
      typeof(*variable), removing the type redundancy and minimizing the need
      for line wraps in forthcoming conversions from "unsigned data long" to
      "struct timer_list *" in the timer callbacks:
      
      -void callback(unsigned long data)
      +void callback(struct timer_list *t)
      {
      -   struct some_data_structure *local = (struct some_data_structure *)data;
      +   struct some_data_structure *local = from_timer(local, t, timer);
      
      Finally, in order to support the handful of timer users that perform
      open-coded assignments of the .function (and .data) fields, provide
      cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used
      temporarily. Once conversion has been completed, these can be globally
      trivially removed.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
      686fef92
    • R
      net/mlx5: Check device capability for maximum flow counters · 16f1c5bb
      Raed Salem 提交于
      Added check for the maximal number of flow counters attached
      to rule (FTE).
      
      Fixes: bd5251db ('net/mlx5_core: Introduce flow steering destination of type counter')
      Signed-off-by: NRaed Salem <raeds@mellanox.com>
      Reviewed-by: NMaor Gottlieb <maorg@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      16f1c5bb
    • I
      net/mlx5: Fix FPGA capability location · 99d3cd27
      Inbar Karmy 提交于
      Currently, FPGA capability is located in (mdev)->caps.hca_cur,
      change the location to be (mdev)->caps.fpga,
      since hca_cur is reserved for HCA device capabilities.
      
      Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova")
      Signed-off-by: NInbar Karmy <inbark@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      99d3cd27
  11. 27 9月, 2017 1 次提交
  12. 26 9月, 2017 7 次提交
    • M
      percpu: make this_cpu_generic_read() atomic w.r.t. interrupts · e88d62cd
      Mark Rutland 提交于
      As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
      it's possible (albeit unlikely) that the compiler will split the access
      across multiple instructions.
      
      In this_cpu_generic_read() we disable preemption but not interrupts
      before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
      in the middle of the split load instructions. If a this_cpu_write() or
      RMW this_cpu_*() op is made to the same variable in the interrupt
      handling path, this_cpu_read() will return a torn value.
      
      For native word types, we can avoid tearing using READ_ONCE(), but this
      won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
      This patch reworks this_cpu_generic_read() to use READ_ONCE() where
      possible, otherwise falling back to disabling interrupts.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: stable@vger.kernel.org
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e88d62cd
    • A
      netlink: fix nla_put_{u8,u16,u32} for KASAN · b4391db4
      Arnd Bergmann 提交于
      When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
      stack frames in some functions. This goes unnoticed normally because
      CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit
      3f181b4d ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
      KASAN=y").
      
      The kernelci.org build bot however has the warning enabled and that led
      me to investigate it a little further, as every build produces these warnings:
      
      net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      
      Most of this problem is now solved in gcc-8, which can consolidate
      the stack slots for the inline function arguments. On older compilers
      we can add a workaround by declaring a local variable in each function
      to pass the inline function argument.
      
      Cc: stable@vger.kernel.org
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4391db4
    • R
      PM / core: Drop legacy class suspend/resume operations · a380f2ed
      Rafael J. Wysocki 提交于
      There are no classes using the legacy suspend/resume operations in
      the tree any more, so drop these operations and update the code
      referring to them accordingly.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a380f2ed
    • P
      smp/hotplug: Hotplug state fail injection · 1db49484
      Peter Zijlstra 提交于
      Add a sysfs file to one-time fail a specific state. This can be used
      to test the state rollback code paths.
      
      Something like this (hotplug-up.sh):
      
        #!/bin/bash
      
        echo 0 > /debug/sched_debug
        echo 1 > /debug/tracing/events/cpuhp/enable
      
        ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1`
        STATES=${1:-$ALL_STATES}
      
        for state in $STATES
        do
      	  echo 0 > /sys/devices/system/cpu/cpu1/online
      	  echo 0 > /debug/tracing/trace
      	  echo Fail state: $state
      	  echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail
      	  cat /sys/devices/system/cpu/cpu1/hotplug/fail
      	  echo 1 > /sys/devices/system/cpu/cpu1/online
      
      	  cat /debug/tracing/trace > hotfail-${state}.trace
      
      	  sleep 1
        done
      
      Can be used to test for all possible rollback (barring multi-instance)
      scenarios on CPU-up, CPU-down is a trivial modification of the above.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
      
      1db49484
    • P
      smp/hotplug: Add state diagram · fac1c204
      Peter Zijlstra 提交于
      Add a state diagram to clarify when which states are ran where.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: bigeasy@linutronix.de
      Cc: efault@gmx.de
      Cc: rostedt@goodmis.org
      Cc: max.byungchul.park@gmail.com
      Link: https://lkml.kernel.org/r/20170920170546.661598270@infradead.org
      
      fac1c204
    • J
      nvmet-fc: sync header templates with comments · 6b71f9e1
      James Smart 提交于
      Comments were incorrect:
      - defer_rcv was in host port template. moved to target port template
      - Added Mandatory statements for target port template items
      Signed-off-by: NJames Smart <james.smart@broadcom.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6b71f9e1
    • G
      PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build · fe594932
      Geert Uytterhoeven 提交于
      If CONFIG_PCI=n and gcc (e.g. 4.1.2) decides not to inline
      get_pci_function_alias_group(), the build fails with:
      
        drivers/iommu/iommu.o: In function `get_pci_function_alias_group':
        iommu.c:(.text+0xfdc): undefined reference to `pci_acs_enabled'
      
      Due to the various dummies for PCI calls in the CONFIG_PCI=n case,
      pci_acs_enabled() never called, but not all versions of gcc are smart
      enough to realize that.
      
      While explicitly marking get_pci_function_alias_group() inline would fix
      the build, this would inflate the code for the CONFIG_PCI=y case, as
      get_pci_function_alias_group() is a not-so-small function called from two
      places.
      
      Hence fix the issue by introducing a dummy for pci_acs_enabled() instead.
      
      Fixes: 0ae349a0 ("iommu/qcom: Add qcom_iommu")
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NAlex Williamson <alex.williamson@redhat.com>
      fe594932
  13. 25 9月, 2017 1 次提交