1. 13 Sep, 2021 1 commit
  2. 11 Sep, 2021 2 commits
  3. 10 Sep, 2021 3 commits
  4. 09 Sep, 2021 31 commits
    • M
      thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used · 70ee251d
      Matthias Kaehlcke committed
      adc_tm5_register_tzd() registers the thermal zone sensors for all
      channels of the thermal monitor. If the registration of one channel
      fails the function skips the processing of the remaining channels
      and returns an error, which results in _probe() being aborted.
      
      One of the reasons the registration could fail is that none of the
      thermal zones is using the channel/sensor, which hardly is a critical
      error (if it is an error at all). If this case is detected emit a
      warning and continue with processing the remaining channels.
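      
      The skip-and-continue behaviour described above can be sketched in
      user-space C. This is only an illustration, not the driver's code:
      register_sensor() and register_all() are hypothetical names, and the
      sketch assumes the "channel not used" case is reported as -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for registering one channel's thermal zone sensor.
 * Assumption: -ENODEV means "no thermal zone uses this channel". */
static int register_sensor(int channel)
{
	if (channel == 1)
		return -ENODEV;	/* unused channel: not a real failure */
	if (channel < 0)
		return -EINVAL;	/* genuine error */
	return 0;
}

/* Register every channel; skip unused ones, abort only on real errors. */
static int register_all(const int *channels, size_t n, size_t *registered)
{
	assert(channels && registered);
	*registered = 0;

	for (size_t i = 0; i < n; i++) {
		int ret = register_sensor(channels[i]);

		if (ret == -ENODEV) {
			fprintf(stderr, "channel %d is not used, skipping\n",
				channels[i]);
			continue;
		}
		if (ret)
			return ret;
		(*registered)++;
	}
	return 0;
}
```

      With channels {0, 1, 2, 3} the sketch registers three sensors and
      returns 0; only an error other than -ENODEV stops the loop.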
      
      Fixes: ca66dca5 ("thermal: qcom: add support for adc-tm5 PMIC thermal monitor")
      Signed-off-by: Matthias Kaehlcke <mka@chromium.org>
      Reported-by: Stephen Boyd <swboyd@chromium.org>
      Reviewed-by: Stephen Boyd <swboyd@chromium.org>
      Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Link: https://lore.kernel.org/r/20210823134726.1.I1dd23ddf77e5b3568625d80d6827653af071ce19@changeid
      70ee251d
    • S
      thermal/drivers/intel: Allow processing of HWP interrupt · 5950fc44
      Srinivas Pandruvada committed
      Add a weak function to process HWP (Hardware P-states) notifications and
      move updating HWP_STATUS MSR to this function.
      
      This allows HWP interrupts to be processed by the intel_pstate driver in
      HWP mode by overriding the implementation.
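      
      A minimal sketch of the weak-function pattern, assuming GCC/Clang
      attribute syntax (the kernel spells it __weak). The names
      process_hwp_notification() and handle_thermal_interrupt() are
      hypothetical, and the counter merely stands in for the MSR update.

```c
#include <assert.h>

static int default_calls;

/* Weak default handler. A strong definition of the same symbol in another
 * object file would replace this one at link time. */
__attribute__((weak)) void process_hwp_notification(void)
{
	default_calls++;	/* stands in for updating the status register */
}

/* Interrupt path: always calls through the (possibly overridden) symbol. */
static void handle_thermal_interrupt(void)
{
	process_hwp_notification();
}
```

      When no other object file defines process_hwp_notification(), the
      weak default runs. A driver that provides a strong definition takes
      over without any change to the caller.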
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Acked-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Link: https://lore.kernel.org/r/20210820024006.2347720-1-srinivas.pandruvada@linux.intel.com
      5950fc44
    • R
      iommu: Clarify default domain Kconfig · 8cc63319
      Robin Murphy committed
      Although strictly it is the AMD and Intel drivers which have an existing
      expectation of lazy behaviour by default, it ends up being rather
      unintuitive to describe this literally in Kconfig. Express it instead as
      an architecture dependency, to clarify that it is a valid config-time
      decision. The end result is the same since virtio-iommu doesn't support
      lazy mode and thus falls back to strict at runtime regardless.
      
      The per-architecture disparity is a matter of historical expectations:
      the AMD and Intel drivers have been lazy by default since 2008, and
      changing that gets noticed by people asking where their I/O throughput
      has gone. Conversely, Arm-based systems with their wider assortment of
      IOMMU drivers mostly only support strict mode anyway; only the Arm SMMU
      drivers have later grown support for passthrough and lazy mode, for
      users who wanted to explicitly trade off isolation for performance.
      These days, reducing the default level of isolation in a way which may
      go unnoticed by users who expect otherwise hardly seems worth risking
      for the sake of one line of Kconfig, so here's where we are.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Link: https://lore.kernel.org/r/69a0c6f17b000b54b8333ee42b3124c1d5a869e2.1631105737.git.robin.murphy@arm.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      8cc63319
    • F
      iommu/vt-d: Fix a deadlock in intel_svm_drain_prq() · 6ef05051
      Fenghua Yu committed
      pasid_mutex and dev->iommu->param->lock are held while the mm unbind path
      flushes the IO page fault workqueue and waits for all page fault works to
      finish. But an in-flight page fault work also needs to take the two locks
      while the unbind path is holding them and waiting for the work to finish.
      This may cause an ABBA deadlock, as shown below:
      
      	idxd 0000:00:0a.0: unbind PASID 2
      	======================================================
      	WARNING: possible circular locking dependency detected
      	5.14.0-rc7+ #549 Not tainted
      	------------------------------------------------------
      	dsa_test/898 is trying to acquire lock:
      	ffff888100d854e8 (&param->lock){+.+.}-{3:3}, at:
      	iopf_queue_flush_dev+0x29/0x60
      	but task is already holding lock:
      	ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
      	intel_svm_unbind+0x34/0x1e0
      	which lock already depends on the new lock.
      
      	the existing dependency chain (in reverse order) is:
      
      	-> #2 (pasid_mutex){+.+.}-{3:3}:
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       intel_svm_page_response+0x8e/0x260
      	       iommu_page_response+0x122/0x200
      	       iopf_handle_group+0x1c2/0x240
      	       process_one_work+0x2a5/0x5a0
      	       worker_thread+0x55/0x400
      	       kthread+0x13b/0x160
      	       ret_from_fork+0x22/0x30
      
      	-> #1 (&param->fault_param->lock){+.+.}-{3:3}:
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       iommu_report_device_fault+0xc2/0x170
      	       prq_event_thread+0x28a/0x580
      	       irq_thread_fn+0x28/0x60
      	       irq_thread+0xcf/0x180
      	       kthread+0x13b/0x160
      	       ret_from_fork+0x22/0x30
      
      	-> #0 (&param->lock){+.+.}-{3:3}:
      	       __lock_acquire+0x1134/0x1d60
      	       lock_acquire+0xc6/0x2e0
      	       __mutex_lock+0x75/0x730
      	       mutex_lock_nested+0x1b/0x20
      	       iopf_queue_flush_dev+0x29/0x60
      	       intel_svm_drain_prq+0x127/0x210
      	       intel_svm_unbind+0xc5/0x1e0
      	       iommu_sva_unbind_device+0x62/0x80
      	       idxd_cdev_release+0x15a/0x200 [idxd]
      	       __fput+0x9c/0x250
      	       ____fput+0xe/0x10
      	       task_work_run+0x64/0xa0
      	       exit_to_user_mode_prepare+0x227/0x230
      	       syscall_exit_to_user_mode+0x2c/0x60
      	       do_syscall_64+0x48/0x90
      	       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      	other info that might help us debug this:
      
      	Chain exists of:
      	  &param->lock --> &param->fault_param->lock --> pasid_mutex
      
      	 Possible unsafe locking scenario:
      
      	       CPU0                    CPU1
      	       ----                    ----
      	  lock(pasid_mutex);
      				       lock(&param->fault_param->lock);
      				       lock(pasid_mutex);
      	  lock(&param->lock);
      
      	 *** DEADLOCK ***
      
      	2 locks held by dsa_test/898:
      	 #0: ffff888100cc1cc0 (&group->mutex){+.+.}-{3:3}, at:
      	 iommu_sva_unbind_device+0x53/0x80
      	 #1: ffffffff82b2f7c8 (pasid_mutex){+.+.}-{3:3}, at:
      	 intel_svm_unbind+0x34/0x1e0
      
      	stack backtrace:
      	CPU: 2 PID: 898 Comm: dsa_test Not tainted 5.14.0-rc7+ #549
      	Hardware name: Intel Corporation Kabylake Client platform/KBL S
      	DDR4 UD IMM CRB, BIOS KBLSE2R1.R00.X050.P01.1608011715 08/01/2016
      	Call Trace:
      	 dump_stack_lvl+0x5b/0x74
      	 dump_stack+0x10/0x12
      	 print_circular_bug.cold+0x13d/0x142
      	 check_noncircular+0xf1/0x110
      	 __lock_acquire+0x1134/0x1d60
      	 lock_acquire+0xc6/0x2e0
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 ? pci_mmcfg_read+0xde/0x240
      	 __mutex_lock+0x75/0x730
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 ? pci_mmcfg_read+0xfd/0x240
      	 ? iopf_queue_flush_dev+0x29/0x60
      	 mutex_lock_nested+0x1b/0x20
      	 iopf_queue_flush_dev+0x29/0x60
      	 intel_svm_drain_prq+0x127/0x210
      	 ? intel_pasid_tear_down_entry+0x22e/0x240
      	 intel_svm_unbind+0xc5/0x1e0
      	 iommu_sva_unbind_device+0x62/0x80
      	 idxd_cdev_release+0x15a/0x200
      
      pasid_mutex protects the PASID and SVM data mappings. It's unnecessary
      to hold pasid_mutex while flushing the workqueue. To fix the deadlock
      issue, unlock pasid_mutex while flushing the workqueue to allow the works
      to be handled.
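      
      The idea behind the fix can be modelled in a few lines of C. This is
      a deliberately single-threaded sketch, not the driver code: the lock
      is a plain flag, the pending work runs synchronously inside the
      flush, and a failed try_lock() stands for "would block forever". All
      names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct lock { bool held; };

static bool try_lock(struct lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void unlock(struct lock *l)
{
	l->held = false;
}

/* A pending page-fault work item: it needs the lock to complete. */
static bool run_work(struct lock *l)
{
	if (!try_lock(l))
		return false;	/* with real threads this would block */
	unlock(l);
	return true;
}

/* Flushing the queue means waiting for the pending work to finish. */
static bool flush_queue(struct lock *l)
{
	return run_work(l);
}

/* Unbind path: returns true if the queue could be drained. */
static bool unbind(struct lock *l, bool drop_lock_for_flush)
{
	bool drained;

	if (!try_lock(l))
		return false;
	if (drop_lock_for_flush)
		unlock(l);		/* the fix: release before flushing */
	drained = flush_queue(l);
	if (drop_lock_for_flush)
		assert(try_lock(l));	/* retake for the rest of the unbind */
	unlock(l);
	return drained;
}
```

      In this model unbind(&l, false) cannot drain the queue because the
      work never gets the lock, while unbind(&l, true) drains it.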
      
      Fixes: d5b9e4bf ("iommu/vt-d: Report prq to io-pgfault framework")
      Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210828070622.2437559-3-baolu.lu@linux.intel.com
      [joro: Removed timing information from kernel log messages]
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      6ef05051
    • F
      iommu/vt-d: Fix PASID leak in intel_svm_unbind_mm() · a21518cb
      Fenghua Yu committed
      mm->pasid is used in intel_svm_free_pasid() after load_pasid() when
      unbinding an mm. Clearing it in load_pasid() means the PASID cannot be
      freed in intel_svm_free_pasid().
      
      Additionally mm->pasid was updated already before load_pasid() during pasid
      allocation. No need to update it again in load_pasid() during binding mm.
      Don't update mm->pasid to avoid the issues in both binding mm and unbinding
      mm.
      
      Fixes: 40483774 ("iommu/vt-d: Use iommu_sva_alloc(free)_pasid() helpers")
      Reported-and-tested-by: Dave Jiang <dave.jiang@intel.com>
      Co-developed-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: https://lore.kernel.org/r/20210826215918.4073446-1-fenghua.yu@intel.com
      Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
      Link: https://lore.kernel.org/r/20210828070622.2437559-2-baolu.lu@linux.intel.com
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      a21518cb
    • S
      iommu/amd: Remove iommu_init_ga() · eb03f2d2
      Suravee Suthikulpanit committed
      Since the function has been simplified and only calls iommu_init_ga_log(),
      remove it and call iommu_init_ga_log() directly instead.
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lore.kernel.org/r/20210820202957.187572-4-suravee.suthikulpanit@amd.com
      Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      eb03f2d2
    • W
      iommu/amd: Relocate GAMSup check to early_enable_iommus · c3811a50
      Wei Huang committed
      Currently, iommu_init_ga() checks and disables IOMMU VAPIC support
      (i.e. AMD AVIC support in IOMMU) when GAMSup feature bit is not set.
      However it forgets to clear IRQ_POSTING_CAP from the previously set
      amd_iommu_irq_ops.capability.
      
      This triggers an invalid page fault during guest VM warm reboot
      if AVIC is enabled, since irq_remapping_cap(IRQ_POSTING_CAP) is
      incorrectly set, and crashes the system with the following kernel trace.
      
          BUG: unable to handle page fault for address: 0000000000400dd8
          RIP: 0010:amd_iommu_deactivate_guest_mode+0x19/0xbc
          Call Trace:
           svm_set_pi_irte_mode+0x8a/0xc0 [kvm_amd]
           ? kvm_make_all_cpus_request_except+0x50/0x70 [kvm]
           kvm_request_apicv_update+0x10c/0x150 [kvm]
           svm_toggle_avic_for_irq_window+0x52/0x90 [kvm_amd]
           svm_enable_irq_window+0x26/0xa0 [kvm_amd]
           vcpu_enter_guest+0xbbe/0x1560 [kvm]
           ? avic_vcpu_load+0xd5/0x120 [kvm_amd]
           ? kvm_arch_vcpu_load+0x76/0x240 [kvm]
           ? svm_get_segment_base+0xa/0x10 [kvm_amd]
           kvm_arch_vcpu_ioctl_run+0x103/0x590 [kvm]
           kvm_vcpu_ioctl+0x22a/0x5d0 [kvm]
           __x64_sys_ioctl+0x84/0xc0
           do_syscall_64+0x33/0x40
           entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix this by moving the initialization of the AMD IOMMU interrupt
      remapping mode (amd_iommu_guest_ir) earlier, before setting up
      amd_iommu_irq_ops.capability with the appropriate IRQ_POSTING_CAP flag.
      
      [joro:	Squashed the two patches and limited
      	check_features_on_all_iommus() to CONFIG_IRQ_REMAP
      	to fix a compile warning.]
      Signed-off-by: Wei Huang <wei.huang2@amd.com>
      Co-developed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Link: https://lore.kernel.org/r/20210820202957.187572-2-suravee.suthikulpanit@amd.com
      Link: https://lore.kernel.org/r/20210820202957.187572-3-suravee.suthikulpanit@amd.com
      Fixes: 8bda0cfb ("iommu/amd: Detect and initialize guest vAPIC log")
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      c3811a50
    • G
      parisc: Move pci_dev_is_behind_card_dino to where it is used · 907872ba
      Guenter Roeck committed
      parisc build test images fail to compile with the following error.
      
      drivers/parisc/dino.c:160:12: error:
      	'pci_dev_is_behind_card_dino' defined but not used
      
      Move the function just ahead of its only caller to avoid the error.
      
      Fixes: 5fa16591 ("parisc: Disable HP HSC-PCI Cards to prevent kernel crash")
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Helge Deller <deller@gmx.de>
      907872ba
    • Y
      rtc: rx8010: select REGMAP_I2C · 0c45d3e2
      Yu-Tung Chang committed
      The rtc-rx8010 uses the I2C regmap but doesn't select it in Kconfig so
      depending on the configuration the build may fail. Fix it.
      Signed-off-by: Yu-Tung Chang <mtwget@gmail.com>
      Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
      Link: https://lore.kernel.org/r/20210830052532.40356-1-mtwget@gmail.com
      0c45d3e2
    • G
      Input: analog - always use ktime functions · 0c5483a5
      Guenter Roeck committed
      m68k, mips, s390, and sparc allmodconfig images fail to build with the
      following error.
      
      drivers/input/joystick/analog.c:160:2: error:
      	#warning Precise timer not defined for this architecture.
      
      Remove architecture specific time handling code and always use ktime
      functions to determine time deltas. Also remove the now useless use_ktime
      kernel parameter.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested
      Link: https://lore.kernel.org/r/20210907123734.21520-1-linux@roeck-us.net
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      0c5483a5
    • A
      ACPI: PRM: Find PRMT table before parsing it · 3265cc3e
      Aubrey Li committed
      Find and verify PRMT before parsing it, which eliminates a
      warning on machines without PRMT:
      
      	[    7.197173] ACPI: PRMT not present
      
      Fixes: cefc7ca4 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype")
      Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
      Tested-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
      [ rjw: Subject and changelog edits ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      3265cc3e
    • D
      phy/drivers/stm32: use HZ macros · 18821693
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
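      
      As a rough user-space illustration of what this series of "use HZ
      macros" commits relies on: the macro values below mirror what
      include/linux/units.h is understood to provide, but they are
      redefined locally here and should be treated as assumptions. The
      helper names are hypothetical.

```c
#include <assert.h>

/* Local copies for illustration; in the kernel these come from
 * <linux/units.h>, so a driver no longer defines its own. */
#define HZ_PER_KHZ	1000UL
#define KHZ_PER_MHZ	1000UL
#define HZ_PER_MHZ	1000000UL

/* Convert a clock rate given in MHz to Hz. */
static unsigned long mhz_to_hz(unsigned long mhz)
{
	return mhz * HZ_PER_MHZ;
}

/* Convert a clock rate given in Hz to kHz (truncating). */
static unsigned long hz_to_khz(unsigned long hz)
{
	return hz / HZ_PER_KHZ;
}
```

      Because the macros carry the UL suffix, expressions using them are
      evaluated as unsigned long, which is why several of the commits
      below comment on the type.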
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-11-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Christian Eggers <ceggers@arri.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18821693
    • D
      mtd/drivers/nand: use HZ macros · 9ef347c3
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-10-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Christian Eggers <ceggers@arri.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9ef347c3
    • D
      i2c/drivers/ov02q10: use HZ macros · 09704a94
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-9-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Christian Eggers <ceggers@arri.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      09704a94
    • D
      iio/drivers/hid-sensor: use HZ macros · 87000e7f
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-8-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Christian Eggers <ceggers@arri.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      87000e7f
    • D
      hwmon/drivers/mr75203: use HZ macros · d59eacaa
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      The new macro is an unsigned long.  The code using it already treats the
      value as an unsigned long.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-7-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Christian Eggers <ceggers@arri.de>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d59eacaa
    • D
      iio/drivers/as73211: use HZ macros · 55c653e0
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-6-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Christian Eggers <ceggers@arri.de>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      55c653e0
    • D
      devfreq: use HZ macros · 04c8984a
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      The new macro has an unsigned long type.
      
      All the code deals with unsigned long, and the code using the macro
      explicitly casts to unsigned long.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-5-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Christian Eggers <ceggers@arri.de>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04c8984a
    • D
      thermal/drivers/devfreq_cooling: use HZ macros · 73b718c6
      Daniel Lezcano committed
      HZ unit conversion macros are available in units.h, use them and remove
      the duplicate definition.
      
      The new macro uses an unsigned long type, which is already the type in
      the current code via the 'freq' variable.
      
      Link: https://lkml.kernel.org/r/20210816114732.1834145-4-daniel.lezcano@linaro.org
      Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Christian Eggers <ceggers@arri.de>
      Cc: Chanwoo Choi <cw00.choi@samsung.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Cameron <jic23@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Lukasz Luba <lukasz.luba@arm.com>
      Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
      Cc: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
      Cc: Peter Meerwald <pmeerw@pmeerw.net>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      73b718c6
    • D
      mm/memory_hotplug: improved dynamic memory group aware "auto-movable" online policy · 3fcebf90
      David Hildenbrand committed
      Currently, the "auto-movable" online policy does not allow for hotplugged
      KERNEL (ZONE_NORMAL) memory to increase the amount of MOVABLE memory we
      can have, primarily because there is no coordination across memory
      devices and we don't want to create zone imbalances accidentally when
      unplugging memory.
      
      However, within a single memory device it's different.  Let's allow for
      KERNEL memory within a dynamic memory group to allow for more MOVABLE
      within the same memory group.  The only thing we have to take care of is
      that the managing driver avoids zone imbalances by unplugging MOVABLE
      memory first, otherwise there can be corner cases where unplug of memory
      could result in (accidental) zone imbalances.
      
      virtio-mem is the only user of dynamic memory groups and recently added
      support for prioritizing unplug of ZONE_MOVABLE over ZONE_NORMAL, so we
      don't need a new toggle to enable it for dynamic memory groups.
      
      We limit this handling to dynamic memory groups, because:
      
      * We want to keep the runtime overhead for collecting stats when
        onlining a single memory block small.  We tend to have only a handful of
        dynamic memory groups, but we can have quite some static memory groups
        (e.g., 256 DIMMs).
      
      * It doesn't make too much sense for static memory groups, as we try
        onlining all applicable memory blocks either completely to ZONE_MOVABLE
        or not.  In ordinary operation, we won't have a mixture of zones within
        a static memory group.
      
      When adding memory to a dynamic memory group, we'll first online memory to
      ZONE_MOVABLE as long as early KERNEL memory allows for it.  Then, we'll
      online the next unit(s) to ZONE_NORMAL, until we can online the next
      unit(s) to ZONE_MOVABLE.
      
      For a simple virtio-mem device with a MOVABLE:KERNEL ratio of 3:1, it will
      result in a layout like:
      
        [M][M][M][M][M][M][M][M][N][M][M][M][N][M][M][M]...
        ^ movable memory due to early kernel memory
      			   ^ allows for more movable memory ...
      			      ^-----^ ... here
      				       ^ allows for more movable memory ...
      				          ^-----^ ... here
      
      While the created layout is sub-optimal when it comes to contiguous zones,
      it gives us the maximum flexibility when dynamically growing/shrinking a
      device; we can grow small VMs really big in small steps, and still shrink
      reliably to e.g., 1/4 of the maximum VM size in this example, removing
      full memory blocks along with meta data more reliably.
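      
      The layout shown above can be reproduced with a toy model. This is a
      simplified sketch, not the kernel's policy code: it tracks only two
      counters, onlines the next unit MOVABLE whenever that keeps
      MOVABLE <= ratio * KERNEL, and otherwise onlines it to ZONE_NORMAL.
      The function name, the parameters, and the sizes used in the example
      (early kernel memory of 8 and a unit size of 3, in arbitrary units)
      are assumptions picked so that the output matches the diagram.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill 'out' with 'M' (MOVABLE) or 'N' (NORMAL) for each onlined unit.
 * 'out' must have room for nunits + 1 characters. */
static void online_layout(unsigned long early_kernel, unsigned long unit,
			  unsigned long ratio, size_t nunits, char *out)
{
	unsigned long kernel = early_kernel;
	unsigned long movable = 0;

	for (size_t i = 0; i < nunits; i++) {
		if (movable + unit <= ratio * kernel) {
			movable += unit;	/* ratio still satisfied */
			out[i] = 'M';
		} else {
			kernel += unit;		/* adds room for more MOVABLE */
			out[i] = 'N';
		}
	}
	out[nunits] = '\0';
}
```

      Calling online_layout(8, 3, 3, 16, buf) yields "MMMMMMMMNMMMNMMM":
      eight MOVABLE units backed by early kernel memory, then one NORMAL
      unit for every three further MOVABLE units.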
      
      Mark dynamic memory groups in the xarray such that we can efficiently
      iterate over them when collecting stats.  In usual setups, we have one
      virtio-mem device per NUMA node, and usually only a small number of NUMA
      nodes.
      
      Note: for now, there seems to be no compelling reason to make this
      behavior configurable.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-10-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fcebf90
    • D
      mm/memory_hotplug: memory group aware "auto-movable" online policy · 445fcf7c
      David Hildenbrand committed
      Use memory groups to improve our "auto-movable" onlining policy:
      
      1. For static memory groups (e.g., a DIMM), online a memory block MOVABLE
         only if all other memory blocks in the group are either MOVABLE or could
         be onlined MOVABLE. A DIMM will either be MOVABLE or not, not a mixture.
      
      2. For dynamic memory groups (e.g., a virtio-mem device), online a
         memory block MOVABLE only if all other memory blocks inside the
         current unit are either MOVABLE or could be onlined MOVABLE. For a
         virtio-mem device with a device block size of 512 MiB, all 128 MiB
         memory blocks within a 512 MiB unit will either be MOVABLE or not, not
         a mixture.
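      
      Both rules share one check: a memory block may go MOVABLE only if no
      other block in the same scope (the whole group for a static group,
      the current unit for a dynamic one) is, or would have to be, KERNEL
      memory. A minimal sketch of that check follows; the enum and
      function names are hypothetical and not taken from the kernel
      sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum block_state {
	BLOCK_OFFLINE_MOVABLE_OK,	/* offline, could be onlined MOVABLE */
	BLOCK_OFFLINE_KERNEL_ONLY,	/* offline, cannot be onlined MOVABLE */
	BLOCK_ONLINE_MOVABLE,
	BLOCK_ONLINE_KERNEL,
};

/* Decide whether block 'self' may be onlined MOVABLE, given the states of
 * all 'n' blocks in its scope (group or unit). */
static bool may_online_movable(const enum block_state *blocks, size_t n,
			       size_t self)
{
	for (size_t i = 0; i < n; i++) {
		if (i == self)
			continue;
		if (blocks[i] == BLOCK_ONLINE_KERNEL ||
		    blocks[i] == BLOCK_OFFLINE_KERNEL_ONLY)
			return false;	/* would create a mixture */
	}
	return true;
}
```

      For the virtio-mem example above, the scope would be the four
      128 MiB blocks of one 512 MiB unit.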
      
      We have to pass the memory group to zone_for_pfn_range() to take the
      memory group into account.
      
      Note: for now, there seems to be no compelling reason to make this
      behavior configurable.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-9-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      445fcf7c
    • D
      virtio-mem: use a single dynamic memory group for a single virtio-mem device · ffaa6ce8
      David Hildenbrand committed
      Let's use a single dynamic memory group.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-8-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ffaa6ce8
    • D
      dax/kmem: use a single static memory group for a single probed unit · eedf634a
      David Hildenbrand committed
      Although dax/kmem users often disable auto-onlining and instead online
      memory manually (usually to ZONE_MOVABLE), there is still value in having
      auto-onlining be aware of the relationship of memory blocks.
      
      Let's treat one probed unit as a single static memory device, similar to a
      single ACPI memory device.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eedf634a
    • D
      ACPI: memhotplug: use a single static memory group for a single memory device · 2a157839
      David Hildenbrand committed
      Let's group all memory we add for a single memory device - we want a
      single node for that (which also seems to be the sane thing to do).
      
      We won't care for now about memory that was already added to the system
      (e.g., via e820) -- usually *all* memory of a memory device was already
      added and we'll fail acpi_memory_enable_device().
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-6-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2a157839
    • D
      mm/memory_hotplug: track present pages in memory groups · 836809ec
      David Hildenbrand committed
      Let's track all present pages in each memory group.  Especially, track
      memory present in ZONE_MOVABLE and memory present in one of the kernel
      zones (which really only is ZONE_NORMAL right now as memory groups only
      apply to hotplugged memory) separately within a memory group, to prepare
      for making smart auto-online decisions for individual memory blocks within
      a memory group based on group statistics.
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      836809ec
    • D
      drivers/base/memory: introduce "memory groups" to logically group memory blocks · 028fc57a
      David Hildenbrand committed
      In our "auto-movable" memory onlining policy, we want to make decisions
      across memory blocks of a single memory device.  Examples of memory
      devices include ACPI memory devices (in the simplest case a single DIMM)
      and virtio-mem.  For now, we don't have a connection between a single
      memory block device and the real memory device.  Each memory device
      consists of 1..X memory block devices.
      
      Let's logically group memory blocks belonging to the same memory device in
      "memory groups".  Memory groups can span multiple physical ranges and a
      memory group itself does not contain any information regarding physical
      ranges, only properties (e.g., "max_pages") necessary for improved memory
      onlining.
      
      Introduce two memory group types:
      
      1) Static memory group: E.g., a single ACPI memory device, consisting
         of 1..X memory resources.  A memory group consists of 1..Y memory
         blocks.  The whole group is added/removed in one go.  If any part
         cannot get offlined, the whole group cannot be removed.
      
      2) Dynamic memory group: E.g., a single virtio-mem device.  Memory is
         dynamically added/removed in a fixed granularity, called a "unit",
         consisting of 1..X memory blocks.  A unit is added/removed in one go.
         If any part of a unit cannot get offlined, the whole unit cannot be
         removed.
      
      In case of 1) we usually want either all memory managed by ZONE_MOVABLE or
      none.  In case of 2) we usually want to have as many units as possible
      managed by ZONE_MOVABLE.  We want a single unit to be of the same type.
      
      For now, memory groups are an internal concept that is not exposed to user
      space; we might want to change that in the future, though.
      
      add_memory() users can specify a mgid instead of a nid when passing the
      MHP_NID_IS_MGID flag.
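
The two group types can be pictured with a small descriptor. This is a
simplified sketch under assumptions, not the structure added by the patch:
the struct name memory_group_sketch, its fields and the helper
uniform_scope_pages() are hypothetical and only mirror the properties named
above (a "max_pages" bound for static groups, a fixed unit size for dynamic
ones).

```c
#include <stdbool.h>

/* Hypothetical, simplified descriptor; all field names are assumptions. */
struct memory_group_sketch {
	bool is_dynamic;
	unsigned long max_pages;  /* static: upper bound for the whole group */
	unsigned long unit_pages; /* dynamic: size of one unit */
};

/*
 * Number of pages that are added/removed in one go and should therefore
 * share a single zone type: the whole group or a single unit.
 */
unsigned long uniform_scope_pages(const struct memory_group_sketch *g)
{
	return g->is_dynamic ? g->unit_pages : g->max_pages;
}
```

Note that the descriptor carries no physical ranges, matching the statement
that a memory group only holds properties needed for onlining decisions.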
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      028fc57a
    • D
      mm: track present early pages per zone · 4b097002
      David Hildenbrand committed
      Patch series "mm/memory_hotplug: "auto-movable" online policy and memory groups", v3.
      
      I. Goal
      
      The goal of this series is improving in-kernel auto-online support.  It
      tackles the fundamental problems that:
      
       1) We can create zone imbalances when onlining all memory blindly to
          ZONE_MOVABLE, in the worst case crashing the system. We have to know
          upfront how much memory we are going to hotplug such that we can
          safely enable auto-onlining of all hotplugged memory to ZONE_MOVABLE
          via "online_movable". This is far from practical and only applicable in
          limited setups -- like inside VMs under the RHV/oVirt hypervisor which
          will never hotplug more than 3 times the boot memory (and the
          limitation is only in place due to the Linux limitation).
      
       2) We see more setups that implement dynamic VM resizing, hot(un)plugging
          memory to resize VM memory. In these setups, we might hotplug a lot of
          memory, but it might happen in various small steps in both directions
          (e.g., 2 GiB -> 8 GiB -> 4 GiB -> 16 GiB ...). virtio-mem is the
          primary driver of this upstream right now, performing such dynamic
          resizing NUMA-aware via multiple virtio-mem devices.
      
          Onlining all hotplugged memory to ZONE_NORMAL means we basically have
          no hotunplug guarantees. Onlining all to ZONE_MOVABLE means we can
          easily run into zone imbalances when growing a VM. We want a mixture,
          and we want as much memory as reasonable/configured in ZONE_MOVABLE.
          Details regarding zone imbalances can be found at [1].
      
       3) Memory devices consist of 1..X memory block devices, however, the
          kernel doesn't really track the relationship. Consequently, user
          space has no idea either. We want to make per-device decisions.
      
          As one example, for memory hotunplug it doesn't make sense to use a
          mixture of zones within a single DIMM: we want all MOVABLE if
          possible, otherwise all !MOVABLE, because any !MOVABLE part will easily
          block the whole DIMM from getting hotunplugged.
      
          As another example, virtio-mem operates on individual units that span
          1..X memory blocks. Similar to a DIMM, we want a unit to either be all
          MOVABLE or !MOVABLE. A "unit" can be thought of like a DIMM, however,
          all units of a virtio-mem device logically belong together and are
          managed (added/removed) by a single driver. We want as much memory of
          a virtio-mem device to be MOVABLE as possible.
      
       4) We want memory onlining to be done right from the kernel while adding
          memory, not triggered by user space via udev rules; for example, this
          is required for fast memory hotplug for drivers that add individual
          memory blocks, like virtio-mem. We want a way to configure a policy in
          the kernel and avoid implementing advanced policies in user space.
      
      The auto-onlining support we have in the kernel is not sufficient.  All we
      have is a) online everything MOVABLE (online_movable) b) online everything
      !MOVABLE (online_kernel) c) keep zones contiguous (online).  This series
      allows configuring c) to mean instead "online movable if possible
      according to the configuration, driven by a maximum MOVABLE:KERNEL ratio"
      -- a new onlining policy.
      
      II. Approach
      
      This series does 3 things:
      
       1) Introduces the "auto-movable" online policy that initially operates on
          individual memory blocks only. It uses a maximum MOVABLE:KERNEL ratio
          to make a decision whether a memory block will be onlined to
          ZONE_MOVABLE or not. However, in the basic form, hotplugged KERNEL
          memory does not allow for more MOVABLE memory (details in the
          patches). CMA memory is treated like MOVABLE memory.
      
       2) Introduces static (e.g., DIMM) and dynamic (e.g., virtio-mem) memory
          groups and uses group information to make decisions in the
          "auto-movable" online policy across memory blocks of a single memory
          device (modeled as memory group). More details can be found in patch
          #3 or in the DIMM example below.
      
       3) Maximizes ZONE_MOVABLE memory within dynamic memory groups, by
          allowing ZONE_NORMAL memory within a dynamic memory group to allow for
          more ZONE_MOVABLE memory within the same memory group. The target use
          case is dynamic VM resizing using virtio-mem. See the virtio-mem
          example below.
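
The ratio check in 1) can be sketched as follows. This is a minimal model
under assumptions, not the code from the series: the function name
ratio_allows_movable() and its parameters are hypothetical, and the ratio is
taken as a percentage (e.g., 301 for a MOVABLE:KERNEL ratio of 3.01:1).

```c
#include <stdbool.h>

/*
 * Would onlining nr_pages to ZONE_MOVABLE keep MOVABLE memory within
 * ratio_percent of the KERNEL memory? A ratio of 0 never allows it.
 */
bool ratio_allows_movable(unsigned long kernel_pages,
			  unsigned long movable_pages,
			  unsigned long nr_pages,
			  unsigned int ratio_percent)
{
	unsigned long limit = kernel_pages * ratio_percent / 100;

	return movable_pages + nr_pages <= limit;
}
```

Which memory counts as kernel_pages is the policy decision: in the basic form
only early (boot) kernel memory is counted, while within a dynamic memory
group hotplugged ZONE_NORMAL memory of that group is counted as well.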
      
      I remember that the basic idea of using a ratio to implement a policy in
      the kernel was once mentioned by Vitaly Kuznetsov, but I might be wrong (I
      lost the pointer to that discussion).
      
      For me, the main use case is using it along with virtio-mem (and DIMMs /
      ppc64 dlpar where necessary) for dynamic resizing of VMs, increasing the
      amount of memory we can hotunplug reliably again if we might eventually
      hotplug a lot of memory to a VM.
      
      III. Target Usage
      
      The target usage will be:
      
       1) Linux boots with "mhp_default_online_type=offline"
      
       2) User space (e.g., systemd unit) configures memory onlining (according
          to a config file and system properties), for example:
          * Setting memory_hotplug.online_policy=auto-movable
          * Setting memory_hotplug.auto_movable_ratio=301
          * Setting memory_hotplug.auto_movable_numa_aware=true
      
       3) User space enables auto onlining via "echo online >
          /sys/devices/system/memory/auto_online_blocks"
      
       4) User space triggers manual onlining of all already-offline memory
          blocks (go over offline memory blocks and set them to "online")
      
      IV. Example
      
      For DIMMs, hotplugging 4 GiB DIMMs to a 4 GiB VM with a configured ratio of
      301% results in the following layout:
      	Memory block 0-15:    DMA32   (early)
      	Memory block 32-47:   Normal  (early)
      	Memory block 48-79:   Movable (DIMM 0)
      	Memory block 80-111:  Movable (DIMM 1)
      	Memory block 112-143: Movable (DIMM 2)
      	Memory block 144-175: Normal  (DIMM 3)
      	Memory block 176-207: Normal  (DIMM 4)
      	... all Normal
      	(-> hotplugged Normal memory does not allow for more Movable memory)
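
The DIMM layout above can be reproduced with a small simulation. It is a
sketch under assumptions: 128 MiB memory blocks (so 4 GiB is 32 blocks), a
whole DIMM is onlined as one uniform group, and hotplugged Normal memory does
not raise the limit. The function name count_movable_dimms() is hypothetical.

```c
/*
 * Count how many of nr_dimms equally sized DIMMs end up MOVABLE when they
 * are hotplugged one after another. All sizes are in memory blocks.
 */
unsigned int count_movable_dimms(unsigned long early_kernel_blocks,
				 unsigned long dimm_blocks,
				 unsigned int nr_dimms,
				 unsigned int ratio_percent)
{
	/* Only early kernel memory counts; the limit never grows. */
	unsigned long limit = early_kernel_blocks * ratio_percent / 100;
	unsigned long movable = 0;
	unsigned int nr_movable = 0;

	for (unsigned int i = 0; i < nr_dimms; i++) {
		if (movable + dimm_blocks <= limit) {
			movable += dimm_blocks;
			nr_movable++;
		}
		/* Otherwise the whole DIMM is onlined Normal. */
	}
	return nr_movable;
}
```

With 32 early blocks and a ratio of 301, the limit is 96 blocks, so exactly
three 32-block DIMMs fit and every later DIMM is onlined Normal.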
      
      For virtio-mem, using a simple, single virtio-mem device with a 4 GiB VM
      will result in the following layout:
      	Memory block 0-15:    DMA32   (early)
      	Memory block 32-47:   Normal  (early)
      	Memory block 48-143:  Movable (virtio-mem, first 12 GiB)
      	Memory block 144:     Normal  (virtio-mem, next 128 MiB)
      	Memory block 145-147: Movable (virtio-mem, next 384 MiB)
      	Memory block 148:     Normal  (virtio-mem, next 128 MiB)
      	Memory block 149-151: Movable (virtio-mem, next 384 MiB)
      	... Normal/Movable mixture as above
      	(-> hotplugged Normal memory allows for more Movable memory within
      	    the same device)
      
      Which gives us maximum flexibility when dynamically growing/shrinking a
      VM in smaller steps.
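
The virtio-mem layout above can be reproduced the same way. This is a sketch
under assumptions: 128 MiB memory blocks, a unit size of one memory block, a
ratio of 301, and Normal memory hotplugged by the device counting towards the
limit. The names zone_kind and simulate_virtio_mem() are hypothetical.

```c
#include <stddef.h>

enum zone_kind { ZONE_KIND_NORMAL, ZONE_KIND_MOVABLE };

/*
 * Online nr_blocks blocks of one virtio-mem device, one at a time, and
 * record the chosen zone for each block in out[].
 */
void simulate_virtio_mem(unsigned long early_kernel_blocks,
			 unsigned int ratio_percent,
			 enum zone_kind *out, size_t nr_blocks)
{
	unsigned long kernel = early_kernel_blocks;
	unsigned long movable = 0;

	for (size_t i = 0; i < nr_blocks; i++) {
		if (movable + 1 <= kernel * ratio_percent / 100) {
			out[i] = ZONE_KIND_MOVABLE;
			movable++;
		} else {
			/* Normal memory of the device raises the limit. */
			out[i] = ZONE_KIND_NORMAL;
			kernel++;
		}
	}
}
```

Starting from 32 early blocks, the first 96 device blocks (12 GiB) go to
Movable, then each Normal block allows three further Movable blocks, which is
the 128 MiB / 384 MiB pattern shown in the layout.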
      
      V. Doc Update
      
      I'll update the memory-hotplug.rst documentation, once the overhaul [1] is
      upstream. Until then, details can be found in patch #2.
      
      VI. Future Work
      
       1) Use memory groups for ppc64 dlpar
       2) Being able to specify a portion of (early) kernel memory that will be
          excluded from the ratio. Like "128 MiB globally/per node" are excluded.
      
          This might be helpful when starting VMs with extremely small memory
          footprint (e.g., 128 MiB) and hotplugging memory later -- not wanting
          the first hotplugged units getting onlined to ZONE_MOVABLE. One
          alternative would be a trigger to not consider ZONE_DMA memory
          in the ratio. We'll have to see if this is really required.
       3) Indicate to user space that MOVABLE might be a bad idea -- especially
          relevant when memory ballooning without support for balloon compaction
          is active.
      
      This patch (of 9):
      
      For implementing a new memory onlining policy, which determines when to
      online memory blocks to ZONE_MOVABLE semi-automatically, we need the
      number of present early (boot) pages -- present pages excluding hotplugged
      pages.  Let's track these pages per zone.
      
      Pass a page instead of the zone to adjust_present_page_count(), similar to
      adjust_managed_page_count(), and derive the zone from the page.
      
      It's worth noting that a memory block to be offlined/onlined is either
      completely "early" or "not early".  add_memory() and friends can only add
      complete memory blocks and we only online/offline complete (individual)
      memory blocks.
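
The accounting described above can be modeled in a few lines. This is a
user-space sketch under assumptions: zone_sketch and page_sketch are toy
stand-ins for the kernel's zone and page structures, and the "early" flag is
assumed to be known per page, whereas the kernel derives it from the memory
section.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel structures (assumptions). */
struct zone_sketch {
	unsigned long present_pages;
	unsigned long present_early_pages;
};

struct page_sketch {
	struct zone_sketch *zone; /* zone the page belongs to */
	bool early;               /* true if part of boot memory */
};

/*
 * Take a page, derive the zone from it, and adjust both counters.
 * nr_pages is positive when onlining and negative when offlining.
 */
void adjust_present_page_count_sketch(const struct page_sketch *page,
				      long nr_pages)
{
	struct zone_sketch *zone = page->zone;

	zone->present_pages += nr_pages;
	if (page->early)
		zone->present_early_pages += nr_pages;
}
```

Because a memory block is either completely early or completely not early,
one flag per call is sufficient in this model.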
      
      Link: https://lkml.kernel.org/r/20210806124715.17090-1-david@redhat.com
      Link: https://lkml.kernel.org/r/20210806124715.17090-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Marek Kedzierski <mkedzier@redhat.com>
      Cc: Hui Zhu <teawater@gmail.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b097002
    • D
      ACPI: memhotplug: memory resources cannot be enabled yet · 35ba0cd5
      David Hildenbrand committed
      We allocate + initialize everything from scratch.  In case enabling the
      device fails, we free all memory resources.
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35ba0cd5
    • D
      mm/memory_hotplug: remove nid parameter from remove_memory() and friends · e1c158e4
      David Hildenbrand committed
      There is only a single user remaining.  We can simply look up the nid only
      used for node offlining purposes when walking our memory blocks.  We don't
      expect to remove multi-nid ranges; and if we'd ever do, we most probably
      don't care about removing multi-nid ranges that actually result in empty
      nodes.
      
      If ever required, we can detect the "multi-nid" scenario and simply try
      offlining all online nodes.
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e1c158e4
    • M
      mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE · 859a85dd
      Mike Rapoport committed
      Patch series "mm: remove pfn_valid_within() and CONFIG_HOLES_IN_ZONE".
      
      After recent updates to freeing unused parts of the memory map, no
      architecture can have holes in the memory map within a pageblock.  This
      makes pfn_valid_within() check and CONFIG_HOLES_IN_ZONE configuration
      option redundant.
      
      The first patch removes them both in a mechanical way and the second patch
      simplifies memory_hotplug::test_pages_in_a_zone() that had
      pfn_valid_within() surrounded by more logic than a simple if.
      
      This patch (of 2):
      
      After recent changes in freeing of the unused parts of the memory map and
      rework of pfn_valid() in arm and arm64 there are no architectures that can
      have holes in the memory map within a pageblock and so nothing can enable
      CONFIG_HOLES_IN_ZONE which guards the non-trivial implementation of
      pfn_valid_within().
      
      With that, pfn_valid_within() is always hardwired to 1 and can be
      completely removed.
      
      Remove calls to pfn_valid_within() and CONFIG_HOLES_IN_ZONE.
      
      Link: https://lkml.kernel.org/r/20210713080035.7464-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20210713080035.7464-2-rppt@kernel.org
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      859a85dd
    • T
      fbmem: don't allow too huge resolutions · 8c28051c
      Tetsuo Handa committed
      syzbot is reporting a page fault in vga16fb_fillrect() [1], because
      vga16fb_check_var() fails to detect multiplication overflow.
      
        if (vxres * vyres > maxmem) {
          vyres = maxmem / vxres;
          if (vyres < yres)
            return -ENOMEM;
        }
      
      Since no module would accept resolutions so large that the multiplication
      overflows, let's reject them in the common path.
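
One portable way to detect such an overflow before multiplying is sketched
below. It is an illustration under assumptions, not the check added by this
commit: the helper name resolution_overflows() is hypothetical, and the
operands are assumed to be 32-bit unsigned values, in which case the product
in the quoted check silently wraps around.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if xres * yres does not fit into 32 bits. The test divides
 * instead of multiplying, so it cannot overflow itself.
 */
bool resolution_overflows(uint32_t xres, uint32_t yres)
{
	return xres != 0 && yres > UINT32_MAX / xres;
}
```

A caller would reject the mode when the helper returns true and only then
compare the product against the available memory.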
      
      Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1]
      Reported-by: syzbot <syzbot+04168c8063cfdde1db5e@syzkaller.appspotmail.com>
      Debugged-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: stable@vger.kernel.org
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/185175d6-227a-7b55-433d-b070929b262c@i-love.sakura.ne.jp
      8c28051c
  5. 08 Sep, 2021 3 commits