1. 09 Sep, 2021: 1 commit
    • mm/memory_hotplug: remove nid parameter from remove_memory() and friends · e1c158e4
      Committed by David Hildenbrand
      There is only a single user remaining.  We can simply look up the nid,
      which is only used for node offlining purposes, when walking our memory
      blocks.  We don't expect to remove multi-nid ranges; and if we ever did,
      we most probably wouldn't care about removing multi-nid ranges that
      actually result in empty nodes.
      
      If ever required, we can detect the "multi-nid" scenario and simply try
      offlining all online nodes.
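A minimal user-space sketch of looking up the nid while walking the memory blocks of a range, assuming a simplified array of blocks. The struct and function names are hypothetical and are not the kernel's. The sketch returns NUMA_NO_NODE for the "multi-nid" case, which is where the commit suggests a caller could instead try offlining all online nodes.

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical, simplified stand-in for a memory block device. */
struct mem_block_sketch {
    unsigned long start_pfn;
    unsigned long nr_pages;
    int nid;
};

/*
 * Walk the blocks that intersect [start_pfn, end_pfn) and return the nid
 * they share.  Returns NUMA_NO_NODE if no block intersects the range or if
 * the range spans more than one node (the "multi-nid" case).
 */
static int range_to_nid_sketch(const struct mem_block_sketch *blocks, size_t n,
                               unsigned long start_pfn, unsigned long end_pfn)
{
    int nid = NUMA_NO_NODE;

    for (size_t i = 0; i < n; i++) {
        unsigned long mb_start = blocks[i].start_pfn;
        unsigned long mb_end = mb_start + blocks[i].nr_pages;

        if (mb_end <= start_pfn || mb_start >= end_pfn)
            continue;                 /* block lies outside the range */
        if (nid == NUMA_NO_NODE)
            nid = blocks[i].nid;      /* first intersecting block */
        else if (nid != blocks[i].nid)
            return NUMA_NO_NODE;      /* multi-nid range detected */
    }
    return nid;
}
```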
      
      Link: https://lkml.kernel.org/r/20210712124052.26491-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Laurent Dufour <ldufour@linux.ibm.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Scott Cheloha <cheloha@linux.ibm.com>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jia He <justin.he@arm.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michel Lespinasse <michel@lespinasse.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pankaj Gupta <pankaj.gupta@ionos.com>
      Cc: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pierre Morel <pmorel@linux.ibm.com>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Sergei Trofimovich <slyfox@gentoo.org>
      Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@linux.alibaba.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 24 Jun, 2021: 3 commits
  3. 23 May, 2021: 5 commits
  4. 15 Dec, 2020: 1 commit
    • powerpc/pseries/memhotplug: Quieten some DLPAR operations · 20e9de85
      Committed by Laurent Dufour
      When attempting to remove a set of LMBs by index, a lot of messages are
      displayed on the console, even when everything goes fine:
      
        pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000002d
        Offlined Pages 4096
        pseries-hotplug-mem: Memory at 2d0000000 was hot-removed
      
      The 2 messages prefixed by "pseries-hotplug-mem" are not really
      helpful for the end user; they should be debug outputs.
      
      In case of error, because some of the LMB's pages couldn't be
      offlined, the following is displayed on the console:
      
        pseries-hotplug-mem: Attempting to hot-remove LMB, drc index 8000003e
        pseries-hotplug-mem: Failed to hot-remove memory at 3e0000000
        dlpar: Could not handle DLPAR request "memory remove index 0x8000003e"
      
      Again, the 2 messages prefixed by "pseries-hotplug-mem" are useless,
      and the generic DLPAR prefixed message should be enough.
      
      These first two changes are mainly triggered by the changes introduced
      in drmgr:
        https://groups.google.com/g/powerpc-utils-devel/c/Y6ef4NB3EzM/m/9cu5JHRxAQAJ
      
      Also, when adding a bunch of LMBs, one message per LMB is displayed on
      the console, like these:
        pseries-hotplug-mem: Memory at 7e0000000 (drc index 8000007e) was hot-added
        pseries-hotplug-mem: Memory at 7f0000000 (drc index 8000007f) was hot-added
        pseries-hotplug-mem: Memory at 800000000 (drc index 80000080) was hot-added
        pseries-hotplug-mem: Memory at 810000000 (drc index 80000081) was hot-added
      
      When adding 1TB of memory with an LMB size of 256MB, this leads to 4096
      messages being displayed on the console. These messages are not really
      helpful for the end user, so move them to the DEBUG level.
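The 4096 figure is simply the amount of memory added divided by the LMB size, which is 2^40 / 2^28 = 2^12. Here is a trivial helper that shows the arithmetic. It is hypothetical and meant for illustration only.

```c
#include <stdint.h>

/* Hypothetical helper: one console line per LMB, so messages == LMBs added. */
static uint64_t lmb_message_count_sketch(uint64_t bytes_added, uint64_t lmb_size)
{
    return bytes_added / lmb_size;
}
```

For example, `lmb_message_count_sketch(1ULL << 40, 256ULL << 20)` evaluates to 4096.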
      Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
      [mpe: Tweak change log wording]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20201211145954.90143-1-ldufour@linux.ibm.com
  5. 17 Oct, 2020: 1 commit
    • mm/memory_hotplug: prepare passing flags to add_memory() and friends · b6117199
      Committed by David Hildenbrand
      We soon want to pass flags, e.g., to mark added System RAM resources
      mergeable.  Prepare for that.
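Here is a rough user-space sketch of the "prepare for flags" pattern. A flags argument is threaded through now, and existing callers pass a "none" value, so a later patch can add a merge flag without touching every call site again. The kernel defines its own flag type for add_memory() and friends. Every name below is hypothetical.

```c
#include <stdint.h>

/* Hypothetical flag type and values for this sketch only. */
typedef unsigned int hp_flags_sketch;
#define HP_SKETCH_NONE           0u
#define HP_SKETCH_MERGE_RESOURCE (1u << 0)

struct added_range_sketch {
    uint64_t start;
    uint64_t size;
    int mergeable; /* set only when the caller asked for resource merging */
};

/* Callers with nothing to request simply pass HP_SKETCH_NONE. */
static struct added_range_sketch add_memory_sketch(uint64_t start, uint64_t size,
                                                   hp_flags_sketch flags)
{
    struct added_range_sketch r = { start, size, 0 };

    if (flags & HP_SKETCH_MERGE_RESOURCE)
        r.mergeable = 1;
    return r;
}
```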
      
      This patch is based on a similar patch by Oscar Salvador:
      
      https://lkml.kernel.org/r/20190625075227.15193-3-osalvador@suse.de
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Juergen Gross <jgross@suse.com> # Xen related part
      Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
      Acked-by: Wei Liu <wei.liu@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Wei Liu <wei.liu@kernel.org>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Cc: "Oliver O'Halloran" <oohall@gmail.com>
      Cc: Pingfan Liu <kernelfans@gmail.com>
      Cc: Nathan Lynch <nathanl@linux.ibm.com>
      Cc: Libor Pechacek <lpechacek@suse.cz>
      Cc: Anton Blanchard <anton@ozlabs.org>
      Cc: Leonardo Bras <leobras.c@gmail.com>
      Cc: Ard Biesheuvel <ardb@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julien Grall <julien@xen.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Roger Pau Monné <roger.pau@citrix.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Yang <richardw.yang@linux.intel.com>
      Link: https://lkml.kernel.org/r/20200911103459.10306-5-david@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 08 Oct, 2020: 2 commits
  7. 06 Oct, 2020: 1 commit
    • pseries/hotplug-memory: hot-add: skip redundant LMB lookup · 72cdd117
      Committed by Scott Cheloha
      During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
      to determine which node id (nid) to use when later calling __add_memory().
      
      This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
      appropriate nid for a given address by looking up the LMB containing the
      address and then passing that LMB to of_drconf_to_nid_single() to get the
      nid.  In dlpar_add_lmb() we get this address from the LMB itself.
      
      In short, we have a pointer to an LMB and then we are searching for
      that LMB *again* in order to find its nid.
      
      If we call of_drconf_to_nid_single() directly from dlpar_add_lmb() we
      can skip the redundant lookup.  The only error handling we need to
      duplicate from memory_add_physaddr_to_nid() is the fallback to the
      default nid when of_drconf_to_nid_single() returns -1 (NUMA_NO_NODE) or
      an invalid nid.
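The fallback that dlpar_add_lmb() now has to duplicate can be sketched as a small helper. This is an illustration, not the patch itself. Validity is modeled by a hypothetical node_possible_sketch() that treats nodes 0 to 3 as existing, and the caller passes in the default nid.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)

/* Hypothetical validity check: nodes 0..3 exist in this sketch. */
static bool node_possible_sketch(int nid)
{
    return nid >= 0 && nid < 4;
}

/*
 * Use the nid reported for the LMB if it is usable; otherwise fall back to
 * the default nid, covering both NUMA_NO_NODE (-1) and invalid values.
 */
static int lmb_nid_or_default_sketch(int drconf_nid, int default_nid)
{
    if (drconf_nid == NUMA_NO_NODE || !node_possible_sketch(drconf_nid))
        return default_nid;
    return drconf_nid;
}
```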
      
      Skipping the extra lookup makes hot-add operations faster, especially
      on machines with many LMBs.
      
      Consider an LPAR with 126976 LMBs.  In one test, hot-adding 126000
      LMBs on an unpatched kernel took ~3.5 hours while a patched kernel
      completed the same operation in ~2 hours:
      
      Unpatched (12450 seconds):
      Sep  9 04:06:31 ltc-brazos1 drmgr[810169]: drmgr: -c mem -a -q 126000
      Sep  9 04:06:31 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
      [...]
      Sep  9 07:34:01 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
      
      Patched (7065 seconds):
      Sep  8 21:49:57 ltc-brazos1 drmgr[877703]: drmgr: -c mem -a -q 126000
      Sep  8 21:49:57 ltc-brazos1 kernel: pseries-hotplug-mem: Attempting to hot-add 126000 LMB(s)
      [...]
      Sep  8 23:27:42 ltc-brazos1 kernel: pseries-hotplug-mem: Memory at 20000000 (drc index 80000002) was hot-added
      
      It should be noted that the speedup grows more substantial when
      hot-adding LMBs at the end of the drconf range.  This is because we
      are skipping a linear LMB search.
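Here is a minimal model of a linear LMB lookup that shows why position matters. The names are hypothetical, and this is not the kernel's memory_add_physaddr_to_nid(). The number of LMBs examined equals the index of the match plus one, so LMBs near the end of the drconf range are the most expensive to find.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified LMB: only the base address matters here. */
struct lmb_sketch {
    uint64_t base_addr;
};

/*
 * Linear search for the LMB containing addr.  *steps receives how many
 * LMBs were examined, which grows with the LMB's position in the array.
 */
static long find_lmb_linear_sketch(const struct lmb_sketch *lmbs, size_t n,
                                   uint64_t lmb_size, uint64_t addr,
                                   size_t *steps)
{
    *steps = 0;
    for (size_t i = 0; i < n; i++) {
        (*steps)++;
        if (addr >= lmbs[i].base_addr && addr < lmbs[i].base_addr + lmb_size)
            return (long)i;
    }
    return -1; /* no LMB contains addr */
}
```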
      
      To see the distinction, consider a smaller hot-add test on the same
      LPAR.  A perf-stat run with 10 iterations showed that hot-adding 4096
      LMBs completed less than 1 second faster on a patched kernel:
      
      Unpatched:
       Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
      
              104,753.42 msec task-clock                #    0.992 CPUs utilized            ( +-  0.55% )
                   4,708      context-switches          #    0.045 K/sec                    ( +-  0.69% )
                   2,444      cpu-migrations            #    0.023 K/sec                    ( +-  1.25% )
                     394      page-faults               #    0.004 K/sec                    ( +-  0.22% )
         445,902,503,057      cycles                    #    4.257 GHz                      ( +-  0.55% )  (66.67%)
           8,558,376,740      stalled-cycles-frontend   #    1.92% frontend cycles idle     ( +-  0.88% )  (49.99%)
         300,346,181,651      stalled-cycles-backend    #   67.36% backend cycles idle      ( +-  0.76% )  (50.01%)
         258,091,488,691      instructions              #    0.58  insn per cycle
                                                        #    1.16  stalled cycles per insn  ( +-  0.22% )  (66.67%)
          70,568,169,256      branches                  #  673.660 M/sec                    ( +-  0.17% )  (50.01%)
           3,100,725,426      branch-misses             #    4.39% of all branches          ( +-  0.20% )  (49.99%)
      
                 105.583 +- 0.589 seconds time elapsed  ( +-  0.56% )
      
      Patched:
       Performance counter stats for 'drmgr -c mem -a -q 4096' (10 runs):
      
              104,055.69 msec task-clock                #    0.993 CPUs utilized            ( +-  0.32% )
                   4,606      context-switches          #    0.044 K/sec                    ( +-  0.20% )
                   2,463      cpu-migrations            #    0.024 K/sec                    ( +-  0.93% )
                     394      page-faults               #    0.004 K/sec                    ( +-  0.25% )
         442,951,129,921      cycles                    #    4.257 GHz                      ( +-  0.32% )  (66.66%)
           8,710,413,329      stalled-cycles-frontend   #    1.97% frontend cycles idle     ( +-  0.47% )  (50.06%)
         299,656,905,836      stalled-cycles-backend    #   67.65% backend cycles idle      ( +-  0.39% )  (50.02%)
         252,731,168,193      instructions              #    0.57  insn per cycle
                                                        #    1.19  stalled cycles per insn  ( +-  0.20% )  (66.66%)
          68,902,851,121      branches                  #  662.173 M/sec                    ( +-  0.13% )  (49.94%)
           3,100,242,882      branch-misses             #    4.50% of all branches          ( +-  0.15% )  (49.98%)
      
                 104.829 +- 0.325 seconds time elapsed  ( +-  0.31% )
      
      This is consistent.  An add-by-count hot-add operation adds LMBs
      greedily, so LMBs near the start of the drconf range are considered
      first.  On an otherwise idle LPAR with so many LMBs we would expect to
      find the LMBs we need near the start of the drconf range, hence the
      smaller speedup.
      Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
      Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200916145122.3408129-1-cheloha@linux.ibm.com
  8. 02 Sep, 2020: 1 commit
    • pseries/drmem: don't cache node id in drmem_lmb struct · e5e179aa
      Committed by Scott Cheloha
      At memory hot-remove time we can retrieve an LMB's nid from its
      corresponding memory_block.  There is no need to store the nid
      in multiple locations.
      
      Note that lmb_to_memblock() uses find_memory_block() to get the
      corresponding memory_block.  As find_memory_block() runs in sub-linear
      time, this approach is negligibly slower than what we do at present.
      
      In exchange for this lookup at hot-remove time we no longer need to
      call memory_add_physaddr_to_nid() during drmem_init() for each LMB.
      On powerpc, memory_add_physaddr_to_nid() is a linear search, so this
      spares us an O(n^2) initialization during boot.
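Here is a back-of-the-envelope model of the quadratic boot cost. It assumes each lookup is a plain linear scan that stops at the matching LMB, which simplifies the real search.

```c
#include <stdint.h>

/*
 * Simplified cost model: if the LMB at index i is found after i + 1
 * comparisons, one lookup per LMB costs 1 + 2 + ... + n = n(n + 1) / 2
 * comparisons, i.e. O(n^2).
 */
static uint64_t init_lookup_cost_sketch(uint64_t n)
{
    return n * (n + 1) / 2;
}
```

For the 249854-LMB machine described below, this model gives 31,213,635,585 comparisons (roughly 31.2 billion). That is consistent in magnitude with a multi-second soft lockup, although the model ignores the per-comparison cost.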
      
      On systems with many LMBs that initialization overhead is palpable and
      disruptive.  For example, on a box with 249854 LMBs we're seeing
      drmem_init() take upwards of 30 seconds to complete:
      
      [   53.721639] drmem: initializing drmem v2
      [   80.604346] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
      [   80.604377] Modules linked in:
      [   80.604389] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.6.0-rc2+ #4
      [   80.604397] NIP:  c0000000000a4980 LR: c0000000000a4940 CTR: 0000000000000000
      [   80.604407] REGS: c0002dbff8493830 TRAP: 0901   Not tainted  (5.6.0-rc2+)
      [   80.604412] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 44000248  XER: 0000000d
      [   80.604431] CFAR: c0000000000a4a38 IRQMASK: 0
      [   80.604431] GPR00: c0000000000a4940 c0002dbff8493ac0 c000000001904400 c0003cfffffede30
      [   80.604431] GPR04: 0000000000000000 c000000000f4095a 000000000000002f 0000000010000000
      [   80.604431] GPR08: c0000bf7ecdb7fb8 c0000bf7ecc2d3c8 0000000000000008 c00c0002fdfb2001
      [   80.604431] GPR12: 0000000000000000 c00000001e8ec200
      [   80.604477] NIP [c0000000000a4980] hot_add_scn_to_nid+0xa0/0x3e0
      [   80.604486] LR [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0
      [   80.604492] Call Trace:
      [   80.604498] [c0002dbff8493ac0] [c0000000000a4940] hot_add_scn_to_nid+0x60/0x3e0 (unreliable)
      [   80.604509] [c0002dbff8493b20] [c000000000087c10] memory_add_physaddr_to_nid+0x20/0x60
      [   80.604521] [c0002dbff8493b40] [c0000000010d4880] drmem_init+0x25c/0x2f0
      [   80.604530] [c0002dbff8493c10] [c000000000010154] do_one_initcall+0x64/0x2c0
      [   80.604540] [c0002dbff8493ce0] [c0000000010c4aa0] kernel_init_freeable+0x2d8/0x3a0
      [   80.604550] [c0002dbff8493db0] [c000000000010824] kernel_init+0x2c/0x148
      [   80.604560] [c0002dbff8493e20] [c00000000000b648] ret_from_kernel_thread+0x5c/0x74
      [   80.604567] Instruction dump:
      [   80.604574] 392918e8 e9490000 e90a000a e92a0000 80ea000c 1d080018 3908ffe8 7d094214
      [   80.604586] 7fa94040 419d00dc e9490010 714a0088 <2faa0008> 409e00ac e9490000 7fbe5040
      [   89.047390] drmem: 249854 LMB(s)
      
      With a patched kernel on the same machine we're no longer seeing the
      soft lockup.  drmem_init() now completes in negligible time, even when
      the LMB count is large.
      
      Fixes: b2d3b5ee ("powerpc/pseries: Track LMB nid instead of using device tree")
      Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
      Reviewed-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200811015115.63677-1-cheloha@linux.ibm.com
  9. 16 Jul, 2020: 3 commits
  10. 05 Jun, 2020: 1 commit
    • powerpc/pseries/hotplug-memory: stop checking is_mem_section_removable() · ef1b51f7
      Committed by David Hildenbrand
      In commit 53cdc1cb ("drivers/base/memory.c: indicate all memory blocks
      as removable"), the user space interface to compute whether a memory block
      can be offlined (exposed via /sys/devices/system/memory/memoryX/removable)
      has effectively been deprecated.  We want to remove the leftovers of the
      kernel implementation.
      
      When offlining a memory block (mm/memory_hotplug.c:__offline_pages()),
      we'll start by:
       1. Testing if it contains any holes, and reject if so
       2. Testing if pages belong to different zones, and reject if so
       3. Isolating the page range, checking if it contains any unmovable pages
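The three checks can be modeled over an array of simplified page descriptors. This sketch uses hypothetical types, runs single-threaded and takes no locks. It shows only the order of the checks. The commit's point is that the real checks run inside the offlining path under the proper locks, which a separate pre-check such as is_mem_section_removable() cannot guarantee.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state for this sketch. */
struct page_sketch {
    bool present;  /* false models a hole in the range */
    int zone;      /* zone index */
    bool movable;  /* false models an unmovable page */
};

enum offline_check_sketch {
    OFFLINE_OK,
    OFFLINE_HAS_HOLE,
    OFFLINE_MIXED_ZONES,
    OFFLINE_UNMOVABLE,
};

/* Apply the three checks in the order listed above. */
static enum offline_check_sketch check_offline_sketch(const struct page_sketch *p,
                                                      size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!p[i].present)
            return OFFLINE_HAS_HOLE;
    for (size_t i = 0; i < n; i++)
        if (p[i].zone != p[0].zone)
            return OFFLINE_MIXED_ZONES;
    for (size_t i = 0; i < n; i++)
        if (!p[i].movable)
            return OFFLINE_UNMOVABLE;
    return OFFLINE_OK;
}
```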
      
      Using is_mem_section_removable() before trying to offline is not only
      racy, it can easily result in false positives/negatives.  Let's stop
      manually checking is_mem_section_removable(), and let device_offline()
      handle it completely instead.  We can remove the racy
      is_mem_section_removable() implementation next.
      
      We now take more locks (e.g., memory hotplug lock when offlining and the
      zone lock when isolating), but maybe we should optimize that
      implementation instead if this ever becomes a real problem (after all,
      memory unplug is already an expensive operation).  We started using
      is_mem_section_removable() in commit 51925fb3 ("powerpc/pseries:
      Implement memory hotplug remove in the kernel"), with the initial
      hotremove support of lmbs.
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Link: http://lkml.kernel.org/r/20200407135416.24093-2-david@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 03 Apr, 2020: 1 commit
  12. 19 Feb, 2020: 1 commit
    • powerpc/pseries: Avoid NULL pointer dereference when drmem is unavailable · a83836db
      Committed by Libor Pechacek
      In guests without hotpluggable memory, the drmem structure is only
      zero-initialized. Trying to manipulate DLPAR parameters results in a crash.
      
        $ echo "memory add count 1" > /sys/kernel/dlpar
        Oops: Kernel access of bad area, sig: 11 [#1]
        LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
        ...
        NIP:  c0000000000ff294 LR: c0000000000ff248 CTR: 0000000000000000
        REGS: c0000000fb9d3880 TRAP: 0300   Tainted: G            E      (5.5.0-rc6-2-default)
        MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28242428  XER: 20000000
        CFAR: c0000000009a6c10 DAR: 0000000000000010 DSISR: 40000000 IRQMASK: 0
        ...
        NIP dlpar_memory+0x6e4/0xd00
        LR  dlpar_memory+0x698/0xd00
        Call Trace:
          dlpar_memory+0x698/0xd00 (unreliable)
          handle_dlpar_errorlog+0xc0/0x190
          dlpar_store+0x198/0x4a0
          kobj_attr_store+0x30/0x50
          sysfs_kf_write+0x64/0x90
          kernfs_fop_write+0x1b0/0x290
          __vfs_write+0x3c/0x70
          vfs_write+0xd0/0x260
          ksys_write+0xdc/0x130
          system_call+0x5c/0x68
      
      Taking a closer look at the code, I can see that for_each_drmem_lmb is a
      macro expanding into `for (lmb = &drmem_info->lmbs[0]; lmb <=
      &drmem_info->lmbs[drmem_info->n_lmbs - 1]; lmb++)`. When drmem_info->lmbs
      is NULL, the loop would iterate through the whole address range if it
      weren't stopped by the NULL pointer dereference on the next line.
      
      This patch aligns for_each_drmem_lmb and for_each_drmem_lmb_in_range
      macro behavior with the common C semantics, where the end marker does
      not belong to the scanned range, and alters get_lmb_range() semantics.
      As a side effect, the wraparound observed in the crash is prevented.
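Here is a user-space sketch of the half-open iteration. It uses hypothetical stand-ins for struct drmem_lmb and drmem_info. The old inclusive form computes &lmbs[n_lmbs - 1]. With a NULL lmbs and n_lmbs == 0, that address lies just below 0 and wraps to the top of the address space. With a one-past-the-end marker, an empty table gives start == end, so the loop body never runs. The sketch compares with != and avoids arithmetic on a NULL base so that the empty case stays well-defined C. It makes no claim about which comparison the kernel macro uses.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct drmem_lmb and drmem_info. */
struct lmb_sketch {
    unsigned long base_addr;
};

struct drmem_info_sketch {
    struct lmb_sketch *lmbs; /* NULL when there is no hotpluggable memory */
    int n_lmbs;
};

/* One past the last LMB; no arithmetic is done on a NULL base. */
static struct lmb_sketch *lmbs_end_sketch(const struct drmem_info_sketch *info)
{
    return info->n_lmbs > 0 ? info->lmbs + info->n_lmbs : info->lmbs;
}

/* Half-open range [start, end): the end marker is never dereferenced. */
#define for_each_lmb_in_range_sketch(lmb, start, end) \
    for ((lmb) = (start); (lmb) != (end); (lmb)++)

#define for_each_lmb_sketch(lmb, info) \
    for_each_lmb_in_range_sketch((lmb), (info)->lmbs, lmbs_end_sketch(info))

static int count_lmbs_sketch(const struct drmem_info_sketch *info)
{
    struct lmb_sketch *lmb;
    int count = 0;

    for_each_lmb_sketch(lmb, info)
        count++;
    return count;
}
```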
      
      Fixes: 6c6ea537 ("powerpc/mm: Separate ibm, dynamic-memory data from DT format")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Libor Pechacek <lpechacek@suse.cz>
      Signed-off-by: Michal Suchanek <msuchanek@suse.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200131132829.10281-1-msuchanek@suse.de
  13. 14 Jan, 2020: 1 commit
  14. 13 Nov, 2019: 1 commit
  15. 05 Aug, 2019: 1 commit
  16. 14 Jun, 2019: 1 commit
    • powerpc/pseries: Fix oops in hotplug memory notifier · 0aa82c48
      Committed by Nathan Lynch
      During post-migration device tree updates, we can oops in
      pseries_update_drconf_memory() if the source device tree has an
      ibm,dynamic-memory-v2 property and the destination has an
      ibm,dynamic-memory (v1) property. The notifier processes an "update"
      for the ibm,dynamic-memory property but it's really an add in this
      scenario. So make sure the old property object is there before
      dereferencing it.
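The guard comes down to checking the old property pointer before reading from it. Here is a hypothetical sketch with a simplified property struct and an illustrative function name. The length comparison stands in for whatever comparison the real notifier performs.

```c
#include <stddef.h>

/* Hypothetical, simplified device tree property. */
struct prop_sketch {
    const char *name;
    int length;
};

/*
 * Handle an "update" notification.  When there is no old property, the
 * update is really an add (e.g. v2 on the source, v1 on the destination),
 * so there is nothing to compare against and old_prop must not be read.
 * Returns the change in property length, or 0 if no comparison was made.
 */
static int handle_prop_update_sketch(const struct prop_sketch *old_prop,
                                     const struct prop_sketch *new_prop)
{
    if (!old_prop || !new_prop)
        return 0;
    return new_prop->length - old_prop->length;
}
```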
      
      Fixes: 2b31e3ae ("powerpc/drmem: Add support for ibm, dynamic-memory-v2 property")
      Cc: stable@vger.kernel.org # v4.16+
      Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  17. 31 May, 2019: 1 commit
  18. 29 Apr, 2019: 1 commit
    • powerpc/pseries: Track LMB nid instead of using device tree · b2d3b5ee
      Committed by Nathan Fontenot
      When removing memory we need to remove the memory from the node
      it was added to instead of looking up the node it should be in
      in the device tree.
      
      During testing we have seen scenarios where the affinity for an
      LMB changes due to a partition migration or PRRN event. In these
      cases the node the LMB exists in may not match the node the device
      tree indicates it belongs in. This can lead to a system crash
      when trying to DLPAR remove the LMB after a migration or PRRN
      event. The current code looks up the node in the device tree to
      remove the LMB from; the crash occurs when we try to offline this
      node and it does not have any data, i.e. node_data[nid] == NULL.
      
      36:mon> e
      cpu 0x36: Vector: 300 (Data Access) at [c0000001828b7810]
          pc: c00000000036d08c: try_offline_node+0x2c/0x1b0
          lr: c0000000003a14ec: remove_memory+0xbc/0x110
          sp: c0000001828b7a90
         msr: 800000000280b033
         dar: 9a28
       dsisr: 40000000
        current = 0xc0000006329c4c80
        paca    = 0xc000000007a55200   softe: 0        irq_happened: 0x01
          pid   = 76926, comm = kworker/u320:3
      
      36:mon> t
      [link register   ] c0000000003a14ec remove_memory+0xbc/0x110
      [c0000001828b7a90] c00000000006a1cc arch_remove_memory+0x9c/0xd0 (unreliable)
      [c0000001828b7ad0] c0000000003a14e0 remove_memory+0xb0/0x110
      [c0000001828b7b20] c0000000000c7db4 dlpar_remove_lmb+0x94/0x160
      [c0000001828b7b60] c0000000000c8ef8 dlpar_memory+0x7e8/0xd10
      [c0000001828b7bf0] c0000000000bf828 handle_dlpar_errorlog+0xf8/0x160
      [c0000001828b7c60] c0000000000bf8cc pseries_hp_work_fn+0x3c/0xa0
      [c0000001828b7c90] c000000000128cd8 process_one_work+0x298/0x5a0
      [c0000001828b7d20] c000000000129068 worker_thread+0x88/0x620
      [c0000001828b7dc0] c00000000013223c kthread+0x1ac/0x1c0
      [c0000001828b7e30] c00000000000b45c ret_from_kernel_thread+0x5c/0x80
      
      To resolve this we need to track the node an LMB belongs to when
      it is added to the system so we can remove it from that node instead
      of the node that the device tree indicates it should belong to.
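Here is a minimal sketch of the idea, using a hypothetical LMB record. The nid used at hot-add time is stored, and hot-remove uses that stored value instead of the device tree's current answer, which may have changed after a migration or PRRN event.

```c
#define NUMA_NO_NODE (-1)

/* Hypothetical LMB record that remembers the node used at hot-add time. */
struct lmb_nid_sketch {
    unsigned long base_addr;
    int nid; /* NUMA_NO_NODE until the LMB is added */
};

/* Record the node the memory was actually added to. */
static void lmb_hot_add_sketch(struct lmb_nid_sketch *lmb, int nid_used)
{
    lmb->nid = nid_used;
}

/* At hot-remove time, remove from the recorded node, not the current DT one. */
static int lmb_nid_for_removal_sketch(const struct lmb_nid_sketch *lmb)
{
    return lmb->nid;
}
```

Commit e5e179aa, earlier in this log, later replaced this cached field. It looks up the nid from the LMB's memory_block at hot-remove time instead.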
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 22 Dec, 2018: 1 commit
  20. 21 Dec, 2018: 1 commit
    • powerpc/fadump: Do not allow hot-remove memory from fadump reserved area. · 0db6896f
      Committed by Mahesh Salgaonkar
      For fadump to work successfully there should not be any holes in reserved
      memory ranges where the kernel has asked firmware to move the content of old
      kernel memory in the event of a crash. Now that fadump uses CMA for the reserved
      area, this memory area is no longer protected from hot-remove operations
      unless it is CMA-allocated. Hence, the fadump service can fail to re-register
      after the hot-remove operation if hot-removed memory belongs to the fadump
      reserved region. To avoid this, make sure that memory from the fadump reserved
      area is not hot-removable if fadump is registered.

      However, if the user still wants to remove that memory, they can do so by
      manually stopping the fadump service before the hot-remove operation.
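The rule reduces to a range-overlap test that applies only while fadump is registered. Here is a minimal sketch with hypothetical names. It assumes half-open [start, start + size) ranges with non-zero sizes that do not overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two half-open ranges overlap iff each one starts before the other ends. */
static bool ranges_overlap_sketch(uint64_t a_start, uint64_t a_size,
                                  uint64_t b_start, uint64_t b_size)
{
    return a_start < b_start + b_size && b_start < a_start + a_size;
}

/* Hypothetical policy check: block removal only while fadump is registered. */
static bool lmb_hot_removable_sketch(uint64_t lmb_start, uint64_t lmb_size,
                                     bool fadump_registered,
                                     uint64_t rsv_start, uint64_t rsv_size)
{
    if (!fadump_registered)
        return true; /* user stopped fadump: removal is allowed */
    return !ranges_overlap_sketch(lmb_start, lmb_size, rsv_start, rsv_size);
}
```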
      Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  21. 26 Nov, 2018: 1 commit
  22. 31 Oct, 2018: 2 commits
    • mm/memory_hotplug: make add_memory() take the device_hotplug_lock · 8df1d0e4
      Committed by David Hildenbrand
      add_memory() currently does not take the device_hotplug_lock; however,
      it is already called under the lock from
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      to synchronize against CPU hot-remove and similar.
      
      In general, we should hold the device_hotplug_lock when adding memory to
      synchronize against online/offline request (e.g.  from user space) - which
      already resulted in lock inversions due to device_lock() and
      mem_hotplug_lock - see 30467e0b ("mm, hotplug: fix concurrent memory
      hot-add deadlock").  add_memory()/add_memory_resource() will create memory
      block devices, so this really feels like the right thing to do.
      
      Holding the device_hotplug_lock makes sure that a memory block device
      can really only be accessed (e.g. via .online/.state) from user space,
      once the memory has been fully added to the system.
      
      The lock is not held yet in
      	drivers/xen/balloon.c
      	arch/powerpc/platforms/powernv/memtrace.c
      	drivers/s390/char/sclp_cmd.c
      	drivers/hv/hv_balloon.c
      So, let's either use the locked variants or take the lock.
      
      Don't export add_memory_resource(), as it once was exported to be used by
      XEN, which is never built as a module.  If somebody requires it, we also
      have to export a locked variant (as device_hotplug_lock is never
      exported).
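The locked/unlocked split can be sketched in user space, with a pthread mutex standing in for device_hotplug_lock. The names are hypothetical. add_memory_nolock_sketch() plays the role of __add_memory(), which pseries calls while already holding the lock. add_memory_sketch() plays the role of add_memory(), which takes the lock itself.

```c
#include <pthread.h>

/* Hypothetical stand-in for device_hotplug_lock. */
static pthread_mutex_t hotplug_lock_sketch = PTHREAD_MUTEX_INITIALIZER;
static int nr_added_sketch; /* protected by hotplug_lock_sketch */

/* Caller must already hold hotplug_lock_sketch. */
static int add_memory_nolock_sketch(int nid, unsigned long start,
                                    unsigned long size)
{
    (void)nid;
    (void)start;
    (void)size;
    nr_added_sketch++; /* memory block devices would be created here */
    return 0;
}

/* For callers that do not hold the lock: take it around the real work. */
static int add_memory_sketch(int nid, unsigned long start, unsigned long size)
{
    int rc;

    pthread_mutex_lock(&hotplug_lock_sketch);
    rc = add_memory_nolock_sketch(nid, start, size);
    pthread_mutex_unlock(&hotplug_lock_sketch);
    return rc;
}
```

Only the locking wrapper would need to be visible to modules. The lock itself stays private, which matches the commit's remark that device_hotplug_lock is never exported.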
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memory_hotplug: make remove_memory() take the device_hotplug_lock · d15e5926
      Committed by David Hildenbrand
      Patch series "mm: online/offline_pages called w.o. mem_hotplug_lock", v3.
      
      Reading through the code and studying how mem_hotplug_lock is to be used,
      I noticed that there are two places where we can end up calling
      device_online()/device_offline() - online_pages()/offline_pages() without
      the mem_hotplug_lock.  And there are other places where we call
      device_online()/device_offline() without the device_hotplug_lock.
      
      While e.g.
      	echo "online" > /sys/devices/system/memory/memory9/state
      is fine, e.g.
      	echo 1 > /sys/devices/system/memory/memory9/online
      will not take the mem_hotplug_lock. It only takes the device_lock() and
      the device_hotplug_lock.
      
      E.g.  via memory_probe_store(), we can end up calling
      add_memory()->online_pages() without the device_hotplug_lock.  So we can
      have concurrent callers in online_pages().  In online_pages() we then
      touch, e.g., zone->present_pages basically unprotected.
      
      Looks like there is a longer history to that (see Patch #2 for details),
      and fixing it to work the way it was intended is not really possible.  We
      would e.g.  have to take the mem_hotplug_lock in drivers/base/core.c, which
      sounds wrong.
      
      Summary: We had a lock inversion on mem_hotplug_lock and device_lock().
      More details can be found in patch 3 and patch 6.
      
      I propose the following general rules (documentation added in patch 6):
      
      1. add_memory()/add_memory_resource() must only be called with
         device_hotplug_lock.
      2. remove_memory() must only be called with device_hotplug_lock. This is
         already documented and holds for all callers.
      3. device_online()/device_offline() must only be called with
         device_hotplug_lock. This is already documented and currently holds in
         core code. Other callers (related to memory hotplug) have to be fixed up.
      4. mem_hotplug_lock is taken inside of add_memory()/remove_memory()/
         online_pages()/offline_pages().
      
      To me, this looks way cleaner than what we have right now (and easier to
      verify).  And looking at the documentation of remove_memory, using
      lock_device_hotplug also for add_memory() feels natural.
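      The rules above can be illustrated with a minimal userspace sketch. This
      is an assumption-laden model, not kernel code: two pthread mutexes stand
      in for device_hotplug_lock (outer, held by the caller, rule 1) and
      mem_hotplug_lock (inner, taken by the function itself, rule 4), a plain
      flag stands in for lockdep_assert_held(), and model_add_memory() is a
      hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-ins (assumption): the outer lock models
 * device_hotplug_lock, the inner one models mem_hotplug_lock. */
static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for lockdep_assert_held(); only meaningful in this
 * single-threaded model. */
static bool device_hotplug_held;

static void lock_device_hotplug(void)
{
	pthread_mutex_lock(&device_hotplug_lock);
	device_hotplug_held = true;
}

static void unlock_device_hotplug(void)
{
	device_hotplug_held = false;
	pthread_mutex_unlock(&device_hotplug_lock);
}

/* Rules 1 and 4: the caller must hold device_hotplug_lock; the function
 * takes mem_hotplug_lock itself around the state it modifies.
 * Returns 0 on success, -1 if rule 1 was violated. */
static int model_add_memory(long *present_pages, long nr_pages)
{
	if (!device_hotplug_held)
		return -1;

	pthread_mutex_lock(&mem_hotplug_lock);
	*present_pages += nr_pages;	/* e.g. zone->present_pages */
	pthread_mutex_unlock(&mem_hotplug_lock);
	return 0;
}
```

      A caller would bracket the call with lock_device_hotplug() and
      unlock_device_hotplug(), which yields one fixed acquisition order:
      device_hotplug_lock first, mem_hotplug_lock second.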
      
      This patch (of 6):
      
      remove_memory() is exported right now but requires the
      device_hotplug_lock, which is not exported.  So let's provide a variant
      that takes the lock and only export that one.
      
      The lock is already held in
      	arch/powerpc/platforms/pseries/hotplug-memory.c
      	drivers/acpi/acpi_memhotplug.c
      	arch/powerpc/platforms/powernv/memtrace.c
      
      Apart from that, there are no other users in the tree.
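      A minimal sketch of the split described in this patch, using userspace
      stand-ins: an internal variant that expects device_hotplug_lock to be
      held already (as in the three callers listed), and an exported wrapper
      that takes the lock itself. The model_* names, the int return value and
      the flag-based check are hypothetical and do not claim to match the
      kernel's actual signatures.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in (assumption) for device_hotplug_lock. */
static pthread_mutex_t device_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static bool device_hotplug_held;	/* lockdep_assert_held() stand-in */
static uint64_t removed_bytes;		/* demo bookkeeping only */

static void lock_device_hotplug(void)
{
	pthread_mutex_lock(&device_hotplug_lock);
	device_hotplug_held = true;
}

static void unlock_device_hotplug(void)
{
	device_hotplug_held = false;
	pthread_mutex_unlock(&device_hotplug_lock);
}

/* Internal variant: for callers that already hold device_hotplug_lock.
 * Not exported. Returns -1 if the lock is not held. */
static int model_remove_memory_locked(int nid, uint64_t start, uint64_t size)
{
	(void)nid;
	(void)start;
	if (!device_hotplug_held)
		return -1;
	removed_bytes += size;
	return 0;
}

/* Exported variant: takes the lock itself, so modules never need
 * access to device_hotplug_lock. */
int model_remove_memory(int nid, uint64_t start, uint64_t size)
{
	int rc;

	lock_device_hotplug();
	rc = model_remove_memory_locked(nid, start, size);
	unlock_device_hotplug();
	return rc;
}
```

      Only the wrapper needs to be exported, so the lock itself can stay
      private to the core code.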
      
      Link: http://lkml.kernel.org/r/20180925091457.28651-2-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com>
      Reviewed-by: Oscar Salvador <osalvador@suse.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Rashmica Gupta <rashmica.g@gmail.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: John Allen <jallen@linux.vnet.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  23. 13 Oct 2018, 1 commit
  24. 19 Sep 2018, 1 commit
      powerpc/pseries/memory-hotplug: Only update DT once per memory DLPAR request · 063b8b12
      Nathan Fontenot committed
      The updates to powerpc numa and memory hotplug code now use the
      in-kernel LMB array instead of the device tree. This change allows the
      pseries memory DLPAR code to only update the device tree once after
      successfully handling a DLPAR request.
      
      Prior to the in-kernel LMB array, the numa code looked up the affinity
      for memory being added in the device tree; the code now looks it up
      in the LMB array. This means the memory hotplug code can simply
      update the affinity for an LMB in the LMB array instead of updating
      the device tree.
      
      This also saves kernel memory. When updating the device tree, old
      properties are never freed, since properties have no use count.
      As a result, a new copy of the property is allocated every time an
      LMB is added or removed (i.e. a request to add 100 LMBs creates 100
      new copies of the property). With this update, only a single new
      property is created when a DLPAR request completes successfully.
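      A toy model of the allocation count described above. It assumes a
      hypothetical update_dt_property() that stands for "allocate one new copy
      of the property" (the old copy being leaked), and a hypothetical LMB
      array holding only the affinity index; rollback of a failed request is
      omitted.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LMBS 128

/* Hypothetical in-kernel LMB array entry; only the affinity index matters. */
struct lmb {
	unsigned int aa_index;
};

static struct lmb lmb_array[MAX_LMBS];

/* Counts device-tree property copies: each update allocates a new copy
 * and the old one is never freed (no use count on properties). */
static size_t property_copies;

static void update_dt_property(void)
{
	property_copies++;
}

/* Old flow: the affinity of each added LMB is written to the device
 * tree, so every LMB costs one new property copy. */
static void add_lmbs_update_dt_each(size_t n)
{
	for (size_t i = 0; i < n; i++)
		update_dt_property();
}

/* New flow: record affinity in the LMB array and update the device tree
 * once, only if the whole request succeeded (rollback omitted). */
static void add_lmbs_update_dt_once(size_t n, unsigned int aa_index, int success)
{
	for (size_t i = 0; i < n && i < MAX_LMBS; i++)
		lmb_array[i].aa_index = aa_index;
	if (success)
		update_dt_property();
}
```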
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  25. 16 Jan 2018, 2 commits
      powerpc: Move of_drconf_cell struct to asm/drmem.h · 2c777215
      Nathan Fontenot committed
      Now that the powerpc code parses dynamic reconfiguration memory
      LMB information from the LMB array rather than directly from the
      device tree, we can move the of_drconf_cell struct to drmem.h, where
      it fits better.
      
      In addition, the struct is renamed to of_drconf_cell_v1 in
      anticipation of upcoming support for version 2 of the dynamic
      reconfiguration property and the members are typed as __be*
      values to reflect how they exist in the device tree.
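      Because the members are big-endian (__be*) values, every field must be
      converted before the CPU can use it. The sketch below decodes one raw
      cell in portable userspace C. The field order (be64 base_addr, then be32
      drc_index, reserved, aa_index, flags) is assumed to mirror
      of_drconf_cell_v1; struct lmb_info and the helper names are
      hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One v1 cell: 8-byte base_addr + four 4-byte fields = 24 bytes. */
#define DRCONF_CELL_V1_SIZE 24

/* CPU-format view of one LMB (hypothetical name). */
struct lmb_info {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

/* Big-endian readers; casting before shifting avoids signed overflow. */
static uint32_t read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t read_be64(const unsigned char *p)
{
	return ((uint64_t)read_be32(p) << 32) | read_be32(p + 4);
}

/* Convert one raw device-tree cell to CPU format. */
static void parse_drconf_cell_v1(const unsigned char cell[DRCONF_CELL_V1_SIZE],
				 struct lmb_info *out)
{
	out->base_addr = read_be64(cell);
	out->drc_index = read_be32(cell + 8);
	/* bytes 12..15 are the reserved field */
	out->aa_index = read_be32(cell + 16);
	out->flags = read_be32(cell + 20);
}
```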
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      powerpc/pseries: Update memory hotplug code to use drmem LMB array · 6195a500
      Nathan Fontenot committed
      Update the pseries memory hotplug code to use the newly added
      dynamic reconfiguration LMB array. Doing this is required for the
      upcoming support of version 2 of the dynamic reconfiguration
      device tree property.
      
      In addition, this change cleans up the code that parses the
      LMB information, as we no longer need to worry about the device tree
      format. This allows us to discard one of the first steps in memory
      hotplug, where we make a working copy of the device tree property and
      convert the entire property to CPU format. Instead, we just use the
      LMB array directly while holding the memory hotplug lock.
      
      This patch also moves the updating of the device tree property to
      powerpc/mm/drmem.c. This allows the hotplug code to work without
      needing to know the device tree format and provides a single
      routine for updating the device tree property. This new routine
      determines the proper device tree format and generates a properly
      formatted device tree property.
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  26. 31 Aug 2017, 1 commit
  27. 10 Aug 2017, 1 commit
      powerpc/pseries: Check memory device state before onlining/offlining · 1a367063
      Nathan Fontenot committed
      When DLPAR adding or removing memory we need to check the device
      offline status before trying to online/offline the memory. This is
      needed because calls to device_online() and device_offline() will
      return non-zero for memory that is already online and offline
      respectively.
      
      This update resolves two scenarios. First, for a kernel built with
      auto-online memory enabled (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y),
      memory will be onlined as part of calls to add_memory(). After adding
      the memory the pseries DLPAR code tries to online it and fails since
      the memory is already online. The DLPAR code then tries to remove the
      memory which produces the oops message below because the memory is not
      offline.
      
      The second scenario occurs when removing memory that is already
      offline, i.e. marking memory offline (via sysfs) and then trying to
      remove that memory. This doesn't work because offlining the already
      offline memory does not succeed and the DLPAR code then fails the
      DLPAR remove operation.
      
      The fix for both scenarios is to check the device.offline status
      before making the calls to device_online() or device_offline().
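      The check can be sketched as follows. This is a userspace model under
      stated assumptions: struct device is reduced to its offline flag, the
      two driver-core calls are stand-ins that return non-zero (1) when the
      device is already in the requested state, as the text above describes,
      and change_lmb_state() is a hypothetical name for the DLPAR helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal model of struct device: only the offline flag matters here. */
struct device {
	bool offline;
};

/* Stand-ins for the driver-core calls: they return 1 when the device
 * is already in the requested state, 0 after a successful change. */
static int device_online(struct device *dev)
{
	if (!dev->offline)
		return 1;
	dev->offline = false;
	return 0;
}

static int device_offline(struct device *dev)
{
	if (dev->offline)
		return 1;
	dev->offline = true;
	return 0;
}

/* The fix: only call device_online()/device_offline() when a state
 * change is actually needed; "already there" counts as success. */
static int change_lmb_state(struct device *dev, bool online)
{
	if (online && dev->offline)
		return device_online(dev);
	if (!online && !dev->offline)
		return device_offline(dev);
	return 0;	/* already in the requested state */
}
```

      With auto-online enabled, the memory is already online after
      add_memory(), so change_lmb_state(dev, true) returns 0 instead of
      failing and triggering the remove path.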
      
        kernel BUG at mm/memory_hotplug.c:1936!
        ...
        NIP [c0000000002ca428] .remove_memory+0xb8/0xc0
        LR [c0000000002ca3cc] .remove_memory+0x5c/0xc0
        Call Trace:
          .remove_memory+0x5c/0xc0 (unreliable)
          .dlpar_add_lmb+0x384/0x400
          .dlpar_memory+0x5dc/0xca0
          .handle_dlpar_errorlog+0x74/0xe0
          .pseries_hp_work_fn+0x2c/0x90
          .process_one_work+0x17c/0x460
          .worker_thread+0x88/0x500
          .kthread+0x15c/0x1a0
          .ret_from_kernel_thread+0x58/0xc0
      
      Fixes: 943db62c ("powerpc/pseries: Revert 'Auto-online hotplugged memory'")
      Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      [mpe: Use bool, add explicit rc=0 case, change log typos & formatting]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  28. 28 Jun 2017, 1 commit
  29. 01 Jun 2017, 1 commit