1. 03 2月, 2020 20 次提交
  2. 19 1月, 2020 10 次提交
  3. 17 1月, 2020 1 次提交
    • X
      alinux: mm: kidled: fix frame-larger-than build warning · 042cecca
      Xu Yu 提交于
      This fix the following build warning:
      
      mm/memcontrol.c: In function 'mem_cgroup_idle_page_stats_show':
      mm/memcontrol.c:3866:1: warning: the frame size of 2160 bytes is larger than 2048 bytes [-Wframe-larger-than=]
      
      The root cause is that "mem_cgroup_idle_page_stats_show" has two
      "struct idle_page_stats" variables, each of which is 1056 bytes in
      size, on the stack, thus exceeding the 2048 max frame size.
      
      This fix the build warning by dynamically allocating memory to these two
      variables with kmalloc.
      
      Fixes: f55ac551 ("alinux: mm: Support kidled")
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      042cecca
  4. 15 1月, 2020 9 次提交
    • A
      mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections · 78c5482c
      Alexander Duyck 提交于
      commit 0e56acae4b4dd4a9fbe897854ab83a109e2a9e11 upstream.
      
      Add yet another iterator, for_each_free_mem_range_in_zone_from, and then
      use it to support initializing and freeing pages in groups no larger than
      MAX_ORDER_NR_PAGES.  By doing this we can greatly improve the cache
      locality of the pages while we do several loops over them in the init and
      freeing process.
      
      We are able to tighten the loops further as a result of the "from"
      iterator as we can perform the initial checks for first_init_pfn in our
      first call to the iterator, and continue without the need for those checks
      via the "from" iterator.  I have added this functionality in the function
      called deferred_init_mem_pfn_range_in_zone that primes the iterator and
      causes us to exit if we encounter any failure.
      
      On my x86_64 test system with 384GB of memory per node I saw a reduction
      in initialization time from 1.85s to 1.38s as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190405221231.12227.85836.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: <yi.z.zhang@linux.intel.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      78c5482c
    • A
      mm: implement new zone specific memblock iterator · cff9acae
      Alexander Duyck 提交于
      commit 837566e7e08e3f89444166444836a8a49b9f9322 upstream.
      
      Introduce a new iterator for_each_free_mem_pfn_range_in_zone.
      
      This iterator will take care of making sure a given memory range provided
      is in fact contained within a zone.  It takes are of all the bounds
      checking we were doing in deferred_grow_zone, and deferred_init_memmap.
      In addition it should help to speed up the search a bit by iterating until
      the end of a range is greater than the start of the zone pfn range, and
      will exit completely if the start is beyond the end of the zone.
      
      Link: http://lkml.kernel.org/r/20190405221225.12227.22573.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NPavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pavel.tatashin@microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      cff9acae
    • A
      mm: drop meminit_pfn_in_nid as it is redundant · 0265f3b8
      Alexander Duyck 提交于
      commit 56ec43d8b02719402c9fcf984feb52ec2300f8a5 upstream.
      
      As best as I can tell the meminit_pfn_in_nid call is completely redundant.
      The deferred memory initialization is already making use of
      for_each_free_mem_range which in turn will call into __next_mem_range
      which will only return a memory range if it matches the node ID provided
      assuming it is not NUMA_NO_NODE.
      
      I am operating on the assumption that there are no zones or pgdata_t
      structures that have a NUMA node of NUMA_NO_NODE associated with them.  If
      that is the case then __next_mem_range will never return a memory range
      that doesn't match the zone's node ID and as such the check is redundant.
      
      So one piece I would like to verify on this is if this works for ia64.
      Technically it was using a different approach to get the node ID, but it
      seems to have the node ID also encoded into the memblock.  So I am
      assuming this is okay, but would like to get confirmation on that.
      
      On my x86_64 test system with 384GB of memory per node I saw a reduction
      in initialization time from 2.80s to 1.85s as a result of this patch.
      
      Link: http://lkml.kernel.org/r/20190405221219.12227.93957.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      0265f3b8
    • A
      mm: use mm_zero_struct_page from SPARC on all 64b architectures · e258d861
      Alexander Duyck 提交于
      commit 5470dea49f5382257c242ac617d908267727f1a8 upstream.
      
      Patch series "Deferred page init improvements", v7.
      
      This patchset is essentially a refactor of the page initialization logic
      that is meant to provide for better code reuse while providing a
      significant improvement in deferred page initialization performance.
      
      In my testing on an x86_64 system with 384GB of RAM I have seen the
      following.  In the case of regular memory initialization the deferred init
      time was decreased from 3.75s to 1.38s on average.  This amounts to a 172%
      improvement for the deferred memory initialization performance.
      
      I have called out the improvement observed with each patch.
      
      This patch (of 4):
      
      Use the same approach that was already in use on Sparc on all the
      architectures that support a 64b long.
      
      This is mostly motivated by the fact that 7 to 10 store/move instructions
      are likely always going to be faster than having to call into a function
      that is not specialized for handling page init.
      
      An added advantage to doing it this way is that the compiler can get away
      with combining writes in the __init_single_page call.  As a result the
      memset call will be reduced to only about 4 write operations, or at least
      that is what I am seeing with GCC 6.2 as the flags, LRU pointers, and
      count/mapcount seem to be cancelling out at least 4 of the 8 assignments
      on my system.
      
      One change I had to make to the function was to reduce the minimum page
      size to 56 to support some powerpc64 configurations.
      
      This change should introduce no change on SPARC since it already had this
      code.  In the case of x86_64 I saw a reduction from 3.75s to 2.80s when
      initializing 384GB of RAM per node.  Pavel Tatashin tested on a system
      with Broadcom's Stingray CPU and 48GB of RAM and found that
      __init_single_page() takes 19.30ns / 64-byte struct page before this patch
      and with this patch it takes 17.33ns / 64-byte struct page.  Mike Rapoport
      ran a similar test on a OpenPower (S812LC 8348-21C) with Power8 processor
      and 128GB or RAM.  His results per 64-byte struct page were 4.68ns before,
      and 4.59ns after this patch.
      
      Link: http://lkml.kernel.org/r/20190405221213.12227.9392.stgit@localhost.localdomainSigned-off-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com>
      Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <yi.z.zhang@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      e258d861
    • C
      nvme-mpath: remove I/O polling support · 73d7873c
      Christoph Hellwig 提交于
      commit 9d6610b76fa374eae3deb93bcbace4a06c2e3b95 upstream.
      
      The ->poll_fn has been stale for a while, as a lot of places check for mq
      ops.  But there is no real point in it anyway, as we don't even use
      the multipath code for subsystems without multiple ports, which is usually
      what we do high performance I/O to.  If it really becomes an issue we
      should rework the nvme code to also skip the multipath code for any
      private namespace, even if that could mean some trouble when rescanning.
      Reviewed-by: NKeith Busch <keith.busch@intel.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NShile Zhang <shile.zhang@linux.alibaba-inc.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      73d7873c
    • D
      amd-gpu: Don't undefine READ and WRITE · 1cf428f7
      David Howells 提交于
      commit 1fcb748d187d0c7732a75a509e924ead6d070e04 upstream.
      
      Remove the undefinition of READ and WRITE because these constants may be
      used elsewhere in subsequently included header files, thus breaking them.
      
      These constants don't actually appear to be used in the driver, so the
      undefinition seems pointless.
      
      Fixes: 4562236b ("drm/amd/dc: Add dc display driver (v2)")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      1cf428f7
    • M
      blk-mq: grab .q_usage_counter when queuing request from plug code path · 709e97f3
      Ming Lei 提交于
      commit e87eb301bee183d82bb3d04bd71b6660889a2588 upstream.
      
      Just like aio/io_uring, we need to grab 2 refcount for queuing one
      request, one is for submission, another is for completion.
      
      If the request isn't queued from plug code path, the refcount grabbed
      in generic_make_request() serves for submission. In theroy, this
      refcount should have been released after the sumission(async run queue)
      is done. blk_freeze_queue() works with blk_sync_queue() together
      for avoiding race between cleanup queue and IO submission, given async
      run queue activities are canceled because hctx->run_work is scheduled with
      the refcount held, so it is fine to not hold the refcount when
      running the run queue work function for dispatch IO.
      
      However, if request is staggered into plug list, and finally queued
      from plug code path, the refcount in submission side is actually missed.
      And we may start to run queue after queue is removed because the queue's
      kobject refcount isn't guaranteed to be grabbed in flushing plug list
      context, then kernel oops is triggered, see the following race:
      
      blk_mq_flush_plug_list():
              blk_mq_sched_insert_requests()
                      insert requests to sw queue or scheduler queue
                      blk_mq_run_hw_queue
      
      Because of concurrent run queue, all requests inserted above may be
      completed before calling the above blk_mq_run_hw_queue. Then queue can
      be freed during the above blk_mq_run_hw_queue().
      
      Fixes the issue by grab .q_usage_counter before calling
      blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
      safe because the queue is absolutely alive before inserting request.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: linux-scsi@vger.kernel.org,
      Cc: Martin K . Petersen <martin.petersen@oracle.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Tested-by: NJames Smart <james.smart@broadcom.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      [Joseph: use the passing 'q' directly]
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      709e97f3
    • K
      block/bfq: fix ifdef for CONFIG_BFQ_GROUP_IOSCHED=y · c327d430
      Konstantin Khlebnikov 提交于
      commit 42b1bd33dcdef4ffd98f695e188bab82f9fa46d8 upstream.
      
      Replace BFQ_GROUP_IOSCHED_ENABLED with CONFIG_BFQ_GROUP_IOSCHED.
      Code under these ifdefs never worked, something might be broken.
      
      Fixes: 0471559c ("block, bfq: add/remove entity weights correctly")
      Reviewed-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      c327d430
    • J
      block: remove bogus check for queue_lock assignment · ffc0d8e4
      Jens Axboe 提交于
      commit 5e27891e88555fecd8262e110e1a29feca4b0166 upstream.
      
      We just allocated the queue and haven't even set it up yet,
      hence we know that checking if ->mq_ops is NULL is always
      going to be true.
      
      In fact we do need to assign a lock to ->queue_lock always,
      as we need it for the queue flags modifications.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      ffc0d8e4