1. 24 Aug 2020 (1 commit)
  2. 22 Aug 2020 (1 commit)
  3. 21 Aug 2020 (1 commit)
  4. 20 Aug 2020 (1 commit)
    • ext4: limit the length of per-inode prealloc list · 27bc446e
      brookxu committed
      When writing sparse files, the per-inode prealloc list may become
      very long, resulting in high overhead in ext4_mb_use_preallocated().
      To avoid this problem, we limit the maximum length of the per-inode
      prealloc list to 512 and allow users to change that limit.
      
      After patching, we observed that the CPU sys ratio dropped and
      system throughput increased significantly. We created a process
      that writes a sparse file, and its running time on the fixed kernel
      was significantly reduced, as follows:
      
      Running time on unfixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m2.051s
      user    0m0.008s
      sys     0m2.026s
      
      Running time on fixed kernel:
      [root@TENCENT64 ~]# time taskset 0x01 ./sparse /data1/sparce.dat
      real    0m0.471s
      user    0m0.004s
      sys     0m0.395s
      Signed-off-by: Chunguang Xu <brookxu@tencent.com>
      Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
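A minimal user-space sketch of the idea, assuming a singly linked list where the newest preallocation sits at the head and entries beyond the cap are discarded from the tail. `struct pa`, `struct pa_list`, `pa_add()` and `pa_trim()` are hypothetical names, and which entries get dropped is an assumption of this sketch, not necessarily the ext4 patch's policy; the point is only that a bounded list bounds the linear walk done in ext4_mb_use_preallocated().

```c
#include <stdlib.h>

/* Hypothetical per-inode preallocation entry. */
struct pa {
    unsigned long start;   /* first logical block covered */
    struct pa *next;
};

/* Hypothetical list with a length cap (512 by default in the patch). */
struct pa_list {
    struct pa *head;       /* newest entry first */
    size_t len;
    size_t max_len;
};

/* Keep the first max_len entries and free everything after them. */
static void pa_trim(struct pa_list *l)
{
    struct pa **pp = &l->head;
    size_t kept = 0;

    while (*pp && kept < l->max_len) {
        pp = &(*pp)->next;
        kept++;
    }
    while (*pp) {
        struct pa *victim = *pp;
        *pp = victim->next;
        free(victim);
        l->len--;
    }
}

/* Add a new preallocation at the head, then enforce the cap. */
static int pa_add(struct pa_list *l, unsigned long start)
{
    struct pa *p = malloc(sizeof(*p));

    if (!p)
        return -1;
    p->start = start;
    p->next = l->head;
    l->head = p;
    l->len++;
    if (l->len > l->max_len)
        pa_trim(l);
    return 0;
}
```

With a cap of 4, adding entries 0 through 9 leaves only the four newest (9, 8, 7, 6) on the list, so any lookup walks at most four nodes.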
  5. 19 Aug 2020 (1 commit)
    • ipv6: some fixes for ipv6_dev_find() · 4ef1a7cb
      Xin Long committed
      This patch does three things for ipv6_dev_find():
      
        As David A. noticed,
      
        - rt6_lookup() is not really needed. Unlike __ip_dev_find(),
          ipv6_dev_find() has no compatibility problem, so remove it.
      
        As Hideaki suggested,
      
        - A "valid" (non-tentative) check for the address is also needed.
          ipv6_chk_addr() calls ipv6_chk_addr_and_flags(), which
          traverses the address hash list, but calling it inside
          ipv6_dev_find() is too heavy. Instead, this patch reuses the
          code of ipv6_chk_addr_and_flags() for ipv6_dev_find().
      
        - A dev parameter is passed into ipv6_dev_find(), since link-local
          addresses from user space have sin6_scope_id set and the dev
          lookup needs it.
      
      Fixes: 81f6cb31 ("ipv6: add ipv6_dev_find()")
      Suggested-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Reported-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
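The lookup rules described above (skip tentative addresses, and optionally restrict the match to the device taken from sin6_scope_id) can be sketched over a flat table. This is an illustration only: `struct addr_entry` and `dev_find()` are hypothetical, and the real ipv6_dev_find() walks the kernel's address hash list rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified address entry (not struct inet6_ifaddr). */
struct addr_entry {
    unsigned char addr[16];   /* IPv6 address */
    int ifindex;              /* owning device */
    bool tentative;           /* duplicate address detection not finished */
};

/*
 * Return the ifindex of the device owning 'addr', or -1.
 * Tentative addresses never match.  A non-zero 'dev_ifindex'
 * (e.g. from sin6_scope_id) restricts the search to that device.
 */
static int dev_find(const struct addr_entry *tbl, size_t n,
                    const unsigned char addr[16], int dev_ifindex)
{
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].addr, addr, 16) != 0)
            continue;
        if (tbl[i].tentative)
            continue;
        if (dev_ifindex && tbl[i].ifindex != dev_ifindex)
            continue;
        return tbl[i].ifindex;
    }
    return -1;
}
```

The device filter matters for link-local addresses, where the same fe80:: address can legitimately exist on several interfaces.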
  6. 18 Aug 2020 (3 commits)
  7. 15 Aug 2020 (14 commits)
  8. 14 Aug 2020 (2 commits)
    • dma-pool: fix coherent pool allocations for IOMMU mappings · 9420139f
      Christoph Hellwig committed
      When allocating coherent pool memory for an IOMMU mapping we don't care
      about the DMA mask.  Move the guess for the initial GFP mask into
      dma_direct_alloc_pages and pass dma_coherent_ok as a function pointer
      argument so that it doesn't get applied to the IOMMU case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Amit Pundir <amit.pundir@linaro.org>
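A sketch of the design choice of passing the addressing check as a function pointer: the pool walker applies the check only when one is supplied, so an IOMMU caller can pass NULL. `struct fake_dev`, `coherent_ok()` and `pool_pick()` are hypothetical stand-ins; `coherent_ok()` only mimics the role of dma_coherent_ok(), not its exact logic.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device with a DMA addressing limit. */
struct fake_dev {
    uint64_t dma_mask;   /* highest bus address the device can reach */
};

/* Does [phys, phys + size) fit under the device's mask? */
static bool coherent_ok(const struct fake_dev *dev, uint64_t phys, size_t size)
{
    return phys + size - 1 <= dev->dma_mask;
}

/*
 * Return the index of the first usable candidate, or -1.  'ok' is the
 * addressing check; an IOMMU caller passes NULL because the IOMMU
 * remaps addresses, so the device mask does not constrain the pages.
 */
static int pool_pick(const struct fake_dev *dev, const uint64_t *cands,
                     size_t n, size_t size,
                     bool (*ok)(const struct fake_dev *, uint64_t, size_t))
{
    for (size_t i = 0; i < n; i++) {
        if (!ok || ok(dev, cands[i], size))
            return (int)i;
    }
    return -1;
}
```

For a device with a 32-bit mask, a chunk starting at 4 GiB is rejected when `coherent_ok` is passed but accepted when the check is NULL, as in the IOMMU case.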
    • random32: add a tracepoint for prandom_u32() · 94c7eb54
      Eric Dumazet committed
      There has been some heat around prandom_u32() lately, and some people
      were wondering whether there was a simple way to determine how often
      it is used, before considering making it maybe 10 times more expensive.
      
      This tracepoint exports the generated pseudo random value.
      
      Tested:
      
      perf list | grep prandom_u32
        random:prandom_u32                                 [Tracepoint event]
      
      perf record -a [-g] [-C1] -e random:prandom_u32 sleep 1
      [ perf record: Woken up 0 times to write data ]
      [ perf record: Captured and wrote 259.748 MB perf.data (924087 samples) ]
      
      perf report --nochildren
          ...
          97.67%  ksoftirqd/1     [kernel.vmlinux]  [k] prandom_u32
                  |
                  ---prandom_u32
                     prandom_u32
                     |
                     |--48.86%--tcp_v4_syn_recv_sock
                     |          tcp_check_req
                     |          tcp_v4_rcv
                     |          ...
                      --48.81%--tcp_conn_request
                                tcp_v4_conn_request
                                tcp_rcv_state_process
                                ...
      perf script
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 13 Aug 2020 (16 commits)
    • xen: Sync up with the canonical protocol definition in Xen · 6f92337b
      Oleksandr Andrushchenko committed
      This is the sync up with the canonical definition of the
      display protocol in Xen.
      
      1. Add protocol version as an integer
      
      The version string, which is in fact an integer, is hard to handle in
      code that supports different protocol versions. To simplify that,
      also add the version as an integer.
      
      2. Pass buffer offset with XENDISPL_OP_DBUF_CREATE
      
      There are cases when a display data buffer is created with a non-zero
      offset to the data start. Handle such cases and provide that offset
      while creating a display buffer.
      
      3. Add XENDISPL_OP_GET_EDID command
      
      Add an optional request for reading Extended Display Identification
      Data (EDID) structure which allows better configuration of the
      display connectors over the configuration set in XenStore.
      With this change connectors may have multiple resolutions defined
      with respect to detailed timing definitions and additional properties
      normally provided by displays.
      
      If this request is not supported by the backend, then the visible area
      is defined by the relevant XenStore "resolution" property.
      
      If the backend provides extended display identification data (EDID) with
      the XENDISPL_OP_GET_EDID request, then EDID values must take precedence
      over the resolutions defined in XenStore.
      
      4. Bump protocol version to 2.
      Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Link: https://lore.kernel.org/r/20200813062113.11030-5-andr2000@gmail.com
      Signed-off-by: Juergen Gross <jgross@suse.com>
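The precedence rule for connector modes can be sketched as follows, assuming a single preferred mode per source. `struct conn_info` and `pick_mode()` are hypothetical; a real frontend would parse the EDID blob and may expose several modes per connector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what a frontend knows about one connector. */
struct conn_info {
    bool have_edid;                  /* backend answered XENDISPL_OP_GET_EDID */
    uint32_t edid_width, edid_height;
    uint32_t xs_width, xs_height;    /* XenStore "resolution" property */
};

/* EDID, when provided, takes precedence over the XenStore resolution. */
static void pick_mode(const struct conn_info *c, uint32_t *w, uint32_t *h)
{
    if (c->have_edid) {
        *w = c->edid_width;
        *h = c->edid_height;
    } else {
        *w = c->xs_width;
        *h = c->xs_height;
    }
}
```

Because GET_EDID is optional, the XenStore value remains the fallback whenever the backend does not answer the request.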
    • mfd: Replace HTTP links with HTTPS ones · 4f4ed454
      Alexander A. Klimov committed
      Rationale:
      Reduces the attack surface for MITM on kernel devs opening the links,
      as HTTPS traffic is much harder to manipulate.
      
      Deterministic algorithm:
      For each file:
        If not .svg:
          For each line:
            If doesn't contain `\bxmlns\b`:
              For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
                If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
                  If both the HTTP and HTTPS versions
                  return 200 OK and serve the same content:
                    Replace HTTP with HTTPS.
      Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
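A rough C rendering of the per-line step of the algorithm above. It approximates the regular expressions with plain substring checks (word boundaries are not enforced), leaves the per-file `.svg` filter to the caller, and delegates the "both return 200 OK and serve the same content" test to a caller-supplied predicate, since that needs network access. `is_excluded()`, `assume_verified()` and `upgrade_line()` are hypothetical helpers, not the tooling actually used for the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the link mentions a licence URL that must be left alone. */
static bool is_excluded(const char *link, size_t len)
{
    static const char *const skip[] = { "gnu.org/license", "mozilla.org/MPL" };

    for (size_t i = 0; i < sizeof(skip) / sizeof(skip[0]); i++) {
        size_t k = strlen(skip[i]);
        for (size_t j = 0; j + k <= len; j++)
            if (memcmp(link + j, skip[i], k) == 0)
                return true;
    }
    return false;
}

/* Stand-in predicate that treats every link as verified (no network check). */
static bool assume_verified(const char *link, size_t len)
{
    (void)link;
    (void)len;
    return true;
}

/*
 * Copy 'in' to 'out' (outsz must be > 0), upgrading http:// links to
 * https:// unless the line mentions "xmlns", the link is excluded, or
 * 'verified' rejects it.
 */
static void upgrade_line(const char *in, char *out, size_t outsz,
                         bool (*verified)(const char *link, size_t len))
{
    size_t o = 0;
    bool skip_line = strstr(in, "xmlns") != NULL;

    while (*in && o + 1 < outsz) {
        if (!skip_line && strncmp(in, "http://", 7) == 0) {
            /* The link runs until '#', whitespace or end of line. */
            size_t len = strcspn(in, "# \t\r\n");

            if (!is_excluded(in, len) && verified(in, len)) {
                const char *s = "https://";

                while (*s && o + 1 < outsz)
                    out[o++] = *s++;
                in += 7;   /* skip the original "http://" */
                continue;
            }
        }
        out[o++] = *in++;
    }
    out[o] = '\0';
}
```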
    • mfd: mfd-core: Add mechanism for removal of a subset of children · 114294d2
      Charles Keepax committed
      Currently, the only way to remove MFD children is with a call to
      mfd_remove_devices, which will remove all the children. Under
      some circumstances it is useful to remove only a subset of the
      child devices, for example if some additional clean-up is required
      between removal of certain child devices.
      
      To accomplish this a level field is added to mfd_cell, the normal
      mfd_remove_devices is modified to not remove devices that are set
      to a higher level and a corresponding mfd_remove_devices_late
      function is added to remove those children.
      
      See further discussion at:
      https://lore.kernel.org/lkml/20200616075834.GF2608702@dell/
      Suggested-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
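A sketch of the two-stage removal, assuming two levels where the normal call removes only normal-level children and the late call removes the rest. `enum dep_level`, `struct child` and the helper names are hypothetical stand-ins for the `level` field in mfd_cell and for mfd_remove_devices()/mfd_remove_devices_late(); any clean-up the parent driver needs would run between the two calls.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical removal levels. */
enum dep_level { LEVEL_NORMAL = 0, LEVEL_HIGH = 1 };

struct child {
    const char *name;
    enum dep_level level;
    bool registered;
};

/* Remove every still-registered child whose level is at most max_level. */
static size_t remove_children(struct child *c, size_t n, enum dep_level max_level)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (c[i].registered && c[i].level <= max_level) {
            c[i].registered = false;
            removed++;
        }
    }
    return removed;
}

/* Normal removal leaves higher-level children in place... */
static size_t remove_devices(struct child *c, size_t n)
{
    return remove_children(c, n, LEVEL_NORMAL);
}

/* ...and the late call removes whatever is left. */
static size_t remove_devices_late(struct child *c, size_t n)
{
    return remove_children(c, n, LEVEL_HIGH);
}
```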
    • mfd: max77693-private: Drop a duplicated word · e7b85500
      Randy Dunlap committed
      Drop the repeated word "in" in a comment.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: da9055: pdata.h: Drop a duplicated word · 23ef2b64
      Randy Dunlap committed
      Drop the repeated word "that" in a comment.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: da9063: Add support for latest DA silicon revision · 9ece3601
      Adam Thomson committed
      This update adds new regmap tables to support the latest DA silicon
      which will automatically be selected based on the chip and variant
      information read from the device.
      Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: da9063: Fix revision handling to correctly select reg tables · 091c6110
      Adam Thomson committed
      The current implementation checks the variant_code in the i2c_probe()
      function, but does so immediately after the containing struct has been
      zero-initialised. This means the variant code check will always default
      to using the BB tables and will never select AD. The variant code is
      subsequently set by device_init() and later used by the RTC, so it is
      really rather fortunate that this mismatch works.
      
      This update adds raw I2C read access functionality to read the chip
      and variant/revision information (common to all revisions) so that
      it can subsequently choose the correct regmap tables for
      real initialisation.
      Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
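The fix amounts to reading the identification registers before choosing tables, instead of inspecting a freshly zeroed struct. In this sketch the register addresses, chip ID and variant codes are hypothetical placeholders (the real values live in the driver headers), `select_tables()` is a hypothetical helper, and `fake_read()` simulates the raw I2C read.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register addresses and ID values, for illustration only. */
#define REG_CHIP_ID      0x81
#define REG_VARIANT_ID   0x82
#define CHIP_ID_EXPECTED 0x61
#define VARIANT_AD       0x3
#define VARIANT_BB       0x5

struct regmap_tables {
    const char *name;   /* which register layout the tables describe */
};

static const struct regmap_tables tables_ad = { "AD" };
static const struct regmap_tables tables_bb = { "BB" };

/* Raw register read, standing in for an I2C transfer done before regmap exists. */
typedef int (*raw_read_fn)(uint8_t reg, uint8_t *val);

/* Read chip and variant first, then choose tables; NULL means "not our chip". */
static const struct regmap_tables *select_tables(raw_read_fn rd)
{
    uint8_t chip, variant;

    if (rd(REG_CHIP_ID, &chip) != 0 || chip != CHIP_ID_EXPECTED)
        return NULL;
    if (rd(REG_VARIANT_ID, &variant) != 0)
        return NULL;
    return variant == VARIANT_AD ? &tables_ad : &tables_bb;
}

/* Simulated device: expected chip ID and a configurable variant. */
static uint8_t fake_variant = VARIANT_AD;

static int fake_read(uint8_t reg, uint8_t *val)
{
    if (reg == REG_CHIP_ID)
        *val = CHIP_ID_EXPECTED;
    else if (reg == REG_VARIANT_ID)
        *val = fake_variant;
    else
        return -1;
    return 0;
}
```

Had `select_tables()` looked at a zeroed variant field instead of calling `rd()`, it would always fall through to the BB tables, which is the behaviour the commit describes.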
    • mfd: smsc-ece1099: Remove driver · 7d2594cd
      Michael Walle committed
      This MFD driver has no user. The keypad driver of this device never made
      it into the kernel. Therefore, this driver is useless. Remove it.
      Signed-off-by: Michael Walle <michael@walle.cc>
      Cc: Sourav Poddar <sourav.poddar@ti.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: core: Add OF_MFD_CELL_REG() helper · 44e6171e
      Lee Jones committed
      Extend current list of helpers to provide support for parent drivers
      wishing to match specific child devices to particular OF nodes.
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: core: Fix formatting of MFD helpers · d097965b
      Lee Jones committed
      Remove unnecessary '\'s and leading tabs.
      
      This will help to clean-up future diffs when subsequent changes are
      made.
      
      Hint: The aforementioned changes follow this patch.
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
    • mfd: core: Make a best effort attempt to match devices with the correct of_nodes · 466a62d7
      Lee Jones committed
      Currently, when a child platform device (sometimes referred to as a
      sub-device) is registered via the Multi-Functional Device (MFD) API,
      the framework attempts to match the newly registered platform device
      with its associated Device Tree (OF) node.  Until now, the device has
      been allocated the first node found with an identical OF compatible
      string.  Unfortunately, if there are, say, three devices
      which are to be handled by the same driver and therefore have the same
      compatible string, each of them will be allocated a pointer to the
      *first* node.
      
      An example Device Tree entry might look like this:
      
        mfd_of_test {
                compatible = "mfd,of-test-parent";
                #address-cells = <0x02>;
                #size-cells = <0x02>;
      
                child@aaaaaaaaaaaaaaaa {
                        compatible = "mfd,of-test-child";
                        reg = <0xaaaaaaaa 0xaaaaaaaa 0 0x11>,
                              <0xbbbbbbbb 0xbbbbbbbb 0 0x22>;
                };
      
                child@cccccccc {
                        compatible = "mfd,of-test-child";
                        reg = <0x00000000 0xcccccccc 0 0x33>;
                };
      
                child@dddddddd00000000 {
                        compatible = "mfd,of-test-child";
                        reg = <0xdddddddd 0x00000000 0 0x44>;
                };
        };
      
      When used with example sub-device registration like this:
      
        static const struct mfd_cell mfd_of_test_cell[] = {
              OF_MFD_CELL("mfd-of-test-child", NULL, NULL, 0, 0, "mfd,of-test-child"),
              OF_MFD_CELL("mfd-of-test-child", NULL, NULL, 0, 1, "mfd,of-test-child"),
              OF_MFD_CELL("mfd-of-test-child", NULL, NULL, 0, 2, "mfd,of-test-child")
        };
      
      ... the current implementation will result in all devices being allocated
      the first OF node found containing a matching compatible string:
      
        [0.712511] mfd-of-test-child mfd-of-test-child.0: Probing platform device: 0
        [0.712710] mfd-of-test-child mfd-of-test-child.0: Using OF node: child@aaaaaaaaaaaaaaaa
        [0.713033] mfd-of-test-child mfd-of-test-child.1: Probing platform device: 1
        [0.713381] mfd-of-test-child mfd-of-test-child.1: Using OF node: child@aaaaaaaaaaaaaaaa
        [0.713691] mfd-of-test-child mfd-of-test-child.2: Probing platform device: 2
        [0.713889] mfd-of-test-child mfd-of-test-child.2: Using OF node: child@aaaaaaaaaaaaaaaa
      
      After this patch each device will be allocated a unique OF node:
      
        [0.712511] mfd-of-test-child mfd-of-test-child.0: Probing platform device: 0
        [0.712710] mfd-of-test-child mfd-of-test-child.0: Using OF node: child@aaaaaaaaaaaaaaaa
        [0.713033] mfd-of-test-child mfd-of-test-child.1: Probing platform device: 1
        [0.713381] mfd-of-test-child mfd-of-test-child.1: Using OF node: child@cccccccc
        [0.713691] mfd-of-test-child mfd-of-test-child.2: Probing platform device: 2
        [0.713889] mfd-of-test-child mfd-of-test-child.2: Using OF node: child@dddddddd00000000
      
      This is fine if all OF nodes are identical.  However, if we wish to
      apply an attribute to a particular device, we really need to ensure the
      correct OF node will be associated with the device containing the
      correct address.  We accomplish this by matching the device's address
      expressed in DT with one provided during sub-device registration,
      like this:
      
        static const struct mfd_cell mfd_of_test_cell[] = {
              OF_MFD_CELL_REG("mfd-of-test-child", NULL, NULL, 0, 1, "mfd,of-test-child", 0xdddddddd00000000),
              OF_MFD_CELL_REG("mfd-of-test-child", NULL, NULL, 0, 2, "mfd,of-test-child", 0xaaaaaaaaaaaaaaaa),
              OF_MFD_CELL_REG("mfd-of-test-child", NULL, NULL, 0, 3, "mfd,of-test-child", 0x00000000cccccccc)
        };
      
      This will ensure a specific device (designated here using the
      platform_ids 1, 2 and 3) is matched with a particular OF node:
      
        [0.712511] mfd-of-test-child mfd-of-test-child.0: Probing platform device: 0
        [0.712710] mfd-of-test-child mfd-of-test-child.0: Using OF node: child@dddddddd00000000
        [0.713033] mfd-of-test-child mfd-of-test-child.1: Probing platform device: 1
        [0.713381] mfd-of-test-child mfd-of-test-child.1: Using OF node: child@aaaaaaaaaaaaaaaa
        [0.713691] mfd-of-test-child mfd-of-test-child.2: Probing platform device: 2
        [0.713889] mfd-of-test-child mfd-of-test-child.2: Using OF node: child@cccccccc
      
      This implementation is still not infallible, hence the mention of
      "best effort" in the commit subject.  Since we have not *insisted* on
      the existence of 'reg' properties (in some scenarios they just do not
      make sense) and no device currently uses the new 'of_reg' attribute,
      we have to make an on-the-fly judgement call whether to associate the
      OF node anyway, which we do in cases where parent drivers haven't
      specified a particular OF node to match to.  So there is a *slight*
      possibility of the following result (note: the implementation here is
      convoluted, but it shows you one means by which this process can
      still break):
      
        /*
         * First entry will match to the first OF node with matching compatible
         * Second will fail, since the first took its OF node and is no longer available
         * Third will succeed
         */
        static const struct mfd_cell mfd_of_test_cell[] = {
              OF_MFD_CELL("mfd-of-test-child", NULL, NULL, 0, 1, "mfd,of-test-child"),
              OF_MFD_CELL_REG("mfd-of-test-child", NULL, NULL, 0, 2, "mfd,of-test-child", 0xaaaaaaaaaaaaaaaa),
              OF_MFD_CELL_REG("mfd-of-test-child", NULL, NULL, 0, 3, "mfd,of-test-child", 0x00000000cccccccc)
        };
      
      The result:
      
        [0.753869] mfd-of-test-parent mfd_of_test: Registering 3 devices
        [0.756597] mfd-of-test-child: Failed to locate of_node [id: 2]
        [0.759999] mfd-of-test-child mfd-of-test-child.1: Probing platform device: 1
        [0.760314] mfd-of-test-child mfd-of-test-child.1: Using OF node: child@aaaaaaaaaaaaaaaa
        [0.760908] mfd-of-test-child mfd-of-test-child.2: Probing platform device: 2
        [0.761183] mfd-of-test-child mfd-of-test-child.2: No OF node associated with this device
        [0.761621] mfd-of-test-child mfd-of-test-child.3: Probing platform device: 3
        [0.761899] mfd-of-test-child mfd-of-test-child.3: Using OF node: child@cccccccc
      
      We could code around this with some pre-parsing semantics, but the
      added complexity required to cover each and every corner-case is not
      justified.  Fixing the current failure (via this patch) already
      works, leaving only some pretty small corner-cases.  Other issues
      should be patched in the parent drivers which can be achieved simply
      by implementing OF_MFD_CELL_REG().
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
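The matching rules can be modelled in a few lines. In this sketch each cell either carries an address (the OF_MFD_CELL_REG() case) and accepts only the node whose first `reg` address equals it, or carries none and takes the first unclaimed node with a matching compatible string. `struct of_node`, `struct cell` and `match_node()` are hypothetical simplifications, not the MFD core's data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, flattened view of a DT child node. */
struct of_node {
    const char *name;
    const char *compatible;
    bool has_reg;
    uint64_t reg;        /* first address from 'reg' */
    bool taken;          /* already associated with a device */
};

/* Hypothetical cell: compatible string plus an optional address. */
struct cell {
    const char *compatible;
    bool use_reg;        /* OF_MFD_CELL_REG()-style cell */
    uint64_t reg;
};

/*
 * Best-effort match: claim the first untaken node with the right
 * compatible string and, if the cell has an address, the same address.
 * Returns NULL when nothing suitable is left.
 */
static struct of_node *match_node(struct of_node *nodes, size_t n,
                                  const struct cell *c)
{
    for (size_t i = 0; i < n; i++) {
        struct of_node *np = &nodes[i];

        if (np->taken || strcmp(np->compatible, c->compatible) != 0)
            continue;
        if (c->use_reg && (!np->has_reg || np->reg != c->reg))
            continue;
        np->taken = true;
        return np;
    }
    return NULL;
}
```

Replaying the corner case from the commit message (a plain cell, then cells for the `aaaa...` and `cccc` addresses), the first cell claims child@aaaaaaaaaaaaaaaa, the second finds nothing, and the third gets child@cccccccc, which mirrors the "Failed to locate of_node" log above.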
    • netfilter: avoid ipv6 -> nf_defrag_ipv6 module dependency · 2404b73c
      Florian Westphal committed
      nf_ct_frag6_gather is part of nf_defrag_ipv6.ko, not ipv6 core.
      
      The current use of the netfilter ipv6 stub indirections causes a module
      dependency between ipv6 and nf_defrag_ipv6.
      
      This prevents the nf_defrag_ipv6 module from being removed because ipv6
      can't be unloaded.
      
      Remove the indirection and always use a direct call.  This creates a
      dependency from nf_conntrack_bridge to nf_defrag_ipv6 instead:
      
      modinfo nf_conntrack_bridge
      depends:        nf_conntrack,nf_defrag_ipv6,bridge
      
      .. and nf_conntrack already depends on nf_defrag_ipv6 anyway.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • mm/gup: remove task_struct pointer for all gup code · 64019a2e
      Peter Xu committed
      After the cleanup of page fault accounting, gup does not need to pass
      task_struct around any more.  Remove that parameter in the whole gup
      stack.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: do page fault accounting in handle_mm_fault · bce617ed
      Peter Xu committed
      Patch series "mm: Page fault accounting cleanups", v5.
      
      This is v5 of the pf accounting cleanup series.  It originates from Gerald
      Schaefer's report a week ago of an issue with incorrect page fault
      accounting for retried page faults after commit 4064b982 ("mm: allow
      VM_FAULT_RETRY for multiple times"):
      
        https://lore.kernel.org/lkml/20200610174811.44b94525@thinkpad/
      
      What this series does:
      
        - Correct page fault accounting: a page fault (whether it comes from
          #PF handling, gup, or anything else) is accounted only once, by the
          attempt that completed the fault.  For example, page fault retries
          should not be counted in page fault counters.  The same applies to
          the perf events.
      
        - Unify the definition of PERF_COUNT_SW_PAGE_FAULTS: currently this
          perf event is used in an ad hoc way across different archs.
      
          Case (1): for many archs it's done at the entry of a page fault
          handler, so that it also covers e.g. erroneous faults.
      
          Case (2): for some other archs, it is only accounted when the page
          fault is resolved successfully.
      
          Case (3): there are still quite a few archs that have not enabled
          this perf event.
      
          Since this series touches nearly all the archs, we unify this
          perf event to always follow case (1), which is the one that makes
          the most sense.  And since we moved the accounting into
          handle_mm_fault, the other two MAJ/MIN perf events are naturally
          taken care of.
      
        - Unify definition of "major faults": the definition of "major
          fault" is slightly changed when used in accounting (not
          VM_FAULT_MAJOR).  More information in patch 1.
      
        - Always account the page fault to the task that triggered the page
          fault.  This matters little for #PF handling; it mostly matters for
          gup.  More information on this in patch 25.
      
      Patchset layout:
      
      Patch 1:     Introduced the accounting in handle_mm_fault(), not enabled.
      Patch 2-23:  Enable the new accounting for arch #PF handlers one by one.
      Patch 24:    Enable the new accounting for the remaining outliers (gup, iommu, etc.)
      Patch 25:    Cleanup GUP task_struct pointer since it's not needed any more
      
      This patch (of 25):
      
      This is a preparation patch to move page fault accounting into the
      general code in handle_mm_fault().  This includes both the per-task
      flt_maj/flt_min counters and the major/minor page fault perf events.  To
      do this, the pt_regs pointer is passed into handle_mm_fault().
      
      PERF_COUNT_SW_PAGE_FAULTS should still be kept in per-arch page fault
      handlers.
      
      So far, all the pt_regs pointers passed into handle_mm_fault() are
      NULL, which means this patch should have no intended functional change.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200707225021.200906-1-peterx@redhat.com
      Link: http://lkml.kernel.org/r/20200707225021.200906-2-peterx@redhat.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
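The accounting rule from the first bullet above, together with the changed "major fault" definition mentioned for patch 1, can be modelled as a small counter update. This is a user-space sketch, not the kernel helper: the flag names and bit values are stand-ins, perf events are left out, and treating a fault that needed a retry as major is our reading of that changed definition.

```c
#include <stdbool.h>

/* Hypothetical bit values; only their distinctness matters here. */
#define FAULT_ERROR 0x1u   /* stands in for VM_FAULT_ERROR */
#define FAULT_RETRY 0x2u   /* stands in for VM_FAULT_RETRY */
#define FAULT_MAJOR 0x4u   /* stands in for VM_FAULT_MAJOR */
#define FLAG_TRIED  0x1u   /* stands in for FAULT_FLAG_TRIED */

struct task_counters {
    unsigned long maj_flt, min_flt;
};

/*
 * Count a fault only once it has completed: failed and retried
 * attempts are skipped, and a fault that needed a retry is major.
 */
static void account_fault(struct task_counters *t, unsigned int flags,
                          unsigned int ret)
{
    bool major;

    if (ret & (FAULT_ERROR | FAULT_RETRY))
        return;
    major = (ret & FAULT_MAJOR) || (flags & FLAG_TRIED);
    if (major)
        t->maj_flt++;
    else
        t->min_flt++;
}
```

A fault that returns RETRY and then completes on its second attempt is therefore counted exactly once, as a major fault.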
    • mm/hugetlb: make hugetlb migration callback CMA aware · bbe88753
      Joonsoo Kim committed
      new_non_cma_page() in gup.c needs to allocate a new page that is not
      in the CMA area.  new_non_cma_page() implements this by using the
      allocation scope APIs.
      
      However, there is a work-around for hugetlb.  The normal hugetlb page
      allocation API for migration is alloc_huge_page_nodemask().  It consists
      of two steps.  The first is dequeuing from the pool.  The second is, if
      there is no available page on the queue, allocating by using the page
      allocator.
      
      new_non_cma_page() can't use this API since the first step (dequeue)
      isn't aware of the scope API to exclude the CMA area.  So,
      new_non_cma_page() exports the hugetlb internal function for the second
      step, alloc_migrate_huge_page(), to global scope and uses it directly.
      This is suboptimal since hugetlb pages on the queue cannot be utilized.
      
      This patch tries to fix this situation by making the dequeue function in
      hugetlb CMA aware.  In the dequeue function, CMA memory is skipped if the
      PF_MEMALLOC_NOCMA flag is found.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Roman Gushchin <guro@fb.com>
      Link: http://lkml.kernel.org/r/1596180906-8442-2-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
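A sketch of a CMA-aware dequeue, assuming a flat pool array and a task-flags word. `struct hpage`, `TASK_NOCMA` (standing in for PF_MEMALLOC_NOCMA) and `dequeue()` are hypothetical; when the function returns NULL, the real code would fall back to the page allocator.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PF_MEMALLOC_NOCMA. */
#define TASK_NOCMA 0x1u

/* Hypothetical pooled huge page. */
struct hpage {
    unsigned long pfn;
    bool in_cma;   /* page lies inside a CMA area */
    bool free;     /* still on the free queue */
};

/*
 * Take the first free pooled page, skipping CMA pages when the caller
 * runs inside a "no CMA" allocation scope.
 */
static struct hpage *dequeue(struct hpage *pool, size_t n, unsigned int task_flags)
{
    bool nocma = task_flags & TASK_NOCMA;

    for (size_t i = 0; i < n; i++) {
        if (!pool[i].free)
            continue;
        if (nocma && pool[i].in_cma)
            continue;
        pool[i].free = false;
        return &pool[i];
    }
    return NULL;   /* caller falls back to the page allocator */
}
```

Because the check lives in the dequeue step, pooled non-CMA pages become usable for long-term pinning instead of being bypassed.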
    • mm/gup: restrict CMA region by using allocation scope API · 41b4dc14
      Joonsoo Kim committed
      We have a well-defined scope API to exclude the CMA region.  Use it
      rather than manipulating gfp_mask manually.  With this change, we can
      now restore __GFP_MOVABLE in gfp_mask, as for the usual migration target
      allocation.  As a result, ZONE_MOVABLE is also searched by the page
      allocator.  For hugetlb, gfp_mask is redefined since it has a regular
      allocation mask filter for the migration target.  __GFP_NOWARN is added
      to the hugetlb gfp_mask filter since a new user of the gfp_mask filter,
      gup, wants to be silent when allocation fails.
      
      Note that this can be considered a fix for commit 9a4e9f3b
      ("mm: update get_user_pages_longterm to migrate pages allocated from CMA
      region").  However, a "Fixes" tag isn't added here since the old
      behaviour is merely suboptimal and doesn't cause any problem.
      Suggested-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Roman Gushchin <guro@fb.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
      Link: http://lkml.kernel.org/r/1596180906-8442-1-git-send-email-iamjoonsoo.kim@lge.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
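The save/restore shape of such a scope API can be sketched with a global flags word standing in for the current task's flags. The names `task_flags`, `TASK_NOCMA`, `nocma_save()`, `nocma_restore()` and `may_use_cma()` are hypothetical; as we understand it, the kernel's counterparts at the time were memalloc_nocma_save() and memalloc_nocma_restore(), which set and restore PF_MEMALLOC_NOCMA.

```c
/* Hypothetical stand-in for PF_MEMALLOC_NOCMA. */
#define TASK_NOCMA 0x1u

/* Models the current task's flags word. */
static unsigned int task_flags;

/* Enter the scope: set the bit and return its previous state. */
static unsigned int nocma_save(void)
{
    unsigned int old = task_flags & TASK_NOCMA;

    task_flags |= TASK_NOCMA;
    return old;
}

/* Leave the scope: put the bit back exactly as it was before the save. */
static void nocma_restore(unsigned int old)
{
    task_flags = (task_flags & ~TASK_NOCMA) | old;
}

/* Allocators consult the scope instead of a caller-built gfp mask. */
static int may_use_cma(void)
{
    return !(task_flags & TASK_NOCMA);
}
```

Because restore puts back the previous state instead of simply clearing the bit, nested scopes behave correctly: leaving the inner scope keeps CMA excluded until the outer scope ends.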