1. 25 8月, 2019 18 次提交
    • O
      HID: holtek: test for sanity of intfdata · 537d957b
      Oliver Neukum 提交于
      commit 01ec0a5f19c8c82960a07f6c7410fc9e01d7fb51 upstream.
      
      The ioctl handler uses the intfdata of a second interface,
      which may not be present in a broken or malicious device, hence
      the intfdata needs to be checked for NULL.
      
      [jkosina@suse.cz: fix newly added spurious space]
      Reported-by: syzbot+965152643a75a56737be@syzkaller.appspotmail.com
      Signed-off-by: NOliver Neukum <oneukum@suse.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      537d957b
    • H
      ALSA: hda - Let all conexant codec enter D3 when rebooting · 9585f444
      Hui Wang 提交于
      commit 401714d9534aad8c24196b32600da683116bbe09 upstream.
      
      We have 3 new lenovo laptops which have conexant codec 0x14f11f86,
      these 3 laptops also have the noise issue when rebooting, after
      letting the codec enter D3 before rebooting or poweroff, the noise
      disappers.
      
      Instead of adding a new ID again in the reboot_notify(), let us make
      this function apply to all conexant codec. In theory make codec enter
      D3 before rebooting or poweroff is harmless, and I tested this change
      on a couple of other Lenovo laptops which have different conexant
      codecs, there is no side effect so far.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHui Wang <hui.wang@canonical.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9585f444
    • H
      ALSA: hda - Add a generic reboot_notify · e58ba88d
      Hui Wang 提交于
      commit 871b9066027702e6e6589da0e1edd3b7dede7205 upstream.
      
      Make codec enter D3 before rebooting or poweroff can fix the noise
      issue on some laptops. And in theory it is harmless for all codecs
      to enter D3 before rebooting or poweroff, let us add a generic
      reboot_notify, then realtek and conexant drivers can call this
      function.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NHui Wang <hui.wang@canonical.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e58ba88d
    • W
      ALSA: hda - Fix a memory leak bug · 6c4a536c
      Wenwen Wang 提交于
      commit cfef67f016e4c00a2f423256fc678a6967a9fc09 upstream.
      
      In snd_hda_parse_generic_codec(), 'spec' is allocated through kzalloc().
      Then, the pin widgets in 'codec' are parsed. However, if the parsing
      process fails, 'spec' is not deallocated, leading to a memory leak.
      
      To fix the above issue, free 'spec' before returning the error.
      
      Fixes: 352f7f91 ("ALSA: hda - Merge Realtek parser code to generic parser")
      Signed-off-by: NWenwen Wang <wenwen@cs.uga.edu>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c4a536c
    • T
      ALSA: hda - Apply workaround for another AMD chip 1022:1487 · 1bf5f827
      Takashi Iwai 提交于
      commit de768ce45466f3009809719eb7b1f6f5277d9373 upstream.
      
      MSI MPG X570 board is with another AMD HD-audio controller (PCI ID
      1022:1487) and it requires the same workaround applied for X370, etc
      (PCI ID 1022:1457).
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=195303
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1bf5f827
    • H
      ALSA: usb-audio: Fix an OOB bug in parse_audio_mixer_unit · 58b9f19e
      Hui Peng 提交于
      commit daac07156b330b18eb5071aec4b3ddca1c377f2c upstream.
      
      The `uac_mixer_unit_descriptor` shown as below is read from the
      device side. In `parse_audio_mixer_unit`, `baSourceID` field is
      accessed from index 0 to `bNrInPins` - 1, the current implementation
      assumes that descriptor is always valid (the length  of descriptor
      is no shorter than 5 + `bNrInPins`). If a descriptor read from
      the device side is invalid, it may trigger out-of-bound memory
      access.
      
      ```
      struct uac_mixer_unit_descriptor {
      	__u8 bLength;
      	__u8 bDescriptorType;
      	__u8 bDescriptorSubtype;
      	__u8 bUnitID;
      	__u8 bNrInPins;
      	__u8 baSourceID[];
      }
      ```
      
      This patch fixes the bug by add a sanity check on the length of
      the descriptor.
      Reported-by: NHui Peng <benquike@gmail.com>
      Reported-by: NMathias Payer <mathias.payer@nebelwelt.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NHui Peng <benquike@gmail.com>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      58b9f19e
    • H
      ALSA: usb-audio: Fix a stack buffer overflow bug in check_input_term · 46f9a1bc
      Hui Peng 提交于
      commit 19bce474c45be69a284ecee660aa12d8f1e88f18 upstream.
      
      `check_input_term` recursively calls itself with input from
      device side (e.g., uac_input_terminal_descriptor.bCSourceID)
      as argument (id). In `check_input_term`, if `check_input_term`
      is called with the same `id` argument as the caller, it triggers
      endless recursive call, resulting kernel space stack overflow.
      
      This patch fixes the bug by adding a bitmap to `struct mixer_build`
      to keep track of the checked ids and stop the execution if some id
      has been checked (similar to how parse_audio_unit handles unitid
      argument).
      Reported-by: NHui Peng <benquike@gmail.com>
      Reported-by: NMathias Payer <mathias.payer@nebelwelt.net>
      Signed-off-by: NHui Peng <benquike@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      46f9a1bc
    • T
      ALSA: hda/realtek - Add quirk for HP Envy x360 · d5bb1240
      Takashi Iwai 提交于
      commit 190d03814eb3b49d4f87ff38fef26d36f3568a60 upstream.
      
      HP Envy x360 (AMD Ryzen-based model) with 103c:8497 needs the same
      quirk like HP Spectre x360 for enabling the mute LED over Mic3 pin.
      
      BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204373
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5bb1240
    • M
      xtensa: add missing isync to the cpu_reset TLB code · 61f6ecb7
      Max Filippov 提交于
      commit cd8869f4cb257f22b89495ca40f5281e58ba359c upstream.
      
      ITLB entry modifications must be followed by the isync instruction
      before the new entries are possibly used. cpu_reset lacks one isync
      between ITLB way 6 initialization and jump to the identity mapping.
      Add missing isync to xtensa cpu_reset.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      61f6ecb7
    • V
      cpufreq: schedutil: Don't skip freq update when limits change · 7c001e5a
      Viresh Kumar 提交于
      commit 600f5badb78c316146d062cfd7af4a2cfb655baa upstream.
      
      To avoid reducing the frequency of a CPU prematurely, we skip reducing
      the frequency if the CPU had been busy recently.
      
      This should not be done when the limits of the policy are changed, for
      example due to thermal throttling. We should always get the frequency
      within the new limits as soon as possible.
      
      Trying to fix this by using only one flag, i.e. need_freq_update, can
      lead to a race condition where the flag gets cleared without forcing us
      to change the frequency at least once. And so this patch introduces
      another flag to avoid that race condition.
      
      Fixes: ecd28842 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
      Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
      Reported-by: NDoug Smythies <dsmythies@telus.net>
      Tested-by: NDoug Smythies <dsmythies@telus.net>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7c001e5a
    • F
      Revert "pwm: Set class for exported channels in sysfs" · 7f68aa2e
      Fabrice Gasnier 提交于
      commit c289d6625237aa785b484b4e94c23b3b91ea7e60 upstream.
      
      This reverts commit 7e5d1fd7 ("pwm: Set
      class for exported channels in sysfs") as it causes regression with
      multiple pwm chip[1], when exporting a pwm channel (echo X > export):
      
      - ABI (Documentation/ABI/testing/sysfs-class-pwm) states pwmX should be
        created in /sys/class/pwm/pwmchipN/pwmX
      - Reverted patch causes new entry to be also created directly in
        /sys/class/pwm/pwmX
      - 1st time, exporting pwmX will create an entry in /sys/class/pwm/pwmX
      - class attributes are added under pwmX folder, such as export, unexport
        npwm, symlinks. This is wrong as it belongs to pwmchipN. It may cause
        bad behavior and report wrong values.
      - when another export happens on another pwmchip, it can't be created
        (e.g. -EEXIST). This is causing the issue with multiple pwmchip.
      
      Example on stm32 (stm32429i-eval) platform:
      $ ls /sys/class/pwm
      pwmchip0 pwmchip4
      
      $ cd /sys/class/pwm/pwmchip0/
      $ echo 0 > export
      $ ls /sys/class/pwm
      pwm0 pwmchip0 pwmchip4
      
      $ cd /sys/class/pwm/pwmchip4/
      $ echo 0 > export
      sysfs: cannot create duplicate filename '/class/pwm/pwm0'
      ...Exception stack follows...
      
      This is also seen on other platform [2]
      
      [1] https://lkml.org/lkml/2018/9/25/713
      [2] https://lkml.org/lkml/2018/9/25/447Signed-off-by: NFabrice Gasnier <fabrice.gasnier@st.com>
      Tested-by: NGottfried Haider <gottfried.haider@gmail.com>
      Tested-by: NMichal Vokáč <michal.vokac@ysoft.com>
      Signed-off-by: NThierry Reding <thierry.reding@gmail.com>
      Cc: John Keeping <john@metanate.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7f68aa2e
    • I
      mm/usercopy: use memory range to be accessed for wraparound check · 056368fc
      Isaac J. Manjarres 提交于
      commit 951531691c4bcaa59f56a316e018bc2ff1ddf855 upstream.
      
      Currently, when checking to see if accessing n bytes starting at address
      "ptr" will cause a wraparound in the memory addresses, the check in
      check_bogus_address() adds an extra byte, which is incorrect, as the
      range of addresses that will be accessed is [ptr, ptr + (n - 1)].
      
      This can lead to incorrectly detecting a wraparound in the memory
      address, when trying to read 4 KB from memory that is mapped to the the
      last possible page in the virtual address space, when in fact, accessing
      that range of memory would not cause a wraparound to occur.
      
      Use the memory range that will actually be accessed when considering if
      accessing a certain amount of bytes will cause the memory address to
      wrap around.
      
      Link: http://lkml.kernel.org/r/1564509253-23287-1-git-send-email-isaacm@codeaurora.org
      Fixes: f5509cc1 ("mm: Hardened usercopy")
      Signed-off-by: NPrasad Sodagudi <psodagud@codeaurora.org>
      Signed-off-by: NIsaac J. Manjarres <isaacm@codeaurora.org>
      Co-developed-by: NPrasad Sodagudi <psodagud@codeaurora.org>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Trilok Soni <tsoni@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      056368fc
    • M
      mm/memcontrol.c: fix use after free in mem_cgroup_iter() · c8282f1b
      Miles Chen 提交于
      commit 54a83d6bcbf8f4700013766b974bf9190d40b689 upstream.
      
      This patch is sent to report an use after free in mem_cgroup_iter()
      after merging commit be2657752e9e ("mm: memcg: fix use after free in
      mem_cgroup_iter()").
      
      I work with android kernel tree (4.9 & 4.14), and commit be2657752e9e
      ("mm: memcg: fix use after free in mem_cgroup_iter()") has been merged
      to the trees.  However, I can still observe use after free issues
      addressed in the commit be2657752e9e.  (on low-end devices, a few times
      this month)
      
      backtrace:
              css_tryget <- crash here
              mem_cgroup_iter
              shrink_node
              shrink_zones
              do_try_to_free_pages
              try_to_free_pages
              __perform_reclaim
              __alloc_pages_direct_reclaim
              __alloc_pages_slowpath
              __alloc_pages_nodemask
      
      To debug, I poisoned mem_cgroup before freeing it:
      
        static void __mem_cgroup_free(struct mem_cgroup *memcg)
              for_each_node(node)
              free_mem_cgroup_per_node_info(memcg, node);
              free_percpu(memcg->stat);
        +     /* poison memcg before freeing it */
        +     memset(memcg, 0x78, sizeof(struct mem_cgroup));
              kfree(memcg);
        }
      
      The coredump shows the position=0xdbbc2a00 is freed.
      
        (gdb) p/x ((struct mem_cgroup_per_node *)0xe5009e00)->iter[8]
        $13 = {position = 0xdbbc2a00, generation = 0x2efd}
      
        0xdbbc2a00:     0xdbbc2e00      0x00000000      0xdbbc2800      0x00000100
        0xdbbc2a10:     0x00000200      0x78787878      0x00026218      0x00000000
        0xdbbc2a20:     0xdcad6000      0x00000001      0x78787800      0x00000000
        0xdbbc2a30:     0x78780000      0x00000000      0x0068fb84      0x78787878
        0xdbbc2a40:     0x78787878      0x78787878      0x78787878      0xe3fa5cc0
        0xdbbc2a50:     0x78787878      0x78787878      0x00000000      0x00000000
        0xdbbc2a60:     0x00000000      0x00000000      0x00000000      0x00000000
        0xdbbc2a70:     0x00000000      0x00000000      0x00000000      0x00000000
        0xdbbc2a80:     0x00000000      0x00000000      0x00000000      0x00000000
        0xdbbc2a90:     0x00000001      0x00000000      0x00000000      0x00100000
        0xdbbc2aa0:     0x00000001      0xdbbc2ac8      0x00000000      0x00000000
        0xdbbc2ab0:     0x00000000      0x00000000      0x00000000      0x00000000
        0xdbbc2ac0:     0x00000000      0x00000000      0xe5b02618      0x00001000
        0xdbbc2ad0:     0x00000000      0x78787878      0x78787878      0x78787878
        0xdbbc2ae0:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2af0:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b00:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b10:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b20:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b30:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b40:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b50:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b60:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b70:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2b80:     0x78787878      0x78787878      0x00000000      0x78787878
        0xdbbc2b90:     0x78787878      0x78787878      0x78787878      0x78787878
        0xdbbc2ba0:     0x78787878      0x78787878      0x78787878      0x78787878
      
      In the reclaim path, try_to_free_pages() does not setup
      sc.target_mem_cgroup and sc is passed to do_try_to_free_pages(), ...,
      shrink_node().
      
      In mem_cgroup_iter(), root is set to root_mem_cgroup because
      sc->target_mem_cgroup is NULL.  It is possible to assign a memcg to
      root_mem_cgroup.nodeinfo.iter in mem_cgroup_iter().
      
              try_to_free_pages
              	struct scan_control sc = {...}, target_mem_cgroup is 0x0;
              do_try_to_free_pages
              shrink_zones
              shrink_node
              	 mem_cgroup *root = sc->target_mem_cgroup;
              	 memcg = mem_cgroup_iter(root, NULL, &reclaim);
              mem_cgroup_iter()
              	if (!root)
              		root = root_mem_cgroup;
              	...
      
              	css = css_next_descendant_pre(css, &root->css);
              	memcg = mem_cgroup_from_css(css);
              	cmpxchg(&iter->position, pos, memcg);
      
      My device uses memcg non-hierarchical mode.  When we release a memcg:
      invalidate_reclaim_iterators() reaches only dead_memcg and its parents.
      If non-hierarchical mode is used, invalidate_reclaim_iterators() never
      reaches root_mem_cgroup.
      
        static void invalidate_reclaim_iterators(struct mem_cgroup *dead_memcg)
        {
              struct mem_cgroup *memcg = dead_memcg;
      
              for (; memcg; memcg = parent_mem_cgroup(memcg)
              ...
        }
      
      So the use after free scenario looks like:
      
        CPU1						CPU2
      
        try_to_free_pages
        do_try_to_free_pages
        shrink_zones
        shrink_node
        mem_cgroup_iter()
            if (!root)
            	root = root_mem_cgroup;
            ...
            css = css_next_descendant_pre(css, &root->css);
            memcg = mem_cgroup_from_css(css);
            cmpxchg(&iter->position, pos, memcg);
      
              				invalidate_reclaim_iterators(memcg);
              				...
              				__mem_cgroup_free()
              					kfree(memcg);
      
        try_to_free_pages
        do_try_to_free_pages
        shrink_zones
        shrink_node
        mem_cgroup_iter()
            if (!root)
            	root = root_mem_cgroup;
            ...
            mz = mem_cgroup_nodeinfo(root, reclaim->pgdat->node_id);
            iter = &mz->iter[reclaim->priority];
            pos = READ_ONCE(iter->position);
            css_tryget(&pos->css) <- use after free
      
      To avoid this, we should also invalidate root_mem_cgroup.nodeinfo.iter
      in invalidate_reclaim_iterators().
      
      [cai@lca.pw: fix -Wparentheses compilation warning]
        Link: http://lkml.kernel.org/r/1564580753-17531-1-git-send-email-cai@lca.pw
      Link: http://lkml.kernel.org/r/20190730015729.4406-1-miles.chen@mediatek.com
      Fixes: 5ac8fb31 ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
      Signed-off-by: NMiles Chen <miles.chen@mediatek.com>
      Signed-off-by: NQian Cai <cai@lca.pw>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c8282f1b
    • Y
      mm: mempolicy: handle vma with unmovable pages mapped correctly in mbind · 3c0cb90e
      Yang Shi 提交于
      commit a53190a4aaa36494f4d7209fd1fcc6f2ee08e0e0 upstream.
      
      When running syzkaller internally, we ran into the below bug on 4.9.x
      kernel:
      
        kernel BUG at mm/huge_memory.c:2124!
        invalid opcode: 0000 [#1] SMP KASAN
        CPU: 0 PID: 1518 Comm: syz-executor107 Not tainted 4.9.168+ #2
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
        task: ffff880067b34900 task.stack: ffff880068998000
        RIP: split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
        Call Trace:
          split_huge_page include/linux/huge_mm.h:100 [inline]
          queue_pages_pte_range+0x7e1/0x1480 mm/mempolicy.c:538
          walk_pmd_range mm/pagewalk.c:50 [inline]
          walk_pud_range mm/pagewalk.c:90 [inline]
          walk_pgd_range mm/pagewalk.c:116 [inline]
          __walk_page_range+0x44a/0xdb0 mm/pagewalk.c:208
          walk_page_range+0x154/0x370 mm/pagewalk.c:285
          queue_pages_range+0x115/0x150 mm/mempolicy.c:694
          do_mbind mm/mempolicy.c:1241 [inline]
          SYSC_mbind+0x3c3/0x1030 mm/mempolicy.c:1370
          SyS_mbind+0x46/0x60 mm/mempolicy.c:1352
          do_syscall_64+0x1d2/0x600 arch/x86/entry/common.c:282
          entry_SYSCALL_64_after_swapgs+0x5d/0xdb
        Code: c7 80 1c 02 00 e8 26 0a 76 01 <0f> 0b 48 c7 c7 40 46 45 84 e8 4c
        RIP  [<ffffffff81895d6b>] split_huge_page_to_list+0x8fb/0x1030 mm/huge_memory.c:2124
         RSP <ffff88006899f980>
      
      with the below test:
      
        uint64_t r[1] = {0xffffffffffffffff};
      
        int main(void)
        {
              syscall(__NR_mmap, 0x20000000, 0x1000000, 3, 0x32, -1, 0);
                                      intptr_t res = 0;
              res = syscall(__NR_socket, 0x11, 3, 0x300);
              if (res != -1)
                      r[0] = res;
              *(uint32_t*)0x20000040 = 0x10000;
              *(uint32_t*)0x20000044 = 1;
              *(uint32_t*)0x20000048 = 0xc520;
              *(uint32_t*)0x2000004c = 1;
              syscall(__NR_setsockopt, r[0], 0x107, 0xd, 0x20000040, 0x10);
              syscall(__NR_mmap, 0x20fed000, 0x10000, 0, 0x8811, r[0], 0);
              *(uint64_t*)0x20000340 = 2;
              syscall(__NR_mbind, 0x20ff9000, 0x4000, 0x4002, 0x20000340, 0x45d4, 3);
              return 0;
        }
      
      Actually the test does:
      
        mmap(0x20000000, 16777216, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x20000000
        socket(AF_PACKET, SOCK_RAW, 768)        = 3
        setsockopt(3, SOL_PACKET, PACKET_TX_RING, {block_size=65536, block_nr=1, frame_size=50464, frame_nr=1}, 16) = 0
        mmap(0x20fed000, 65536, PROT_NONE, MAP_SHARED|MAP_FIXED|MAP_POPULATE|MAP_DENYWRITE, 3, 0) = 0x20fed000
        mbind(..., MPOL_MF_STRICT|MPOL_MF_MOVE) = 0
      
      The setsockopt() would allocate compound pages (16 pages in this test)
      for packet tx ring, then the mmap() would call packet_mmap() to map the
      pages into the user address space specified by the mmap() call.
      
      When calling mbind(), it would scan the vma to queue the pages for
      migration to the new node.  It would split any huge page since 4.9
      doesn't support THP migration, however, the packet tx ring compound
      pages are not THP and even not movable.  So, the above bug is triggered.
      
      However, the later kernel is not hit by this issue due to commit
      d44d363f ("mm: don't assume anonymous pages have SwapBacked flag"),
      which just removes the PageSwapBacked check for a different reason.
      
      But, there is a deeper issue.  According to the semantic of mbind(), it
      should return -EIO if MPOL_MF_MOVE or MPOL_MF_MOVE_ALL was specified and
      MPOL_MF_STRICT was also specified, but the kernel was unable to move all
      existing pages in the range.  The tx ring of the packet socket is
      definitely not movable, however, mbind() returns success for this case.
      
      Although the most socket file associates with non-movable pages, but XDP
      may have movable pages from gup.  So, it sounds not fine to just check
      the underlying file type of vma in vma_migratable().
      
      Change migrate_page_add() to check if the page is movable or not, if it
      is unmovable, just return -EIO.  But do not abort pte walk immediately,
      since there may be pages off LRU temporarily.  We should migrate other
      pages if MPOL_MF_MOVE* is specified.  Set has_unmovable flag if some
      paged could not be not moved, then return -EIO for mbind() eventually.
      
      With this change the above test would return -EIO as expected.
      
      [yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
        Link: http://lkml.kernel.org/r/1563556862-54056-3-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1561162809-59140-3-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3c0cb90e
    • Y
      mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified · cd825d87
      Yang Shi 提交于
      commit d883544515aae54842c21730b880172e7894fde9 upstream.
      
      When both MPOL_MF_MOVE* and MPOL_MF_STRICT was specified, mbind() should
      try best to migrate misplaced pages, if some of the pages could not be
      migrated, then return -EIO.
      
      There are three different sub-cases:
       1. vma is not migratable
       2. vma is migratable, but there are unmovable pages
       3. vma is migratable, pages are movable, but migrate_pages() fails
      
      If #1 happens, kernel would just abort immediately, then return -EIO,
      after a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when
      MPOL_MF_STRICT is specified").
      
      If #3 happens, kernel would set policy and migrate pages with
      best-effort, but won't rollback the migrated pages and reset the policy
      back.
      
      Before that commit, they behaves in the same way.  It'd better to keep
      their behavior consistent.  But, rolling back the migrated pages and
      resetting the policy back sounds not feasible, so just make #1 behave as
      same as #3.
      
      Userspace will know that not everything was successfully migrated (via
      -EIO), and can take whatever steps it deems necessary - attempt
      rollback, determine which exact page(s) are violating the policy, etc.
      
      Make queue_pages_range() return 1 to indicate there are unmovable pages
      or vma is not migratable.
      
      The #2 is not handled correctly in the current kernel, the following
      patch will fix it.
      
      [yang.shi@linux.alibaba.com: fix review comments from Vlastimil]
        Link: http://lkml.kernel.org/r/1563556862-54056-2-git-send-email-yang.shi@linux.alibaba.com
      Link: http://lkml.kernel.org/r/1561162809-59140-2-git-send-email-yang.shi@linux.alibaba.comSigned-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cd825d87
    • R
      mm/hmm: fix bad subpage pointer in try_to_unmap_one · f0fed828
      Ralph Campbell 提交于
      commit 1de13ee59225dfc98d483f8cce7d83f97c0b31de upstream.
      
      When migrating an anonymous private page to a ZONE_DEVICE private page,
      the source page->mapping and page->index fields are copied to the
      destination ZONE_DEVICE struct page and the page_mapcount() is
      increased.  This is so rmap_walk() can be used to unmap and migrate the
      page back to system memory.
      
      However, try_to_unmap_one() computes the subpage pointer from a swap pte
      which computes an invalid page pointer and a kernel panic results such
      as:
      
        BUG: unable to handle page fault for address: ffffea1fffffffc8
      
      Currently, only single pages can be migrated to device private memory so
      no subpage computation is needed and it can be set to "page".
      
      [rcampbell@nvidia.com: add comment]
        Link: http://lkml.kernel.org/r/20190724232700.23327-4-rcampbell@nvidia.com
      Link: http://lkml.kernel.org/r/20190719192955.30462-4-rcampbell@nvidia.com
      Fixes: a5430dda ("mm/migrate: support un-addressable ZONE_DEVICE page in migration")
      Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jason Gunthorpe <jgg@mellanox.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0fed828
    • N
      seq_file: fix problem when seeking mid-record · 3858cca1
      NeilBrown 提交于
      commit 6a2aeab59e97101b4001bac84388fc49a992f87e upstream.
      
      If you use lseek or similar (e.g.  pread) to access a location in a
      seq_file file that is within a record, rather than at a record boundary,
      then the first read will return the remainder of the record, and the
      second read will return the whole of that same record (instead of the
      next record).  When seeking to a record boundary, the next record is
      correctly returned.
      
      This bug was introduced by a recent patch (identified below).  Before
      that patch, seq_read() would increment m->index when the last of the
      buffer was returned (m->count == 0).  After that patch, we rely on
      ->next to increment m->index after filling the buffer - but there was
      one place where that didn't happen.
      
      Link: https://lkml.kernel.org/lkml/877e7xl029.fsf@notabene.neil.brown.name/
      Fixes: 1f4aace6 ("fs/seq_file.c: simplify seq_file iteration code and interface")
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Reported-by: NSergei Turchanov <turchanov@farpost.com>
      Tested-by: NSergei Turchanov <turchanov@farpost.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Markus Elfring <Markus.Elfring@web.de>
      Cc: <stable@vger.kernel.org>	[4.19+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3858cca1
    • G
      sh: kernel: hw_breakpoint: Fix missing break in switch statement · 50d15197
      Gustavo A. R. Silva 提交于
      commit 1ee1119d184bb06af921b48c3021d921bbd85bac upstream.
      
      Add missing break statement in order to prevent the code from falling
      through to case SH_BREAKPOINT_WRITE.
      
      Fixes: 09a07294 ("sh: hw-breakpoints: Add preliminary support for SH-4A UBC.")
      Cc: stable@vger.kernel.org
      Reviewed-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50d15197
  2. 16 8月, 2019 22 次提交