1. 05 8月, 2017 1 次提交
    • L
      Merge tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 841fe953
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "Fixes for recently merged code:
         - a fix for the _PAGE_DEVMAP support, which was breaking KVM on
           Power9 radix
         - avoid a (harmless) lockdep warning in the early SMP code
         - return failure for some uses of dma_set_mask() rather than falling
           back to 32-bits
         - fix stack setup in watchdog soft_nmi_common() to use emergency
           stack
         - fix of_irq_to_resource() error check in of_fsl_spi_probe()
      
        Two fixes going to stable:
         - fix saving of Transactional Memory SPRs in core dump
         - fix __check_irq_replay missing decrementer interrupt
      
        And two misc:
         - fix 64-bit boot wrapper build with non-biarch compiler
         - work around a POWER9 PMU hang after state-loss idle
      
        Thanks to: Alistair Popple, Aneesh Kumar K.V, Cyril Bur, Gustavo
        Romero, Jose Ricardo Ziviani, Laurent Vivier, Nicholas Piggin, Oliver
        O'Halloran, Sergei Shtylyov, Suraj Jitindar Singh, Thomas Gleixner"
      
      * tag 'powerpc-4.13-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/64: Fix __check_irq_replay missing decrementer interrupt
        powerpc/perf: POWER9 PMU stops after idle workaround
        powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check
        powerpc/64s: Fix stack setup in watchdog soft_nmi_common()
        powerpc/powernv/pci: Return failure for some uses of dma_set_mask()
        powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler
        powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU
        powerpc/tm: Fix saving of TM SPRs in core dump
        powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries
      841fe953
  2. 04 8月, 2017 9 次提交
    • N
      powerpc/64: Fix __check_irq_replay missing decrementer interrupt · 3db40c31
      Nicholas Piggin 提交于
      If the decrementer wraps again and de-asserts the decrementer
      exception while hard-disabled, __check_irq_replay() has a test to
      notice the wrap when interrupts are re-enabled.
      
      The decrementer check must be done when clearing the PACA_IRQ_HARD_DIS
      flag, not when the PACA_IRQ_DEC flag is tested. Previously this worked
      because the decrementer interrupt was always the first one checked
      after clearing the hard disable flag, but HMI check was moved ahead of
      that, which introduced this bug.
      
      This can cause a missed decrementer interrupt if we soft-disable
      interrupts then take an HMI which is recorded in irq_happened, then
      hard-disable interrupts for > 4s to wrap the decrementer.
      
      Fixes: e0e0d6b7 ("powerpc/64: Replay hypervisor maintenance interrupt first")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      3db40c31
    • N
      powerpc/perf: POWER9 PMU stops after idle workaround · 09539f9b
      Nicholas Piggin 提交于
      POWER9 DD2 PMU can stop after a state-loss idle in some conditions.
      
      A solution is to set then clear MMCRA[60] after wake from state-loss
      idle. MMCRA[60] is a non-architected bit, see the user manual for
      details.
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Acked-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Acked-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      09539f9b
    • L
      Merge tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio · 869c058f
      Linus Torvalds 提交于
      Pull VFIO fixes from Alex Williamson:
      
       - SPAPR/EEH config build fix (Murilo Opsfelder Araujo)
      
       - Fix possible device lock deadlock (Alex Williamson)
      
       - Correctly size integrated endpoint PCIe capabilities (Alex
         Williamson)
      
      * tag 'vfio-v4.13-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/pci: Fix handling of RC integrated endpoint PCIe capability size
        vfio/pci: Use pci_try_reset_function() on initial open
        include/linux/vfio.h: Guard powerpc-specific functions with CONFIG_VFIO_SPAPR_EEH
      869c058f
    • L
      Merge branch 'akpm' (patches from Andrew) · 995d03ae
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "15 fixes"
      
      [ This does not merge the "fortify: use WARN instead of BUG for now"
        patch, which needs a bit of extra work to build cleanly with all
        configurations. Arnd is on it.   - Linus ]
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        ocfs2: don't clear SGID when inheriting ACLs
        mm: allow page_cache_get_speculative in interrupt context
        userfaultfd: non-cooperative: flush event_wqh at release time
        ipc: add missing container_of()s for randstruct
        cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
        userfaultfd_zeropage: return -ENOSPC in case mm has gone
        mm: take memory hotplug lock within numa_zonelist_order_handler()
        mm/page_io.c: fix oops during block io poll in swapin path
        zram: do not free pool->size_class
        kthread: fix documentation build warning
        kasan: avoid -Wmaybe-uninitialized warning
        userfaultfd: non-cooperative: notify about unmap of destination during mremap
        mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
        pid: kill pidhash_size in pidhash_init()
        mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors
      995d03ae
    • L
      Merge tag 'acpi-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 8d3fe85f
      Linus Torvalds 提交于
      Pull ACPI fixes from Rafael Wysocki:
       "These fix two issues in the ACPI SoC drivers (Intel LPSS and AMD APD),
        a crash in the PCC mailbox initialization code and a WDAT watchdog
        initialization failure.
      
        Specifics:
      
         - Fix a device ID of Hisilicon Hip07/08 in the ACPI APD (AMD SoC)
           driver (Hanjun Guo).
      
         - Fix list corruption (introduced during the 4.11 cycle) in the ACPI
           LPSS (Intel SoC) driver (Hans de Goede).
      
         - Fix PCC mailbox handling code crash during initialization when PCCT
           is not present and PCC channel 0 is requested (Hoan Tran).
      
         - Fix a WDAT watchdog initialization issue causing platform device
           creation to fail due to partially overlapping address ranges in
           resources (Ryan Kennedy)"
      
      * tag 'acpi-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: APD: Fix HID for Hisilicon Hip07/08
        mailbox: pcc: Fix crash when request PCC channel 0
        ACPI / watchdog: Fix init failure with overlapping register regions
        ACPI / LPSS: Only call pwm_add_table() for the first PWM controller
      8d3fe85f
    • L
      Merge tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 73784fb7
      Linus Torvalds 提交于
      Pull power management fixes from Rafael Wysocki:
       "These fix two cpufreq issues, one introduced recently and one related
        to recent changes, fix cpufreq documentation, fix up recently added
        code in the Thunderbolt driver and update runtime PM framework
        documentation.
      
        Specifics:
      
         - Fix the handling of the scaling_cur_freq cpufreq policy attribute
           on x86 systems with the MPERF/APERF registers present to make it
           behave more as expected after recent changes (Rafael Wysocki).
      
         - Drop a leftover callback from the intel_pstate driver which also
           prevents the cpuinfo_cur_freq cpufreq policy attribute from being
           incorrectly exposed when intel_pstate works in the active mode
           (Rafael Wysocki).
      
         - Add a missing piece describing the cpuinfo_cur_freq policy
           attribute to cpufreq documentation (Rafael Wysocki).
      
         - Fix up a recently added part of the Thunderbolt driver to avoid
           aborting system suspends if its mailbox commands time out (Rafael
           Wysocki).
      
         - Update device runtime PM framework documentation to reflect the
           current behavior of the code (Johan Hovold)"
      
      * tag 'pm-4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        thunderbolt: icm: Ignore mailbox errors in icm_suspend()
        cpufreq: x86: Make scaling_cur_freq behave more as expected
        PM / runtime: Document new pm_runtime_set_suspended() constraint
        cpufreq: docs: Add missing cpuinfo_cur_freq description
        cpufreq: intel_pstate: Drop ->get from intel_pstate structure
      73784fb7
    • R
      Merge branches 'acpi-soc', 'acpi-wdat' and 'acpi-cppc' · 3de559d4
      Rafael J. Wysocki 提交于
      * acpi-soc:
        ACPI: APD: Fix HID for Hisilicon Hip07/08
        ACPI / LPSS: Only call pwm_add_table() for the first PWM controller
      
      * acpi-wdat:
        ACPI / watchdog: Fix init failure with overlapping register regions
      
      * acpi-cppc:
        mailbox: pcc: Fix crash when request PCC channel 0
      3de559d4
    • R
      Merge branches 'pm-core' and 'pm-misc' · 78aa904a
      Rafael J. Wysocki 提交于
      * pm-core:
        PM / runtime: Document new pm_runtime_set_suspended() constraint
      
      * pm-misc:
        thunderbolt: icm: Ignore mailbox errors in icm_suspend()
      78aa904a
    • R
      Merge branches 'pm-cpufreq-x86', 'pm-cpufreq-docs' and 'intel_pstate' · 8a05c311
      Rafael J. Wysocki 提交于
      * pm-cpufreq-x86:
        cpufreq: x86: Make scaling_cur_freq behave more as expected
      
      * pm-cpufreq-docs:
        cpufreq: docs: Add missing cpuinfo_cur_freq description
      
      * intel_pstate:
        cpufreq: intel_pstate: Drop ->get from intel_pstate structure
      8a05c311
  3. 03 8月, 2017 18 次提交
    • L
      Merge tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs · 19ec50a4
      Linus Torvalds 提交于
      Pull NFS client fixes from Anna Schumaker:
       "Two fixes from Trond this time, now that he's back from his vacation.
        The first is a stable fix for the EXCHANGE_ID issue on the mailing
        list, and the other fixes a double-free situation that he found at the
        same time.
      
        Stable fix:
         - Fix EXCHANGE_ID corrupt verifier issue
      
        Other fix:
         - Fix double frees in nfs4_test_session_trunk()"
      
      * tag 'nfs-for-4.13-4' of git://git.linux-nfs.org/projects/anna/linux-nfs:
        NFSv4: Fix double frees in nfs4_test_session_trunk()
        NFSv4: Fix EXCHANGE_ID corrupt verifier issue
      19ec50a4
    • A
      isdn/i4l: fix buffer overflow · 9f5af546
      Annie Cherkaev 提交于
      This fixes a potential buffer overflow in isdn_net.c caused by an
      unbounded strcpy.
      
      [ ISDN seems to be effectively unmaintained, and the I4L driver in
        particular is long deprecated, but in case somebody uses this..
          - Linus ]
      Signed-off-by: NJiten Thakkar <jitenmt@gmail.com>
      Signed-off-by: NAnnie Cherkaev <annie.cherk@gmail.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9f5af546
    • J
      ocfs2: don't clear SGID when inheriting ACLs · 19ec8e48
      Jan Kara 提交于
      When new directory 'DIR1' is created in a directory 'DIR0' with SGID bit
      set, DIR1 is expected to have SGID bit set (and owning group equal to
      the owning group of 'DIR0').  However when 'DIR0' also has some default
      ACLs that 'DIR1' inherits, setting these ACLs will result in SGID bit on
      'DIR1' to get cleared if user is not member of the owning group.
      
      Fix the problem by moving posix_acl_update_mode() out of ocfs2_set_acl()
      into ocfs2_iop_set_acl().  That way the function will not be called when
      inheriting ACLs which is what we want as it prevents SGID bit clearing
      and the mode has been properly set by posix_acl_create() anyway.  Also
      posix_acl_chmod() that is calling ocfs2_set_acl() takes care of updating
      mode itself.
      
      Fixes: 07393101 ("posix_acl: Clear SGID bit when setting file permissions")
      Link: http://lkml.kernel.org/r/20170801141252.19675-3-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19ec8e48
    • K
      mm: allow page_cache_get_speculative in interrupt context · 1ee1c3f5
      Kan Liang 提交于
      Kernel panic when calling the IRQ-safe __get_user_pages_fast in NMI
      handler.
      
      The bug was introduced by commit 2947ba05 ("x86/mm/gup: Switch GUP
      to the generic get_user_page_fast() implementation").
      
      The original x86 __get_user_page_fast used plain get_page() or
      page_ref_add().  However, the generic __get_user_page_fast uses
      page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).
      
      There is no reason to prevent page_cache_get_speculative from using in
      interrupt context.  According to the author, putting a BUG_ON there is
      just because the code is not verifying correctness of interrupt races.
      I did some tests in interrupt context.  There is no issue found.
      
      Removing VM_BUG_ON(in_interrupt()) for page_cache_get_speculative().
      
      Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
      Fixes: 2947ba05 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1ee1c3f5
    • M
      userfaultfd: non-cooperative: flush event_wqh at release time · 5a18b64e
      Mike Rapoport 提交于
      There may still be threads waiting on event_wqh at the time the
      userfault file descriptor is closed.  Flush the events wait-queue to
      prevent waiting threads from hanging.
      
      Link: http://lkml.kernel.org/r/1501398127-30419-1-git-send-email-rppt@linux.vnet.ibm.com
      Fixes: 9cd75c3c ("userfaultfd: non-cooperative: add ability to report
      non-PF events from uffd descriptor")
      Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a18b64e
    • K
      ipc: add missing container_of()s for randstruct · ade9f91b
      Kees Cook 提交于
      When building with the randstruct gcc plugin, the layout of the IPC
      structs will be randomized, which requires any sub-structure accesses to
      use container_of().  The proc display handlers were missing the needed
      container_of()s since the iterator is passing in the top-level struct
      kern_ipc_perm.
      
      This would lead to crashes when running the "lsipc" program after the
      system had IPC registered (e.g. after starting up Gnome):
      
        general protection fault: 0000 [#1] PREEMPT SMP
        ...
        RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0
        ...
        Call Trace:
          sysvipc_shm_proc_show+0x5e/0x150
          sysvipc_proc_show+0x1a/0x30
          seq_read+0x2e9/0x3f0
        ...
      
      Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast
      Fixes: 3859a271 ("randstruct: Mark various structs for randomization")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
      Acked-by: NManfred Spraul <manfred@colorfullife.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ade9f91b
    • D
      cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() · 89affbf5
      Dima Zavin 提交于
      In codepaths that use the begin/retry interface for reading
      mems_allowed_seq with irqs disabled, there exists a race condition that
      stalls the patch process after only modifying a subset of the
      static_branch call sites.
      
      This problem manifested itself as a deadlock in the slub allocator,
      inside get_any_partial.  The loop reads mems_allowed_seq value (via
      read_mems_allowed_begin), performs the defrag operation, and then
      verifies the consistency of mem_allowed via the read_mems_allowed_retry
      and the cookie returned by xxx_begin.
      
      The issue here is that both begin and retry first check if cpusets are
      enabled via cpusets_enabled() static branch.  This branch can be
      rewritted dynamically (via cpuset_inc) if a new cpuset is created.  The
      x86 jump label code fully synchronizes across all CPUs for every entry
      it rewrites.  If it rewrites only one of the callsites (specifically the
      one in read_mems_allowed_retry) and then waits for the
      smp_call_function(do_sync_core) to complete while a CPU is inside the
      begin/retry section with IRQs off and the mems_allowed value is changed,
      we can hang.
      
      This is because begin() will always return 0 (since it wasn't patched
      yet) while retry() will test the 0 against the actual value of the seq
      counter.
      
      The fix is to use two different static keys: one for begin
      (pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
      first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
      always return a valid seqcount if are enabling cpusets.  Similarly, when
      disabling cpusets via cpuset_dec(), we first ensure that callers of
      cpuset_mems_allowed_retry() will start ignoring the seqcount value
      before we let cpuset_mems_allowed_begin() return 0.
      
      The relevant stack traces of the two stuck threads:
      
        CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
        RIP: smp_call_function_many+0x1f9/0x260
        Call Trace:
          smp_call_function+0x3b/0x70
          on_each_cpu+0x2f/0x90
          text_poke_bp+0x87/0xd0
          arch_jump_label_transform+0x93/0x100
          __jump_label_update+0x77/0x90
          jump_label_update+0xaa/0xc0
          static_key_slow_inc+0x9e/0xb0
          cpuset_css_online+0x70/0x2e0
          online_css+0x2c/0xa0
          cgroup_apply_control_enable+0x27f/0x3d0
          cgroup_mkdir+0x2b7/0x420
          kernfs_iop_mkdir+0x5a/0x80
          vfs_mkdir+0xf6/0x1a0
          SyS_mkdir+0xb7/0xe0
          entry_SYSCALL_64_fastpath+0x18/0xad
      
        ...
      
        CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8818087c0000 task.stack: ffffc90000030000
        RIP: int3+0x39/0x70
        Call Trace:
          <#DB> ? ___slab_alloc+0x28b/0x5a0
          <EOE> ? copy_process.part.40+0xf7/0x1de0
          __slab_alloc.isra.80+0x54/0x90
          copy_process.part.40+0xf7/0x1de0
          copy_process.part.40+0xf7/0x1de0
          kmem_cache_alloc_node+0x8a/0x280
          copy_process.part.40+0xf7/0x1de0
          _do_fork+0xe7/0x6c0
          _raw_spin_unlock_irq+0x2d/0x60
          trace_hardirqs_on_caller+0x136/0x1d0
          entry_SYSCALL_64_fastpath+0x5/0xad
          do_syscall_64+0x27/0x350
          SyS_clone+0x19/0x20
          do_syscall_64+0x60/0x350
          entry_SYSCALL64_slow_path+0x25/0x25
      
      Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
      Fixes: 46e700ab ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
      Signed-off-by: NDima Zavin <dmitriyz@waymo.com>
      Reported-by: NCliff Spradlin <cspradlin@waymo.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89affbf5
    • M
      userfaultfd_zeropage: return -ENOSPC in case mm has gone · 9d95aa4b
      Mike Rapoport 提交于
      In the non-cooperative userfaultfd case, the process exit may race with
      outstanding mcopy_atomic called by the uffd monitor.  Returning -ENOSPC
      instead of -EINVAL when mm is already gone will allow uffd monitor to
      distinguish this case from other error conditions.
      
      Unfortunately I overlooked userfaultfd_zeropage when updating
      userfaultd_copy().
      
      Link: http://lkml.kernel.org/r/1501136819-21857-1-git-send-email-rppt@linux.vnet.ibm.com
      Fixes: 96333187 ("userfaultfd_copy: return -ENOSPC in case mm has gone")
      Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Pavel Emelyanov <xemul@virtuozzo.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d95aa4b
    • H
      mm: take memory hotplug lock within numa_zonelist_order_handler() · 167d0f25
      Heiko Carstens 提交于
      Andre Wild reported the following warning:
      
        WARNING: CPU: 2 PID: 1205 at kernel/cpu.c:240 lockdep_assert_cpus_held+0x4c/0x60
        Modules linked in:
        CPU: 2 PID: 1205 Comm: bash Not tainted 4.13.0-rc2-00022-gfd2b2c57 #10
        Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
        task: 00000000701d8100 task.stack: 0000000073594000
        Krnl PSW : 0704f00180000000 0000000000145e24 (lockdep_assert_cpus_held+0x4c/0x60)
        ...
        Call Trace:
         lockdep_assert_cpus_held+0x42/0x60)
         stop_machine_cpuslocked+0x62/0xf0
         build_all_zonelists+0x92/0x150
         numa_zonelist_order_handler+0x102/0x150
         proc_sys_call_handler.isra.12+0xda/0x118
         proc_sys_write+0x34/0x48
         __vfs_write+0x3c/0x178
         vfs_write+0xbc/0x1a0
         SyS_write+0x66/0xc0
         system_call+0xc4/0x2b0
         locks held by bash/1205:
         #0:  (sb_writers#4){.+.+.+}, at: vfs_write+0xa6/0x1a0
         #1:  (zl_order_mutex){+.+...}, at: numa_zonelist_order_handler+0x44/0x150
         #2:  (zonelists_mutex){+.+...}, at: numa_zonelist_order_handler+0xf4/0x150
        Last Breaking-Event-Address:
          lockdep_assert_cpus_held+0x48/0x60
      
      This can be easily triggered with e.g.
      
          echo n > /proc/sys/vm/numa_zonelist_order
      
      In commit 3f906ba2 ("mm/memory-hotplug: switch locking to a percpu
      rwsem") memory hotplug locking was changed to fix a potential deadlock.
      
      This also switched the stop_machine() invocation within
      build_all_zonelists() to stop_machine_cpuslocked() which now expects
      that online cpus are locked when being called.
      
      This assumption is not true if build_all_zonelists() is being called
      from numa_zonelist_order_handler().
      
      In order to fix this simply add a mem_hotplug_begin()/mem_hotplug_done()
      pair to numa_zonelist_order_handler().
      
      Link: http://lkml.kernel.org/r/20170726111738.38768-1-heiko.carstens@de.ibm.com
      Fixes: 3f906ba2 ("mm/memory-hotplug: switch locking to a percpu rwsem")
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reported-by: NAndre Wild <wild@linux.vnet.ibm.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      167d0f25
    • T
      mm/page_io.c: fix oops during block io poll in swapin path · b0ba2d0f
      Tetsuo Handa 提交于
      When a thread is OOM-killed during swap_readpage() operation, an oops
      occurs because end_swap_bio_read() is calling wake_up_process() based on
      an assumption that the thread which called swap_readpage() is still
      alive.
      
        Out of memory: Kill process 525 (polkitd) score 0 or sacrifice child
        Killed process 525 (polkitd) total-vm:528128kB, anon-rss:0kB, file-rss:4kB, shmem-rss:0kB
        oom_reaper: reaped process 525 (polkitd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
        general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
        Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp ppdev pcspkr vmw_balloon sg shpchp vmw_vmci parport_pc parport i2c_piix4 ip_tables xfs libcrc32c sd_mod sr_mod cdrom ata_generic pata_acpi vmwgfx ahci libahci drm_kms_helper ata_piix syscopyarea sysfillrect sysimgblt fb_sys_fops mptspi scsi_transport_spi ttm e1000 mptscsih drm mptbase i2c_core libata serio_raw
        CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0-rc2-next-20170725 #129
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
        task: ffffffffb7c16500 task.stack: ffffffffb7c00000
        RIP: 0010:__lock_acquire+0x151/0x12f0
        Call Trace:
         <IRQ>
         lock_acquire+0x59/0x80
         _raw_spin_lock_irqsave+0x3b/0x4f
         try_to_wake_up+0x3b/0x410
         wake_up_process+0x10/0x20
         end_swap_bio_read+0x6f/0xf0
         bio_endio+0x92/0xb0
         blk_update_request+0x88/0x270
         scsi_end_request+0x32/0x1c0
         scsi_io_completion+0x209/0x680
         scsi_finish_command+0xd4/0x120
         scsi_softirq_done+0x120/0x140
         __blk_mq_complete_request_remote+0xe/0x10
         flush_smp_call_function_queue+0x51/0x120
         generic_smp_call_function_single_interrupt+0xe/0x20
         smp_trace_call_function_single_interrupt+0x22/0x30
         smp_call_function_single_interrupt+0x9/0x10
         call_function_single_interrupt+0xa7/0xb0
         </IRQ>
        RIP: 0010:native_safe_halt+0x6/0x10
         default_idle+0xe/0x20
         arch_cpu_idle+0xa/0x10
         default_idle_call+0x1e/0x30
         do_idle+0x187/0x200
         cpu_startup_entry+0x6e/0x70
         rest_init+0xd0/0xe0
         start_kernel+0x456/0x477
         x86_64_start_reservations+0x24/0x26
         x86_64_start_kernel+0xf7/0x11a
         secondary_startup_64+0xa5/0xa5
        Code: c3 49 81 3f 20 9e 0b b8 41 bc 00 00 00 00 44 0f 45 e2 83 fe 01 0f 87 62 ff ff ff 89 f0 49 8b 44 c7 08 48 85 c0 0f 84 52 ff ff ff <f0> ff 80 98 01 00 00 8b 3d 5a 49 c4 01 45 8b b3 18 0c 00 00 85
        RIP: __lock_acquire+0x151/0x12f0 RSP: ffffa01f39e03c50
        ---[ end trace 6c441db499169b1e ]---
        Kernel panic - not syncing: Fatal exception in interrupt
        Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
        ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      
      Fix it by holding a reference to the thread.
      
      [akpm@linux-foundation.org: add comment]
      Fixes: 23955622 ("swap: add block io poll in swapin path")
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reviewed-by: NShaohua Li <shli@fb.com>
      Cc: Tim Chen <tim.c.chen@intel.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b0ba2d0f
    • M
      zram: do not free pool->size_class · 3189c820
      Minchan Kim 提交于
      Mike reported kernel goes oops with ltp:zram03 testcase.
      
        zram: Added device: zram0
        zram0: detected capacity change from 0 to 107374182400
        BUG: unable to handle kernel paging request at 0000306d61727a77
        IP: zs_map_object+0xb9/0x260
        PGD 0
        P4D 0
        Oops: 0000 [#1] SMP
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in: zram(E) xfs(E) libcrc32c(E) btrfs(E) xor(E) raid6_pq(E) loop(E) ebtable_filter(E) ebtables(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) ip_tables(E) x_tables(E) af_packet(E) br_netfilter(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) intel_powerclamp(E) coretemp(E) cdc_ether(E) kvm_intel(E) usbnet(E) mii(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) iTCO_wdt(E) ghash_clmulni_intel(E) bnx2(E) iTCO_vendor_support(E) pcbc(E) ioatdma(E) ipmi_ssif(E) aesni_intel(E) i5500_temp(E) i2c_i801(E) aes_x86_64(E) lpc_ich(E) shpchp(E) mfd_core(E) crypto_simd(E) i7core_edac(E) dca(E) glue_helper(E) cryptd(E) ipmi_si(E) button(E) acpi_cpufreq(E) ipmi_devintf(E) pcspkr(E) ipmi_msghandler(E)
         nfsd(E) auth_rpcgss(E) nfs_acl(E) lockd(E) grace(E) sunrpc(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E) ata_generic(E) i2c_algo_bit(E) ata_piix(E) drm_kms_helper(E) ahci(E) syscopyarea(E) sysfillrect(E) libahci(E) sysimgblt(E) fb_sys_fops(E) uhci_hcd(E) ehci_pci(E) ttm(E) ehci_hcd(E) libata(E) drm(E) megaraid_sas(E) usbcore(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) efivarfs(E) autofs4(E) [last unloaded: zram]
        CPU: 6 PID: 12356 Comm: swapon Tainted: G            E   4.13.0.g87b2c3fc-default #194
        Hardware name: IBM System x3550 M3 -[7944K3G]-/69Y5698     , BIOS -[D6E150AUS-1.10]- 12/15/2010
        task: ffff880158d2c4c0 task.stack: ffffc90001680000
        RIP: 0010:zs_map_object+0xb9/0x260
        Call Trace:
         zram_bvec_rw.isra.26+0xe8/0x780 [zram]
         zram_rw_page+0x6e/0xa0 [zram]
         bdev_read_page+0x81/0xb0
         do_mpage_readpage+0x51a/0x710
         mpage_readpages+0x122/0x1a0
         blkdev_readpages+0x1d/0x20
         __do_page_cache_readahead+0x1b2/0x270
         ondemand_readahead+0x180/0x2c0
         page_cache_sync_readahead+0x31/0x50
         generic_file_read_iter+0x7e7/0xaf0
         blkdev_read_iter+0x37/0x40
         __vfs_read+0xce/0x140
         vfs_read+0x9e/0x150
         SyS_read+0x46/0xa0
         entry_SYSCALL_64_fastpath+0x1a/0xa5
        Code: 81 e6 00 c0 3f 00 81 fe 00 00 16 00 0f 85 9f 01 00 00 0f b7 13 65 ff 05 5e 07 dc 7e 66 c1 ea 02 81 e2 ff 01 00 00 49 8b 54 d4 08 <8b> 4a 48 41 0f af ce 81 e1 ff 0f 00 00 41 89 c9 48 c7 c3 a0 70
        RIP: zs_map_object+0xb9/0x260 RSP: ffffc90001683988
        CR2: 0000306d61727a77
      
      He bisected the problem is [1].
      
      After commit cf8e0fed ("mm/zsmalloc: simplify zs_max_alloc_size
      handling"), zram doesn't use double pointer for pool->size_class any
      more in zs_create_pool so counter function zs_destroy_pool don't need to
      free it, either.
      
      Otherwise, it does kfree wrong address and then, kernel goes Oops.
      
      Link: http://lkml.kernel.org/r/20170725062650.GA12134@bbox
      Fixes: cf8e0fed ("mm/zsmalloc: simplify zs_max_alloc_size handling")
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Reported-by: NMike Galbraith <efault@gmx.de>
      Tested-by: NMike Galbraith <efault@gmx.de>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3189c820
    • J
      kthread: fix documentation build warning · d16977f3
      Jonathan Corbet 提交于
      The kerneldoc comment for kthread_create() had an incorrect argument
      name, leading to a warning in the docs build.
      
      Correct it, and make one more small step toward a warning-free build.
      
      Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.netSigned-off-by: NJonathan Corbet <corbet@lwn.net>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d16977f3
    • A
      kasan: avoid -Wmaybe-uninitialized warning · e7701557
      Arnd Bergmann 提交于
      gcc-7 produces this warning:
      
        mm/kasan/report.c: In function 'kasan_report':
        mm/kasan/report.c:351:3: error: 'info.first_bad_addr' may be used uninitialized in this function [-Werror=maybe-uninitialized]
           print_shadow_for_address(info->first_bad_addr);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        mm/kasan/report.c:360:27: note: 'info.first_bad_addr' was declared here
      
      The code seems fine as we only print info.first_bad_addr when there is a
      shadow, and we always initialize it in that case, but this is relatively
      hard for gcc to figure out after the latest rework.
      
      Adding an intialization to the most likely value together with the other
      struct members shuts up that warning.
      
      Fixes: b235b9808664 ("kasan: unify report headers")
      Link: https://patchwork.kernel.org/patch/9641417/
      Link: http://lkml.kernel.org/r/20170725152739.4176967-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      Suggested-by: NAlexander Potapenko <glider@google.com>
      Suggested-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7701557
    • M
      userfaultfd: non-cooperative: notify about unmap of destination during mremap · b2282371
      Mike Rapoport 提交于
      When mremap is called with MREMAP_FIXED it unmaps memory at the
      destination address without notifying userfaultfd monitor.
      
      If the destination were registered with userfaultfd, the monitor has no
      way to distinguish between the old and new ranges and to properly relate
      the page faults that would occur in the destination region.
      
      Fixes: 897ab3e0 ("userfaultfd: non-cooperative: add event for memory unmaps")
      Link: http://lkml.kernel.org/r/1500276876-3350-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
      Acked-by: NPavel Emelyanov <xemul@virtuozzo.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2282371
    • M
      mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries · 3ea27719
      Mel Gorman 提交于
      Nadav Amit identified a theoritical race between page reclaim and
      mprotect due to TLB flushes being batched outside of the PTL being held.
      
      He described the race as follows:
      
              CPU0                            CPU1
              ----                            ----
                                              user accesses memory using RW PTE
                                              [PTE now cached in TLB]
              try_to_unmap_one()
              ==> ptep_get_and_clear()
              ==> set_tlb_ubc_flush_pending()
                                              mprotect(addr, PROT_READ)
                                              ==> change_pte_range()
                                              ==> [ PTE non-present - no flush ]
      
                                              user writes using cached RW PTE
              ...
      
              try_to_unmap_flush()
      
      The same type of race exists for reads when protecting for PROT_NONE and
      also exists for operations that can leave an old TLB entry behind such
      as munmap, mremap and madvise.
      
      For some operations like mprotect, it's not necessarily a data integrity
      issue but it is a correctness issue as there is a window where an
      mprotect that limits access still allows access.  For munmap, it's
      potentially a data integrity issue although the race is massive as an
      munmap, mmap and return to userspace must all complete between the
      window when reclaim drops the PTL and flushes the TLB.  However, it's
      theoritically possible so handle this issue by flushing the mm if
      reclaim is potentially currently batching TLB flushes.
      
      Other instances where a flush is required for a present pte should be ok
      as either the page lock is held preventing parallel reclaim or a page
      reference count is elevated preventing a parallel free leading to
      corruption.  In the case of page_mkclean there isn't an obvious path
      that userspace could take advantage of without using the operations that
      are guarded by this patch.  Other users such as gup as a race with
      reclaim looks just at PTEs.  huge page variants should be ok as they
      don't race with reclaim.  mincore only looks at PTEs.  userfault also
      should be ok as if a parallel reclaim takes place, it will either fault
      the page back in or read some of the data before the flush occurs
      triggering a fault.
      
      Note that a variant of this patch was acked by Andy Lutomirski but this
      was for the x86 parts on top of his PCID work which didn't make the 4.13
      merge window as expected.  His ack is dropped from this version and
      there will be a follow-on patch on top of PCID that will include his
      ack.
      
      [akpm@linux-foundation.org: tweak comments]
      [akpm@linux-foundation.org: fix spello]
      Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.deReported-by: NNadav Amit <nadav.amit@gmail.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: <stable@vger.kernel.org>	[v4.4+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ea27719
    • K
      pid: kill pidhash_size in pidhash_init() · 27e37d84
      Kefeng Wang 提交于
      After commit 3d375d78 ("mm: update callers to use HASH_ZERO flag"),
      drop unused pidhash_size in pidhash_init().
      
      Link: http://lkml.kernel.org/r/1500389267-49222-1-git-send-email-wangkefeng.wang@huawei.comSigned-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: NPavel Tatashin <Pasha.Tatashin@Oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27e37d84
    • D
      mm/hugetlb.c: __get_user_pages ignores certain follow_hugetlb_page errors · 2be7cfed
      Daniel Jordan 提交于
      Commit 9a291a7c ("mm/hugetlb: report -EHWPOISON not -EFAULT when
      FOLL_HWPOISON is specified") causes __get_user_pages to ignore certain
      errors from follow_hugetlb_page.  After such error, __get_user_pages
      subsequently calls faultin_page on the same VMA and start address that
      follow_hugetlb_page failed on instead of returning the error immediately
      as it should.
      
      In follow_hugetlb_page, when hugetlb_fault returns a value covered under
      VM_FAULT_ERROR, follow_hugetlb_page returns it without setting nr_pages
      to 0 as __get_user_pages expects in this case, which causes the
      following to happen in __get_user_pages: the "while (nr_pages)" check
      succeeds, we skip the "if (!vma..." check because we got a VMA the last
      time around, we find no page with follow_page_mask, and we call
      faultin_page, which calls hugetlb_fault for the second time.
      
      This issue also slightly changes how __get_user_pages works.  Before, it
      only returned error if it had made no progress (i = 0).  But now,
      follow_hugetlb_page can clobber "i" with an error code since its new
      return path doesn't check for progress.  So if "i" is nonzero before a
      failing call to follow_hugetlb_page, that indication of progress is lost
      and __get_user_pages can return error even if some pages were
      successfully pinned.
      
      To fix this, change follow_hugetlb_page so that it updates nr_pages,
      allowing __get_user_pages to fail immediately and restoring the "error
      only if no progress" behavior to __get_user_pages.
      
      Tested that __get_user_pages returns when expected on error from
      hugetlb_fault in follow_hugetlb_page.
      
      Fixes: 9a291a7c ("mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified")
      Link: http://lkml.kernel.org/r/1500406795-58462-1-git-send-email-daniel.m.jordan@oracle.comSigned-off-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
      Acked-by: NPunit Agrawal <punit.agrawal@arm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: zhong jiang <zhongjiang@huawei.com>
      Cc: <stable@vger.kernel.org>	[4.12.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2be7cfed
    • L
      Merge tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86 · 4d3f5d04
      Linus Torvalds 提交于
      Pull x86 platform driver fixes from Darren Hart:
       "Fix two bugs under error or abnormal usage conditions. Correct a
        config dependency:
      
        dell-wmi:
         - Fix driver interface version query
      
        wmi:
         - Fix error handling in acpi_wmi_init()
      
        peaq-wmi:
         - select INPUT_POLLDEV"
      
      * tag 'platform-drivers-x86-v4.13-3' of git://git.infradead.org/linux-platform-drivers-x86:
        platform/x86: dell-wmi: Fix driver interface version query
        platform/x86: wmi: Fix error handling in acpi_wmi_init()
        platform/x86: peaq-wmi: select INPUT_POLLDEV
      4d3f5d04
  4. 02 8月, 2017 6 次提交
  5. 01 8月, 2017 6 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · bc78d646
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Handle notifier registry failures properly in tun/tap driver, from
          Tonghao Zhang.
      
       2) Fix bpf verifier handling of subtraction bounds and add a testcase
          for this, from Edward Cree.
      
       3) Increase reset timeout in ftgmac100 driver, from Ben Herrenschmidt.
      
       4) Fix use after free in prd_retire_rx_blk_timer_exired() in AF_PACKET,
          from Cong Wang.
      
       5) Fix SElinux regression due to recent UDP optimizations, from Paolo
          Abeni.
      
       6) We accidently increment IPSTATS_MIB_FRAGFAILS in the ipv6 code
          paths, fix from Stefano Brivio.
      
       7) Fix some mem leaks in dccp, from Xin Long.
      
       8) Adjust MDIO_BUS kconfig deps to avoid build errors, from Arnd
          Bergmann.
      
       9) Mac address length check and buffer size fixes from Cong Wang.
      
      10) Don't leak sockets in ipv6 udp early demux, from Paolo Abeni.
      
      11) Fix return value when copy_from_user() fails in
          bpf_prog_get_info_by_fd(), from Daniel Borkmann.
      
      12) Handle PHY_HALTED properly in phy library state machine, from
          Florian Fainelli.
      
      13) Fix OOPS in fib_sync_down_dev(), from Ido Schimmel.
      
      14) Fix truesize calculation in virtio_net which led to performance
          regressions, from Michael S Tsirkin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits)
        samples/bpf: fix bpf tunnel cleanup
        udp6: fix jumbogram reception
        ppp: Fix a scheduling-while-atomic bug in del_chan
        Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config"
        virtio_net: fix truesize for mergeable buffers
        mv643xx_eth: fix of_irq_to_resource() error check
        MAINTAINERS: Add more files to the PHY LIBRARY section
        ipv4: fib: Fix NULL pointer deref during fib_sync_down_dev()
        net: phy: Correctly process PHY_HALTED in phy_stop_machine()
        sunhme: fix up GREG_STAT and GREG_IMASK register offsets
        bpf: fix bpf_prog_get_info_by_fd to dump correct xlated_prog_len
        tcp: avoid bogus gcc-7 array-bounds warning
        net: tc35815: fix spelling mistake: "Intterrupt" -> "Interrupt"
        bpf: don't indicate success when copy_from_user fails
        udp6: fix socket leak on early demux
        net: thunderx: Fix BGX transmit stall due to underflow
        Revert "vhost: cache used event for better performance"
        team: use a larger struct for mac address
        net: check dev->addr_len for dev_set_mac_address()
        phy: bcm-ns-usb3: fix MDIO_BUS dependency
        ...
      bc78d646
    • W
      samples/bpf: fix bpf tunnel cleanup · cc75f851
      William Tu 提交于
      test_tunnel_bpf.sh fails to remove the vxlan11 tunnel device, causing the
      next geneve tunnelling test case fails.  In addition, the geneve reserved bit
      in tcbpf2_kern.c should be zero, according to the RFC.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc75f851
    • P
      udp6: fix jumbogram reception · cb891fa6
      Paolo Abeni 提交于
      Since commit 67a51780 ("ipv6: udp: leverage scratch area
      helpers") udp6_recvmsg() read the skb len from the scratch area,
      to avoid a cache miss.
      But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
      length exceeds the 16 bits available in the scratch area. As a side
      effect the length returned by recvmsg() is:
      <ingress datagram len> % (1<<16)
      
      This commit addresses the issue allocating one more bit in the
      IP6CB flags field and setting it for incoming jumbograms.
      Such field is still in the first cacheline, so at recvmsg()
      time we can check it and fallback to access skb->len if
      required, without a measurable overhead.
      
      Fixes: 67a51780 ("ipv6: udp: leverage scratch area helpers")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb891fa6
    • G
      ppp: Fix a scheduling-while-atomic bug in del_chan · ddab8282
      Gao Feng 提交于
      The PPTP set the pptp_sock_destruct as the sock's sk_destruct, it would
      trigger this bug when __sk_free is invoked in atomic context, because of
      the call path pptp_sock_destruct->del_chan->synchronize_rcu.
      
      Now move the synchronize_rcu to pptp_release from del_chan. This is the
      only one case which would free the sock and need the synchronize_rcu.
      
      The following is the panic I met with kernel 3.3.8, but this issue should
      exist in current kernel too according to the codes.
      
      BUG: scheduling while atomic
      __schedule_bug+0x5e/0x64
      __schedule+0x55/0x580
      ? ppp_unregister_channel+0x1cd5/0x1de0 [ppp_generic]
      ? dev_hard_start_xmit+0x423/0x530
      ? sch_direct_xmit+0x73/0x170
      __cond_resched+0x16/0x30
      _cond_resched+0x22/0x30
      wait_for_common+0x18/0x110
      ? call_rcu_bh+0x10/0x10
      wait_for_completion+0x12/0x20
      wait_rcu_gp+0x34/0x40
      ? wait_rcu_gp+0x40/0x40
      synchronize_sched+0x1e/0x20
      0xf8417298
      0xf8417484
      ? sock_queue_rcv_skb+0x109/0x130
      __sk_free+0x16/0x110
      ? udp_queue_rcv_skb+0x1f2/0x290
      sk_free+0x16/0x20
      __udp4_lib_rcv+0x3b8/0x650
      Signed-off-by: NGao Feng <gfree.wind@vip.163.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ddab8282
    • F
      Revert "net: bcmgenet: Remove init parameter from bcmgenet_mii_config" · 00d51094
      Florian Fainelli 提交于
      This reverts commit 28b45910 ("net: bcmgenet: Remove init parameter
      from bcmgenet_mii_config") because in the process of moving from
      dev_info() to dev_info_once() we essentially lost the helpful printed
      messages once the second instance of the driver is loaded.
      dev_info_once() does not actually print the message once per device
      instance, but once period.
      
      Fixes: 28b45910 ("net: bcmgenet: Remove init parameter from bcmgenet_mii_config")
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: NDoug Berger <opendmb@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00d51094
    • M
      virtio_net: fix truesize for mergeable buffers · 1daa8790
      Michael S. Tsirkin 提交于
      Seth Forshee noticed a performance degradation with some workloads.
      This turns out to be due to packet drops.  Euan Kemp noticed that this
      is because we drop all packets where length exceeds the truesize, but
      for some packets we add in extra memory without updating the truesize.
      This in turn was kept around unchanged from ab7db917 ("virtio-net:
      auto-tune mergeable rx buffer size for improved performance").  That
      commit had an internal reason not to account for the extra space: not
      enough bits to do it.  No longer true so let's account for the allocated
      length exactly.
      
      Many thanks to Seth Forshee for the report and bisecting and Euan Kemp
      for debugging the issue.
      
      Fixes: 680557cf ("virtio_net: rework mergeable buffer handling")
      Reported-by: NEuan Kemp <euan.kemp@coreos.com>
      Tested-by: NEuan Kemp <euan.kemp@coreos.com>
      Reported-by: NSeth Forshee <seth.forshee@canonical.com>
      Tested-by: NSeth Forshee <seth.forshee@canonical.com>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1daa8790