1. 11 Jun, 2019 35 commits
    • genwqe: Prevent an integer overflow in the ioctl · 67bdeb0c
      Dan Carpenter committed
      commit 110080cea0d0e4dfdb0b536e7f8a5633ead6a781 upstream.
      
      There are a couple potential integer overflows here.
      
      	round_up(m->size + (m->addr & ~PAGE_MASK), PAGE_SIZE);
      
      The first thing is that the "m->size + (...)" addition could overflow,
      and the second is that round_up() overflows to zero if the result is
      within PAGE_SIZE of the type max.
      
      In this code, the "m->size" variable is a u64 but we're saving the
      result in "map_size" which is an unsigned long and genwqe_user_vmap()
      takes an unsigned long as well.  So I have used ULONG_MAX as the upper
      bound.  From a practical perspective unsigned long is fine/better than
      trying to change all the types to u64.
      
      Fixes: eaf4722d ("GenWQE Character device and DDCB queue")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
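
The two overflow cases described in the genwqe commit (the addition wrapping, and round_up() wrapping to zero near the type maximum) can both be ruled out with one bound check before any arithmetic is done. The following is a minimal user-space sketch, not the driver code: `SKETCH_PAGE_SIZE` and `checked_map_size()` are hypothetical names, and the page size is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size, for illustration only */

/*
 * Compute round_up(size + offset, SKETCH_PAGE_SIZE) into *out.
 * Returns false if the rounded-up total would not fit in unsigned long.
 * Hypothetical helper, not the genwqe implementation.
 */
static bool checked_map_size(uint64_t size, uint64_t offset, unsigned long *out)
{
	uint64_t sum;

	/* offset stands for the in-page part of the address */
	assert(offset < SKETCH_PAGE_SIZE);

	/* One comparison covers both the addition and the rounding step. */
	if (size > ULONG_MAX - SKETCH_PAGE_SIZE - offset)
		return false;

	sum = size + offset;
	*out = (unsigned long)((sum + SKETCH_PAGE_SIZE - 1) &
			       ~(uint64_t)(SKETCH_PAGE_SIZE - 1));
	return true;
}
```

Using ULONG_MAX as the bound mirrors the reasoning in the commit message: the result is stored in an unsigned long, so that is the type whose range matters.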
    • Revert "MIPS: perf: ath79: Fix perfcount IRQ assignment" · 221c44d2
      Greg Kroah-Hartman committed
      This reverts commit ca864881 which is
      commit a1e8783db8e0d58891681bc1e6d9ada66eae8e20 upstream.
      
      Petr writes:
      	Karl has reported to me today, that he's experiencing weird
      	reboot hang on his devices with 4.9.180 kernel and that he has
      	bisected it down to my backported patch.
      
      	I would like to kindly ask you for removal of this patch.  This
      	patch should be reverted from all stable kernels up to 5.1,
      	because perf counters were not broken on those kernels, and this
      	patch won't work on the ath79 legacy IRQ code anyway, it needs
      	new irqchip driver which was enabled on ath79 with commit
      	51fa4f8912c0 ("MIPS: ath79: drop legacy IRQ code").
      Reported-by: Petr Štetiar <ynezz@true.cz>
      Cc: Kevin 'ldir' Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
      Cc: John Crispin <john@phrozen.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: linux-mips@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: pistachio: Build uImage.gz by default · 2d9d3ab5
      Paul Burton committed
      commit e4f2d1af7163becb181419af9dece9206001e0a6 upstream.
      
      The pistachio platform uses the U-Boot bootloader & generally boots a
      kernel in the uImage format. As such it's useful to build one when
      building the kernel, but to do so currently requires the user to
      manually specify a uImage target on the make command line.
      
      Make uImage.gz the pistachio platform's default build target, so that
      the default is to build a kernel image that we can actually boot on a
      board such as the MIPS Creator Ci40.
      
      Marked for stable backport as far as v4.1 where pistachio support was
      introduced. This is primarily useful for CI systems such as kernelci.org
      which will benefit from us building a suitable image which can then be
      booted as part of automated testing, extending our test coverage to the
      affected stable branches.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Reviewed-by: Kevin Hilman <khilman@baylibre.com>
      Tested-by: Kevin Hilman <khilman@baylibre.com>
      URL: https://groups.io/g/kernelci/message/388
      Cc: stable@vger.kernel.org # v4.1+
      Cc: linux-mips@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • MIPS: Bounds check virt_addr_valid · eee60963
      Paul Burton committed
      commit 074a1e1167afd82c26f6d03a9a8b997d564bb241 upstream.
      
      The virt_addr_valid() function is meant to return true iff
      virt_to_page() will return a valid struct page reference. This is true
      iff the address provided is found within the unmapped address range
      between PAGE_OFFSET & MAP_BASE, but we don't currently check for that
      condition. Instead we simply mask the address to obtain what will be a
      physical address if the virtual address is indeed in the desired range,
      shift it to form a PFN & then call pfn_valid(). This can incorrectly
      return true if called with a virtual address which, after masking,
      happens to form a physical address corresponding to a valid PFN.
      
      For example we may vmalloc an address in the kernel mapped region
      starting at MAP_BASE & obtain the virtual address:
      
        addr = 0xc000000000002000
      
      When masked by virt_to_phys(), which uses __pa() & in turn CPHYSADDR(),
      we obtain the following (bogus) physical address:
      
        addr = 0x2000
      
      In a common system with PHYS_OFFSET=0 this will correspond to a valid
      struct page which should really be accessed by virtual address
      PAGE_OFFSET+0x2000, causing virt_addr_valid() to incorrectly return 1
      indicating that the original address corresponds to a struct page.
      
      This is equivalent to the ARM64 change made in commit ca219452
      ("arm64: Correctly bounds check virt_addr_valid").
      
      This fixes fallout when hardened usercopy is enabled caused by the
      related commit 517e1fbe ("mm/usercopy: Drop extra
      is_vmalloc_or_module() check") which removed a check for the vmalloc
      range that was present from the introduction of the hardened usercopy
      feature.
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      References: ca219452 ("arm64: Correctly bounds check virt_addr_valid")
      References: 517e1fbe ("mm/usercopy: Drop extra is_vmalloc_or_module() check")
      Reported-by: Julien Cristau <jcristau@debian.org>
      Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
      Tested-by: YunQiang Su <ysu@wavecomp.com>
      URL: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=929366
      Cc: stable@vger.kernel.org # v4.12+
      Cc: linux-mips@vger.kernel.org
      Cc: Yunqiang Su <ysu@wavecomp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
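
The virt_addr_valid() problem described above can be reproduced in a few lines: masking alone maps a vmalloc-range address onto a valid PFN, while a range check against [PAGE_OFFSET, MAP_BASE) rejects it first. This is a sketch under stated assumptions, not the MIPS implementation: every `SKETCH_*` constant and every function name here is hypothetical, and the mask is only a stand-in for what CPHYSADDR() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All constants below are illustrative assumptions, not real MIPS values. */
#define SKETCH_PAGE_OFFSET 0x9800000000000000ULL /* start of the unmapped range */
#define SKETCH_MAP_BASE    0xc000000000000000ULL /* start of the mapped (vmalloc) range */
#define SKETCH_PHYS_MASK   0x00000000ffffffffULL /* stand-in for the address masking */
#define SKETCH_PAGE_SHIFT  12
#define SKETCH_MAX_PFN     0x10000ULL            /* pretend 256 MiB of RAM, PHYS_OFFSET=0 */

static bool sketch_pfn_valid(uint64_t pfn)
{
	return pfn < SKETCH_MAX_PFN;
}

/* Unchecked variant: mask, shift, and trust the PFN test. */
static bool addr_valid_unchecked(uint64_t vaddr)
{
	return sketch_pfn_valid((vaddr & SKETCH_PHYS_MASK) >> SKETCH_PAGE_SHIFT);
}

/* Checked variant: reject anything outside [PAGE_OFFSET, MAP_BASE) first. */
static bool addr_valid_checked(uint64_t vaddr)
{
	if (vaddr < SKETCH_PAGE_OFFSET || vaddr >= SKETCH_MAP_BASE)
		return false;
	return addr_valid_unchecked(vaddr);
}
```

With the example address from the commit message, 0xc000000000002000, the unchecked variant reports a valid page while the checked variant does not.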
    • xen-blkfront: switch kcalloc to kvcalloc for large array allocation · b9b75a46
      Roger Pau Monne committed
      commit 1d5c76e66433382a1e170d1d5845bb0fed7467aa upstream.
      
      There's no reason to request physically contiguous memory for those
      allocations.
      
      [boris: added CC to stable]
      
      Cc: stable@vger.kernel.org
      Reported-by: Ian Jackson <ian.jackson@citrix.com>
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/mm: fix address space detection in exception handling · 7aad9269
      Gerald Schaefer committed
      commit 962f0af83c239c0aef05639631e871c874b00f99 upstream.
      
      Commit 0aaba41b ("s390: remove all code using the access register
      mode") removed access register mode from the kernel, and also from the
      address space detection logic. However, user space could still switch
      to access register mode (trans_exc_code == 1), and exceptions in that
      mode would not be correctly assigned.
      
      Fix this by adding a check for trans_exc_code == 1 to get_fault_type(),
      and remove the wrong comment line before that function.
      
      Fixes: 0aaba41b ("s390: remove all code using the access register mode")
      Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@vger.kernel.org> # v4.15+
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: xiic: Add max_read_len quirk · 7737eff0
      Robert Hancock committed
      commit 49b809586730a77b57ce620b2f9689de765d790b upstream.
      
      This driver does not support reading more than 255 bytes at once because
      the register for storing the number of bytes to read is only 8 bits. Add
      a max_read_len quirk to enforce this.
      
      This was found when using this driver with the SFP driver, which was
      previously reading all 256 bytes in the SFP EEPROM in one transaction.
      This caused a bunch of hard-to-debug errors in the xiic driver since the
      driver/logic was treating the number of bytes to read as zero.
      Rejecting transactions that aren't supported at least allows the problem
      to be diagnosed more easily.
      Signed-off-by: Robert Hancock <hancock@sedsystems.ca>
      Reviewed-by: Michal Simek <michal.simek@xilinx.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
      Cc: stable@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
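
The failure mode in the xiic commit comes down to an 8-bit truncation: a 256-byte request stored in an 8-bit count register reads back as zero. The sketch below shows the truncation and a length check that rejects such requests up front. The names are hypothetical and this is not the driver code; in the kernel, I2C adapter drivers can declare such a limit through the `max_read_len` field of `struct i2c_adapter_quirks`, which this user-space sketch does not use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_READ_LEN 255  /* largest count an 8-bit register can hold */

/* What an 8-bit byte-count register ends up holding for a requested length. */
static uint8_t count_register_value(size_t requested_len)
{
	return (uint8_t)requested_len;  /* 256 wraps to 0 */
}

/* Reject reads the hardware cannot express instead of silently truncating. */
static int check_read_len(size_t requested_len)
{
	return requested_len <= SKETCH_MAX_READ_LEN ? 0 : -1;
}
```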
    • x86/insn-eval: Fix use-after-free access to LDT entry · b598ddc7
      Jann Horn committed
      commit de9f869616dd95e95c00bdd6b0fcd3421e8a4323 upstream.
      
      get_desc() computes a pointer into the LDT while holding a lock that
      protects the LDT from being freed, but then drops the lock and returns the
      (now potentially dangling) pointer to its caller.
      
      Fix it by giving the caller a copy of the LDT entry instead.
      
      Fixes: 670f928b ("x86/insn-eval: Add utility function to get segment descriptor")
      Cc: stable@vger.kernel.org
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
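
The pattern behind the insn-eval fix is to copy the entry out while the lock is held, rather than handing the caller a pointer that may dangle once the lock is dropped. A minimal sketch of that pattern follows. The table, the lock stand-ins, and all `sketch_*` names are hypothetical; a real implementation would use an actual lock in place of the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_desc {
	unsigned int base;
	unsigned int limit;
};

static struct sketch_desc *sketch_table;  /* may go away once the lock is dropped */
static size_t sketch_table_len;
static int sketch_lock_depth;             /* stand-in for a real lock */

static void sketch_lock(void)   { sketch_lock_depth++; }
static void sketch_unlock(void) { sketch_lock_depth--; }

/*
 * Copy entry idx into *out while the table is protected.
 * Returns false if there is no such entry. The caller never
 * sees a pointer into the table itself.
 */
static bool sketch_get_desc(struct sketch_desc *out, size_t idx)
{
	bool ok = false;

	sketch_lock();
	if (sketch_table && idx < sketch_table_len) {
		*out = sketch_table[idx];
		ok = true;
	}
	sketch_unlock();
	return ok;
}
```

Because the caller owns the copy, it stays usable even if the table is torn down immediately after the call returns.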
    • x86/power: Fix 'nosmt' vs hibernation triple fault during resume · 4d166206
      Jiri Kosina committed
      commit ec527c318036a65a083ef68d8ba95789d2212246 upstream.
      
      As explained in
      
      	0cc3cd21 ("cpu/hotplug: Boot HT siblings at least once")
      
      we always, no matter what, have to bring up x86 HT siblings during boot at
      least once in order to avoid first MCE bringing the system to its knees.
      
      That means that whenever 'nosmt' is supplied on the kernel command-line,
      all the HT siblings are as a result sitting in mwait or cpuidle after
      going through the online-offline cycle at least once.
      
      This causes a serious issue though when a kernel, which saw 'nosmt' on its
      commandline, is going to perform resume from hibernation: if the resume
      from the hibernated image is successful, cr3 is flipped in order to point
      to the address space of the kernel that is being resumed, which in turn
      means that all the HT siblings are all of a sudden mwaiting on address
      which is no longer valid.
      
      That results in triple fault shortly after cr3 is switched, and machine
      reboots.
      
      Fix this by always waking up all the SMT siblings before initiating the
      'restore from hibernation' process; this guarantees that all the HT
      siblings will be properly carried over to the resumed kernel waiting in
      resume_play_dead(), and acted upon accordingly afterwards, based on the
      target kernel configuration.
      
      Symmetrically, the resumed kernel has to push the SMT siblings to mwait
      again in case it has SMT disabled; this means it has to online all
      the siblings when resuming (so that they come out of hlt) and offline
      them again to let them reach mwait.
      
      Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
      Debugged-by: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0cc3cd21 ("cpu/hotplug: Boot HT siblings at least once")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pstore/ram: Run without kernel crash dump region · f4d0227f
      Kees Cook committed
      commit 8880fa32c557600f5f624084152668ed3c2ea51e upstream.
      
      The ram pstore backend has always had the crash dumper frontend enabled
      unconditionally. However, it was possible to effectively disable it
      by setting a record_size=0. All the machinery would run (storing dumps
      to the temporary crash buffer), but 0 bytes would ultimately get stored
      due to there being no przs allocated for dumps. Commit 89d328f637b9
      ("pstore/ram: Correctly calculate usable PRZ bytes"), however, assumed
      that there would always be at least one allocated dprz for calculating
      the size of the temporary crash buffer. This was, of course, not the
      case when record_size=0, and would lead to a NULL deref trying to find
      the dprz buffer size:
      
      BUG: unable to handle kernel NULL pointer dereference at (null)
      ...
      IP: ramoops_probe+0x285/0x37e (fs/pstore/ram.c:808)
      
              cxt->pstore.bufsize = cxt->dprzs[0]->buffer_size;
      
      Instead, we need to only enable the frontends based on the success of the
      prz initialization and only take the needed actions when those zones are
      available. (This also fixes a possible error in detecting if the ftrace
      frontend should be enabled.)
      Reported-and-tested-by: Yaro Slav <yaro330@gmail.com>
      Fixes: 89d328f637b9 ("pstore/ram: Correctly calculate usable PRZ bytes")
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
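
The NULL dereference in the pstore/ram commit comes from reading the first dump zone's size when no dump zones were allocated. One way to picture the described approach (enable a frontend only when its zones exist) is the guard below. This is a sketch under assumptions: the struct layout, field names, and `sketch_setup_dump()` are hypothetical and only loosely modelled on the line quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_zone {
	size_t buffer_size;
};

struct sketch_ctx {
	struct sketch_zone **dprzs;  /* dump zones, NULL when none were allocated */
	size_t max_dump_cnt;         /* number of entries in dprzs */
	size_t bufsize;              /* size of the temporary crash buffer */
	bool dump_enabled;           /* whether the dump frontend is active */
};

/* Enable the dump frontend only if at least one zone is available. */
static void sketch_setup_dump(struct sketch_ctx *cxt)
{
	if (cxt->dprzs && cxt->max_dump_cnt > 0) {
		cxt->bufsize = cxt->dprzs[0]->buffer_size;
		cxt->dump_enabled = true;
	} else {
		/* No zones: leave the frontend off rather than touch dprzs[0]. */
		cxt->bufsize = 0;
		cxt->dump_enabled = false;
	}
}
```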
    • pstore: Set tfm to NULL on free_buf_for_compression · aa73a3b2
      Pi-Hsun Shih committed
      commit a9fb94a99bb515d8720ba8440ce3aba84aec80f8 upstream.
      
      Set tfm to NULL on free_buf_for_compression() after crypto_free_comp().
      
      This avoids a use-after-free when allocate_buf_for_compression()
      and free_buf_for_compression() are called twice. Although
      free_buf_for_compression() freed the tfm, allocate_buf_for_compression()
      won't reinitialize the tfm since the tfm pointer is not NULL.
      
      Fixes: 95047b0519c1 ("pstore: Refactor compression initialization")
      Signed-off-by: Pi-Hsun Shih <pihsun@chromium.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
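
The tfm fix follows a common rule: when an allocate function skips work because a pointer is already non-NULL, the matching free function has to reset that pointer. A small user-space sketch of the allocate/free/allocate cycle is shown here. The `sketch_*` names are hypothetical, and malloc()/free() stand in for the crypto allocation calls.

```c
#include <assert.h>
#include <stdlib.h>

static void *sketch_tfm;   /* stand-in for the compression transform handle */
static int sketch_allocs;  /* counts how many times we really allocated */

static void sketch_allocate(void)
{
	if (sketch_tfm)          /* already set up, nothing to do */
		return;
	sketch_tfm = malloc(16);
	if (sketch_tfm)
		sketch_allocs++;
}

static void sketch_free(void)
{
	free(sketch_tfm);
	/* Without this reset, the next sketch_allocate() would keep the stale pointer. */
	sketch_tfm = NULL;
}
```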
    • pstore: Convert buf_lock to semaphore · d4128a1b
      Kees Cook committed
      commit ea84b580b95521644429cc6748b6c2bf27c8b0f3 upstream.
      
      Instead of running with interrupts disabled, use a semaphore. This should
      make it easier for backends that may need to sleep (e.g. EFI) when
      performing a write:
      
      |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99
      |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum
      |Preemption disabled at:
      |[<ffffffff99d60512>] pstore_dump+0x72/0x330
      |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G      D           4.20.0-rc3 #45
      |Call Trace:
      | dump_stack+0x4f/0x6a
      | ___might_sleep.cold.91+0xd3/0xe4
      | __might_sleep+0x50/0x90
      | wait_for_completion+0x32/0x130
      | virt_efi_query_variable_info+0x14e/0x160
      | efi_query_variable_store+0x51/0x1a0
      | efivar_entry_set_safe+0xa3/0x1b0
      | efi_pstore_write+0x109/0x140
      | pstore_dump+0x11c/0x330
      | kmsg_dump+0xa4/0xd0
      | oops_exit+0x22/0x30
      ...
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Fixes: 21b3ddd3 ("efi: Don't use spinlocks for efi vars")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pstore: Remove needless lock during console writes · c63ce716
      Kees Cook committed
      commit b77fa617a2ff4d6beccad3d3d4b3a1f2d10368aa upstream.
      
      Since the console writer does not use the preallocated crash dump buffer
      any more, there is no reason to perform locking around it.
      
      Fixes: 70ad35db ("pstore: Convert console write to use ->write_buf")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fuse: fallocate: fix return with locked inode · a3b8b4ad
      Miklos Szeredi committed
      commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream.
      
      Do the proper cleanup in case the size check fails.
      
      Tested with xfstests:generic/228
      Reported-by: kbuild test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate")
      Cc: Liu Bo <bo.liu@linux.alibaba.com>
      Cc: <stable@vger.kernel.org> # v3.5
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
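
"Do the proper cleanup in case the size check fails" refers to the usual single-exit pattern: once a lock is taken, every error path jumps to a label that releases it. The sketch below illustrates that pattern only. The lock stand-ins and `sketch_fallocate()` are hypothetical and do not reproduce the fuse code.

```c
#include <assert.h>
#include <stdbool.h>

static bool sketch_locked;  /* stand-in for the inode lock state */

static void sketch_lock(void)   { sketch_locked = true; }
static void sketch_unlock(void) { sketch_locked = false; }

/* Returns 0 on success, -1 if the size check fails; the lock is dropped on both paths. */
static int sketch_fallocate(long new_size, long limit)
{
	int err = 0;

	sketch_lock();
	if (new_size > limit) {
		err = -1;
		goto out;  /* a bare "return err;" here would leave the lock held */
	}
	/* ... the real work would happen here ... */
out:
	sketch_unlock();
	return err;
}
```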
    • NFSv4.1: Fix bug only first CB_NOTIFY_LOCK is handled · 56e3f73e
      Yihao Wu committed
      commit ba851a39c9703f09684a541885ed176f8fb7c868 upstream.
      
      When a waiter is woken by CB_NOTIFY_LOCK, it will retry
      nfs4_proc_setlk(). The waiter may fail in nfs4_proc_setlk() and sleep
      again. However, the waiter is already removed from clp->cl_lock_waitq
      when handling CB_NOTIFY_LOCK in nfs4_wake_lock_waiter(). So any
      subsequent CB_NOTIFY_LOCK won't wake this waiter anymore. We should
      put the waiter back to clp->cl_lock_waitq before retrying.
      
      Cc: stable@vger.kernel.org #4.9+
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • NFSv4.1: Again fix a race where CB_NOTIFY_LOCK fails to wake a waiter · ea0327b4
      Yihao Wu committed
      commit 52b042ab9948cc367b61f9ca9c18603aa7813c3a upstream.
      
      Commit b7dbcc0e "NFSv4.1: Fix a race where CB_NOTIFY_LOCK fails to wake a waiter"
      found this bug. However it didn't fix it.
      
      This commit replaces schedule_timeout() with wait_woken() and
      default_wake_function() with woken_wake_function() in function
      nfs4_retry_setlk() and nfs4_wake_lock_waiter(). wait_woken() uses
      memory barriers in its implementation to avoid potential race condition
      when putting a process into sleeping state and then waking it up.
      
      Fixes: a1d617d8 ("nfs: allow blocking locks to be awoken by lock callbacks")
      Cc: stable@vger.kernel.org #4.9+
      Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • parisc: Use implicit space register selection for loading the coherence index of I/O pdirs · 384c1d93
      John David Anglin committed
      commit 63923d2c3800919774f5c651d503d1dd2adaddd5 upstream.
      
      We only support I/O to kernel space. Using %sr1 to load the coherence
      index may be racy unless interrupts are disabled. This patch changes the
      code used to load the coherence index to use implicit space register
      selection. This saves one instruction and eliminates the race.
      
      Tested on rp3440, c8000 and c3750.
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • rcu: locking and unlocking need to always be at least barriers · 6726307d
      Linus Torvalds committed
      commit 66be4e66a7f422128748e3c3ef6ee72b20a6197b upstream.
      
      Herbert Xu pointed out that commit bb73c52b ("rcu: Don't disable
      preemption for Tiny and Tree RCU readers") was incorrect in making the
      preempt_disable/enable() be conditional on CONFIG_PREEMPT_COUNT.
      
      If CONFIG_PREEMPT_COUNT isn't enabled, the preemption enable/disable is
      a no-op, but still is a compiler barrier.
      
      And RCU locking still _needs_ that compiler barrier.
      
      It is simply fundamentally not true that RCU locking would be a complete
      no-op: we still need to guarantee (for example) that things that can
      trap and cause preemption cannot migrate into the RCU locked region.
      
      The way we do that is by making it a barrier.
      
      See for example commit 386afc91 ("spinlocks and preemption points
      need to be at least compiler barriers") from back in 2013 that had
      similar issues with spinlocks that become no-ops on UP: they must still
      constrain the compiler from moving other operations into the critical
      region.
      
      Now, it is true that a lot of RCU operations already use READ_ONCE() and
      WRITE_ONCE() (which in practice likely would never be re-ordered wrt
      anything remotely interesting), but it is also true that that is not
      globally the case, and that it's not even necessarily always possible
      (ie bitfields etc).
      Reported-by: Herbert Xu <herbert@gondor.apana.org.au>
      Fixes: bb73c52b ("rcu: Don't disable preemption for Tiny and Tree RCU readers")
      Cc: stable@kernel.org
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
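
The rcu commit argues that a lock/unlock pair which generates no instructions must still act as a compiler barrier. The sketch below shows what such a barrier looks like, assuming a GCC- or Clang-compatible compiler: an empty asm statement with a "memory" clobber. The `sketch_*` names are hypothetical, and this is not the RCU implementation.

```c
#include <assert.h>

/*
 * Compiler barrier: emits no instructions, but the compiler may not
 * move memory accesses across it.
 */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

static int sketch_shared;

/* "Lock" and "unlock" that cost nothing at run time but still constrain the compiler. */
static inline void sketch_read_lock(void)   { sketch_barrier(); }
static inline void sketch_read_unlock(void) { sketch_barrier(); }

static int sketch_reader(void)
{
	int v;

	sketch_read_lock();
	v = sketch_shared;  /* the load stays between the two barriers */
	sketch_read_unlock();
	return v;
}
```

If the two helpers were defined as completely empty functions instead, the compiler would be free to hoist or sink the load out of the intended critical region, which is the situation the commit message warns about.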
    • mtd: spinand: macronix: Fix ECC Status Read · 39e597d2
      Emil Lenngren committed
      commit f4cb4d7b46f6409382fd981eec9556e1f3c1dc5d upstream.
      
      The datasheet specifies the upper four bits are reserved.
      Testing on real hardware shows that these bits can indeed be nonzero.
      Signed-off-by: Emil Lenngren <emil.lenngren@gmail.com>
      Reviewed-by: Boris Brezillon <boris.brezillon@bootlin.com>
      Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Cc: Christian Lamparter <chunkeey@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
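
Since the upper four bits of the ECC status byte are reserved and can be nonzero on real hardware, the status has to be masked before it is interpreted. A minimal sketch of that masking step follows. The mask name and helper are hypothetical, and the driver's actual constants are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Low four bits carry the status; the upper four are reserved. */
#define SKETCH_ECC_STATUS_MASK 0x0f

/* Strip the reserved bits before comparing against known status codes. */
static uint8_t sketch_ecc_status(uint8_t raw)
{
	return raw & SKETCH_ECC_STATUS_MASK;
}
```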
    • ipv6: fix EFAULT on sendto with icmpv6 and hdrincl · 2488b9f9
      Olivier Matz committed
      [ Upstream commit b9aa52c4cb457e7416cc0c95f475e72ef4a61336 ]
      
      The following code returns EFAULT (Bad address):
      
        s = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
        setsockopt(s, SOL_IPV6, IPV6_HDRINCL, 1);
        sendto(ipv6_icmp6_packet, addr);   /* returns -1, errno = EFAULT */
      
      The IPv4 equivalent code works. A workaround is to use IPPROTO_RAW
      instead of IPPROTO_ICMPV6.
      
      The failure happens because 2 bytes are eaten from the msghdr by
      rawv6_probe_proto_opt() starting from commit 19e3c66b ("ipv6
      equivalent of "ipv4: Avoid reading user iov twice after
      raw_probe_proto_opt""), but at that time it was not a problem because
      IPV6_HDRINCL was not yet introduced.
      
      Only eat these 2 bytes if hdrincl == 0.
      
      Fixes: 715f504b ("ipv6: add IPV6_HDRINCL option for raw sockets")
      Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipv6: use READ_ONCE() for inet->hdrincl as in ipv4 · 0b16d956
      Olivier Matz committed
      [ Upstream commit 59e3e4b52663a9d97efbce7307f62e4bc5c9ce91 ]
      
      As it was done in commit 8f659a03 ("net: ipv4: fix for a race
      condition in raw_sendmsg") and commit 20b50d79 ("net: ipv4: emulate
      READ_ONCE() on ->hdrincl bit-field in raw_sendmsg()") for ipv4, copy the
      value of inet->hdrincl in a local variable, to avoid introducing a race
      condition in the next commit.
      Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied" · d769853d
      Hangbin Liu committed
      [ Upstream commit 4970b42d5c362bf873982db7d93245c5281e58f4 ]
      
      This reverts commit e9919a24d3022f72bcadc407e73a6ef17093a849.
      
      Nathan reported that the new behaviour breaks Android, as Android just
      adds new rules and deletes old ones.
      
      If we return 0 without adding duplicate rules, Android will remove the
      newly added rules, causing the system to soft-reboot.
      
      Fixes: e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied")
      Reported-by: Nathan Chancellor <natechancellor@gmail.com>
      Reported-by: Yaro Slav <yaro330@gmail.com>
      Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
      Tested-by: Nathan Chancellor <natechancellor@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pktgen: do not sleep with the thread lock held. · 396244b6
      Paolo Abeni committed
      [ Upstream commit 720f1de4021f09898b8c8443f3b3e995991b6e3a ]
      
      Currently, the process issuing a "start" command on the pktgen procfs
      interface acquires the pktgen thread lock and never releases it until
      all pktgen threads are completed. This can indefinitely block any
      other pktgen command and any (even unrelated) netdevice removal, as
      the pktgen netdev notifier acquires the same lock.
      
      The issue is demonstrated by the following script, reported by Matteo:
      
      ip -b - <<'EOF'
      	link add type dummy
      	link add type veth
      	link set dummy0 up
      EOF
      modprobe pktgen
      echo reset >/proc/net/pktgen/pgctrl
      {
      	echo rem_device_all
      	echo add_device dummy0
      } >/proc/net/pktgen/kpktgend_0
      echo count 0 >/proc/net/pktgen/dummy0
      echo start >/proc/net/pktgen/pgctrl &
      sleep 1
      rmmod veth
      
      Fix the above by releasing the thread lock around the sleep call.
      
      Additionally we must prevent racing with forceful rmmod, as the
      thread lock no longer protects against it. Instead, acquire a self-reference
      before waiting for any thread. As a side effect, running
      
      rmmod pktgen
      
      while some thread is running now fails with a "module in use" error;
      before this patch such a command hung indefinitely.
      
      Note: the issue predates the commit reported in the fixes tag, but
      this fix can't be applied before the mentioned commit.
      
      v1 -> v2:
       - no need to check for thread existence after flipping the lock,
         pktgen threads are freed only at net exit time
       -
      
      Fixes: 6146e6a4 ("[PKTGEN]: Removes thread_{un,}lock() macros.")
      Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • packet: unconditionally free po->rollover · da096fe1
      Willem de Bruijn committed
      [ Upstream commit afa0925c6fcc6a8f610e996ca09bc3215048033c ]
      
      Rollover used to use a complex RCU mechanism for assignment, which had
      a race condition. The below patch fixed the bug and greatly simplified
      the logic.
      
      The feature depends on fanout, but the state is private to the socket.
      Fanout_release returns f only when the last member leaves and the
      fanout struct is to be freed.
      
      Destroy rollover unconditionally, regardless of fanout state.
      
      Fixes: 57f015f5 ("packet: fix crash in fanout_demux_rollover()")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Diagnosed-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net/tls: replace the sleeping lock around RX resync with a bit lock · be0343af
      Jakub Kicinski committed
      [ Upstream commit e52972c11d6b1262964db96d65934196db621685 ]
      
      Commit 38030d7cb779 ("net/tls: avoid NULL-deref on resync during device removal")
      tried to fix a potential NULL-dereference by taking the
      context rwsem.  Unfortunately the RX resync may get called
      from soft IRQ, so we can't use the rwsem to protect from
      the device disappearing.  Because we are guaranteed there
      can be only one resync at a time (it's called from strparser)
      use a bit to indicate resync is busy and make device
      removal wait for the bit to get cleared.
      
      Note that there is a leftover "flags" field in struct
      tls_context already.
      
      Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
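
The idea of a "bit lock" in the net/tls commit is that a single flag bit marks a resync as in progress, so code that cannot sleep never has to take a sleeping lock, while the removal path waits for the bit to clear. The sketch below expresses that with C11 atomics in user space. It is an illustration only: the flag name, the `sketch_*` helpers, and the use of `<stdatomic.h>` rather than the kernel's bit operations are all assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_RESYNC_BUSY 0x1u  /* hypothetical flag bit */

static atomic_uint sketch_flags;

/* Try to mark a resync as in progress; returns false if one already is. */
static bool sketch_resync_begin(void)
{
	unsigned int old = atomic_fetch_or(&sketch_flags, SKETCH_RESYNC_BUSY);

	return !(old & SKETCH_RESYNC_BUSY);
}

/* Clear the busy bit once the resync is finished. */
static void sketch_resync_end(void)
{
	atomic_fetch_and(&sketch_flags, ~SKETCH_RESYNC_BUSY);
}

/* The removal path would poll this until it returns false. */
static bool sketch_resync_busy(void)
{
	return (atomic_load(&sketch_flags) & SKETCH_RESYNC_BUSY) != 0;
}
```

Setting and testing a bit never blocks, which is what makes this usable from a context where sleeping is not allowed.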
    • net: sfp: read eeprom in maximum 16 byte increments · 9740f4ff
      Russell King committed
      [ Upstream commit 28e74a7cfd6403f0d1c0f8b10b45d6fae37b227e ]
      
      Some SFP modules do not like reads longer than 16 bytes, so read the
      EEPROM in chunks of 16 bytes at a time.  This behaviour is not specified
      in the SFP MSAs, which specify:
      
       "The serial interface uses the 2-wire serial CMOS E2PROM protocol
        defined for the ATMEL AT24C01A/02/04 family of components."
      
      and
      
       "As long as the SFP+ receives an acknowledge, it shall serially clock
        out sequential data words. The sequence is terminated when the host
        responds with a NACK and a STOP instead of an acknowledge."
      
      We must avoid breaking a read across a 16-bit quantity in the diagnostic
      page, thankfully all 16-bit quantities in that page are naturally
      aligned.
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
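
Reading the EEPROM "in chunks of 16 bytes at a time" amounts to a loop that caps each bus transaction at the block size. The sketch below shows such a loop against a fake in-memory device. The buffer, the transfer function, and every `sketch_*` name are hypothetical stand-ins, not the sfp driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLOCK_SIZE 16  /* maximum bytes per bus transaction */

static unsigned char sketch_eeprom[256];  /* fake device contents */
static size_t sketch_max_seen;            /* largest single transfer issued */

/* Stand-in for one bus transaction. */
static void sketch_bus_read(size_t addr, void *buf, size_t len)
{
	if (len > sketch_max_seen)
		sketch_max_seen = len;
	memcpy(buf, &sketch_eeprom[addr], len);
}

/* Read len bytes starting at addr, never asking for more than one block at a time. */
static void sketch_read_chunked(size_t addr, void *buf, size_t len)
{
	unsigned char *p = buf;

	assert(addr + len <= sizeof(sketch_eeprom));
	while (len) {
		size_t this_len = len < SKETCH_BLOCK_SIZE ? len : SKETCH_BLOCK_SIZE;

		sketch_bus_read(addr, p, this_len);
		addr += this_len;
		p += this_len;
		len -= this_len;
	}
}
```

When a read starts at an even address, every chunk boundary in this loop also falls on an even address, so a naturally aligned 16-bit quantity is never split across two transactions.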
    • net: rds: fix memory leak in rds_ib_flush_mr_pool · 7700d5af
      Zhu Yanjun committed
      [ Upstream commit 85cb928787eab6a2f4ca9d2a798b6f3bed53ced1 ]
      
      When the following tests run for several hours, the problem occurs.
      
      Server:
          rds-stress -r 1.1.1.16 -D 1M
      Client:
          rds-stress -r 1.1.1.14 -s 1.1.1.16 -D 1M -T 30
      
      The following will occur.
      
      "
      Starting up....
      tsks   tx/s   rx/s  tx+rx K/s    mbi K/s    mbo K/s tx us/c   rtt us cpu %
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
        1      0      0       0.00       0.00       0.00    0.00 0.00 -1.00
      "
      From vmcore, we can find that clean_list is NULL.
      
      From the source code, rds_mr_flushd calls rds_ib_mr_pool_flush_worker.
      Then rds_ib_mr_pool_flush_worker calls
      "
       rds_ib_flush_mr_pool(pool, 0, NULL);
      "
      Then in function
      "
      int rds_ib_flush_mr_pool(struct rds_ib_mr_pool *pool,
                               int free_all, struct rds_ib_mr **ibmr_ret)
      "
      ibmr_ret is NULL.
      
      In the source code,
      "
      ...
      list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
      if (ibmr_ret)
              *ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
      
      /* more than one entry in llist nodes */
      if (clean_nodes->next)
              llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
      ...
      "
      When ibmr_ret is NULL, llist_entry is not executed, and clean_nodes->next
      rather than clean_nodes is added to clean_list, so the first node
      (clean_nodes) is discarded and cannot be used again.
      The workqueue runs periodically, so more and more nodes are discarded
      until clean_list finally becomes NULL.
      That is when this problem occurs.
      
      Fixes: 1bc144b6 ("net, rds, Replace xlist in net/rds/xlist.h with llist")
      Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      7700d5af
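The leak described in this commit can be modelled with an ordinary singly linked list. The sketch below is a userspace simplification, assuming hypothetical names (`struct node`, `add_batch`, `recycle_buggy`, `recycle_fixed`); it does not use the kernel llist API and is not the upstream diff. It only shows why the head node is lost when the caller passes a NULL return pointer, and one way to avoid that.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    struct node *next;
};

/* Push the chain first..tail onto the front of *head. */
static void add_batch(struct node *first, struct node *tail,
                      struct node **head)
{
    tail->next = *head;
    *head = first;
}

/* Count the nodes reachable from n. */
static size_t chain_len(const struct node *n)
{
    size_t len = 0;

    for (; n; n = n->next)
        len++;
    return len;
}

/* Buggy pattern: the first node is always skipped, even when the caller
 * passed ret == NULL and therefore never takes ownership of it. */
static void recycle_buggy(struct node *nodes, struct node *tail,
                          struct node **clean, struct node **ret)
{
    assert(nodes != NULL && tail != NULL);
    if (ret)
        *ret = nodes;
    if (nodes->next)
        add_batch(nodes->next, tail, clean);
}

/* Fixed pattern: skip the first node only when the caller takes it. */
static void recycle_fixed(struct node *nodes, struct node *tail,
                          struct node **clean, struct node **ret)
{
    assert(nodes != NULL && tail != NULL);
    if (ret) {
        *ret = nodes;
        nodes = nodes->next;
    }
    if (nodes)
        add_batch(nodes, tail, clean);
}
```

With a three-node chain and a NULL return pointer, the buggy variant puts two nodes back on the clean list while the fixed variant puts back all three.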
    • M
      net: mvpp2: Use strscpy to handle stat strings · c6a020e0
      Maxime Chevallier committed
      [ Upstream commit d37acd5aa99c57505b64913e0e2624ec3daed8c5 ]
      
      Use a safe strscpy call to copy the ethtool stat strings into the
      relevant buffers, instead of a memcpy that will be accessing
      out-of-bound data.
      
      Fixes: 118d6298 ("net: mvpp2: add ethtool GOP statistics")
      Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c6a020e0
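To illustrate why a fixed-size memcpy from a short string literal reads out of bounds while a bounded string copy does not, here is a userspace sketch. `bounded_copy` is a hypothetical stand-in that imitates the general behaviour of the kernel's strscpy (copy at most size - 1 bytes, always NUL-terminate, report truncation); `fill_stat_names` and the stat names used with it are illustrative assumptions, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GSTRING_LEN 32  /* same value as the kernel's ETH_GSTRING_LEN */

/* Hypothetical userspace stand-in for strscpy: copies at most size - 1
 * bytes, always NUL-terminates, and returns the copied length or -1 if
 * the source had to be truncated. It never reads past the source's NUL. */
static long bounded_copy(char *dst, const char *src, size_t size)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (size == 0)
        return -1;
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';
    return src[i] == '\0' ? (long)i : -1;
}

/* Fill n fixed-width name slots in data. A memcpy(slot, names[i],
 * GSTRING_LEN) here would read past the end of any name shorter than
 * GSTRING_LEN bytes; the bounded copy stops at the terminator. */
static void fill_stat_names(char *data, const char *const *names, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        bounded_copy(data + i * GSTRING_LEN, names[i], GSTRING_LEN);
}
```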
    • E
      net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query · d305d61f
      Erez Alfasi committed
      [ Upstream commit 135dd9594f127c8a82d141c3c8430e9e2143216a ]
      
      Querying EEPROM high pages data for SFP module is currently
      not supported by our driver but is still tried, resulting in
      invalid FW queries.
      
      Set the EEPROM ethtool data length to 256 for SFP modules to
      limit reads to page 0 only and prevent invalid FW queries.
      
      Fixes: 7202da8b ("ethtool, net/mlx4_en: Cable info, get_module_info/eeprom ethtool support")
      Signed-off-by: Erez Alfasi <ereza@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d305d61f
    • I
      net: ethernet: ti: cpsw_ethtool: fix ethtool ring param set · 831d6d07
      Ivan Khoronzhuk committed
      [ Upstream commit 09faf5a7d7c0bcb07faba072f611937af9dd5788 ]
      
      Fix the ability to set the RX descriptor number. "tx_max_pending" was
      set incorrectly from the start, but the issue only appeared after the
      sanity check was added, so this fix targets the "sanity" patch.
      
      Fixes: 37e2d99b ("ethtool: Ensure new ring parameters are within bounds during SRINGPARAM")
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      831d6d07
    • D
      neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit · 893e2a5f
      David Ahern committed
      [ Upstream commit 4b2a2bfeb3f056461a90bd621e8bd7d03fa47f60 ]
      
      Commit cd9ff4de changed the key for IFF_POINTOPOINT devices to
      INADDR_ANY but neigh_xmit which is used for MPLS encapsulations was not
      updated to use the altered key. The result is that every packet Tx does
      a lookup on the gateway address which does not find an entry, a new one
      is created only to find the existing one in the table right before the
      insert since arp_constructor was updated to reset the primary key. This
      is seen in the allocs and destroys counters:
          ip -s -4 ntable show | head -10 | grep alloc
      
      which increase for each packet showing the unnecessary overhead.
      
      Fix by having neigh_xmit use __ipv4_neigh_lookup_noref for NEIGH_ARP_TABLE.
      
      Fixes: cd9ff4de ("ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY")
      Reported-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      893e2a5f
    • X
      ipv6: fix the check before getting the cookie in rt6_get_cookie · 9fd19a3b
      Xin Long committed
      [ Upstream commit b7999b07726c16974ba9ca3bb9fe98ecbec5f81c ]
      
      In Jianlin's testing, netperf was broken with 'Connection reset by peer',
      as the cookie check failed in rt6_check() and ip6_dst_check() always
      returned NULL.
      
      It's caused by commit 93531c67 ("net/ipv6: separate handling of FIB
      entries from dst based routes"), where the cookie can be obtained for
      setting dst_cookie only when 'c1' (see below) holds, whereas rt6_check()
      is called to check dst_cookie when !'c1', as can be seen in ip6_dst_check().
      
      Since in ip6_dst_check() both rt6_dst_from_check() (c1) and rt6_check()
      (!c1) will check the 'from' cookie, this patch is to remove the c1 check
      in rt6_get_cookie(), so that the dst_cookie can always be set properly.
      
      c1:
        (rt->rt6i_flags & RTF_PCPU || unlikely(!list_empty(&rt->rt6i_uncached)))
      
      Fixes: 93531c67 ("net/ipv6: separate handling of FIB entries from dst based routes")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      9fd19a3b
    • X
      ipv4: not do cache for local delivery if bc_forwarding is enabled · daa11cc8
      Xin Long committed
      [ Upstream commit 0a90478b93a46bdcd56ba33c37566a993e455d54 ]
      
      With the topo:
      
          h1 ---| rp1            |
                |     route  rp3 |--- h3 (192.168.200.1)
          h2 ---| rp2            |
      
      If rp1 bc_forwarding is set while rp2 bc_forwarding is not, then after
      running "ping 192.168.200.255" on h1 and then "ping 192.168.200.255" on
      h2, the packets can still be forwarded.
      
      This issue was caused by the input route cache. It should only do
      the cache for either bc forwarding or local delivery. Otherwise,
      local delivery can use the route cache for bc forwarding of other
      interfaces.
      
      This patch is to fix it by not doing cache for local delivery if
      all.bc_forwarding is enabled.
      
      Note that we don't fix it by checking route cache local flag after
      rt_cache_valid() in "local_input:" and "ip_mkroute_input", as the
      common route code shouldn't be touched for bc_forwarding.
      
      Fixes: 5cbf777c ("route: add support for directed broadcast forwarding")
      Reported-by: Jianlin Shi <jishi@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      daa11cc8
    • N
      Fix memory leak in sctp_process_init · 05b933f2
      Neil Horman committed
      [ Upstream commit 0a8dd9f67cd0da7dc284f48b032ce00db1a68791 ]
      
      syzbot found the following leak in sctp_process_init
      BUG: memory leak
      unreferenced object 0xffff88810ef68400 (size 1024):
        comm "syz-executor273", pid 7046, jiffies 4294945598 (age 28.770s)
        hex dump (first 32 bytes):
          1d de 28 8d de 0b 1b e3 b5 c2 f9 68 fd 1a 97 25  ..(........h...%
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000a02cebbd>] kmemleak_alloc_recursive include/linux/kmemleak.h:55 [inline]
          [<00000000a02cebbd>] slab_post_alloc_hook mm/slab.h:439 [inline]
          [<00000000a02cebbd>] slab_alloc mm/slab.c:3326 [inline]
          [<00000000a02cebbd>] __do_kmalloc mm/slab.c:3658 [inline]
          [<00000000a02cebbd>] __kmalloc_track_caller+0x15d/0x2c0 mm/slab.c:3675
          [<000000009e6245e6>] kmemdup+0x27/0x60 mm/util.c:119
          [<00000000dfdc5d2d>] kmemdup include/linux/string.h:432 [inline]
          [<00000000dfdc5d2d>] sctp_process_init+0xa7e/0xc20 net/sctp/sm_make_chunk.c:2437
          [<00000000b58b62f8>] sctp_cmd_process_init net/sctp/sm_sideeffect.c:682 [inline]
          [<00000000b58b62f8>] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1384 [inline]
          [<00000000b58b62f8>] sctp_side_effects net/sctp/sm_sideeffect.c:1194 [inline]
          [<00000000b58b62f8>] sctp_do_sm+0xbdc/0x1d60 net/sctp/sm_sideeffect.c:1165
          [<0000000044e11f96>] sctp_assoc_bh_rcv+0x13c/0x200 net/sctp/associola.c:1074
          [<00000000ec43804d>] sctp_inq_push+0x7f/0xb0 net/sctp/inqueue.c:95
          [<00000000726aa954>] sctp_backlog_rcv+0x5e/0x2a0 net/sctp/input.c:354
          [<00000000d9e249a8>] sk_backlog_rcv include/net/sock.h:950 [inline]
          [<00000000d9e249a8>] __release_sock+0xab/0x110 net/core/sock.c:2418
          [<00000000acae44fa>] release_sock+0x37/0xd0 net/core/sock.c:2934
          [<00000000963cc9ae>] sctp_sendmsg+0x2c0/0x990 net/sctp/socket.c:2122
          [<00000000a7fc7565>] inet_sendmsg+0x64/0x120 net/ipv4/af_inet.c:802
          [<00000000b732cbd3>] sock_sendmsg_nosec net/socket.c:652 [inline]
          [<00000000b732cbd3>] sock_sendmsg+0x54/0x70 net/socket.c:671
          [<00000000274c57ab>] ___sys_sendmsg+0x393/0x3c0 net/socket.c:2292
          [<000000008252aedb>] __sys_sendmsg+0x80/0xf0 net/socket.c:2330
          [<00000000f7bf23d1>] __do_sys_sendmsg net/socket.c:2339 [inline]
          [<00000000f7bf23d1>] __se_sys_sendmsg net/socket.c:2337 [inline]
          [<00000000f7bf23d1>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2337
          [<00000000a8b4131f>] do_syscall_64+0x76/0x1a0 arch/x86/entry/common.c:3
      
      The problem was that the peer.cookie value points to an skb allocated
      area on the first pass through this function, at which point it is
      overwritten with a heap allocated value, but in certain cases, where a
      COOKIE_ECHO chunk is included in the packet, a second pass through
      sctp_process_init is made, where the cookie value is re-allocated,
      leaking the first allocation.
      
      Fix is to always allocate the cookie value, and free it when we are done
      using it.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: syzbot+f7e9153b037eac9b1df8@syzkaller.appspotmail.com
      CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: netdev@vger.kernel.org
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      05b933f2
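The double allocation described in this commit can be modelled in a few lines of userspace C. This is a simplified model under stated assumptions: `struct peer`, `dup_bytes`, `release`, `process_init_leaky` and `process_init_fixed` are hypothetical names, and the "fixed" variant shows one generic way to avoid the leak (release the previous copy before storing the new one), which is not necessarily how the upstream patch is structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Number of copies currently allocated and not yet released. */
static int live_allocs;

/* Duplicate len bytes onto the heap and count the allocation. */
static void *dup_bytes(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p) {
        memcpy(p, src, len);
        live_allocs++;
    }
    return p;
}

/* Free a copy made by dup_bytes (NULL is ignored). */
static void release(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

struct peer {
    void *cookie;
};

/* Leaky pattern: a second pass overwrites the pointer to the first copy. */
static int process_init_leaky(struct peer *peer, const void *cookie,
                              size_t len)
{
    assert(peer != NULL);
    peer->cookie = dup_bytes(cookie, len);
    return peer->cookie ? 0 : -1;
}

/* One way to avoid the leak: drop the old copy before storing the new one. */
static int process_init_fixed(struct peer *peer, const void *cookie,
                              size_t len)
{
    void *copy;

    assert(peer != NULL);
    copy = dup_bytes(cookie, len);
    if (!copy)
        return -1;
    release(peer->cookie);
    peer->cookie = copy;
    return 0;
}
```

After two passes followed by a single release of the stored pointer, the leaky variant leaves one allocation outstanding, while the fixed variant leaves none.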
    • V
      ethtool: fix potential userspace buffer overflow · d6782b8c
      Vivien Didelot committed
      [ Upstream commit 0ee4e76937d69128a6a66861ba393ebdc2ffc8a2 ]
      
      ethtool_get_regs() allocates a buffer of size ops->get_regs_len(),
      and passes it to the kernel driver via ops->get_regs() for filling.
      
      There is no restriction about what the kernel drivers can or cannot do
      with the open ethtool_regs structure. They usually set regs->version
      and ignore regs->len or set it to the same size as ops->get_regs_len().
      
      But if userspace allocates a smaller buffer for the registers dump,
      we would cause a userspace buffer overflow in the final copy_to_user()
      call, which uses the regs.len value potentially reset by the driver.
      
      To fix this, make this case obvious and store regs.len before calling
      ops->get_regs(), to only copy as much data as requested by userspace,
      up to the value returned by ops->get_regs_len().
      
      While at it, remove the redundant check for non-null regbuf.
      Signed-off-by: Vivien Didelot <vivien.didelot@gmail.com>
      Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6782b8c
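The idea of saving the requested length before the driver callback runs can be sketched in userspace C. This is a model only: `struct regs_hdr`, `copy_regs`, `get_regs_fn` and `fake_driver` are hypothetical names, and memcpy stands in for the kernel's copy to userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified header; the driver callback is free to modify it. */
struct regs_hdr {
    uint32_t version;
    uint32_t len;   /* on entry: size of the caller's buffer */
};

/* Hypothetical driver callback that fills buf and may rewrite hdr. */
typedef void (*get_regs_fn)(struct regs_hdr *hdr, void *buf);

/* Copy at most min(requested, driver_len) bytes into user_buf.
 * Returns the number of bytes copied. */
static size_t copy_regs(struct regs_hdr *hdr, uint8_t *user_buf,
                        size_t driver_len, get_regs_fn fill)
{
    uint8_t *regbuf;
    size_t copy_len;

    assert(hdr != NULL && user_buf != NULL && fill != NULL);

    /* Save the caller's length BEFORE the driver can overwrite hdr->len. */
    copy_len = hdr->len;
    if (copy_len > driver_len)
        copy_len = driver_len;

    regbuf = calloc(1, driver_len ? driver_len : 1);
    if (!regbuf)
        return 0;

    fill(hdr, regbuf);                  /* may reset hdr->len */
    memcpy(user_buf, regbuf, copy_len); /* bounded by the saved length */
    free(regbuf);
    return copy_len;
}

/* Example driver (illustration only) that dumps 64 bytes and resets
 * hdr->len to its own dump size. */
static void fake_driver(struct regs_hdr *hdr, void *buf)
{
    hdr->version = 1;
    hdr->len = 64;
    memset(buf, 0xAB, 64);
}
```

Using `hdr->len` after the callback instead of the saved value would copy 64 bytes into a 16-byte caller buffer in the scenario the commit describes.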
  2. 09 Jun, 2019 5 commits