1. 28 Feb 2019 (8 commits)
    • locking/lockdep: Shrink struct lock_class_key · 28d49e28
      Authored by Peter Zijlstra
      Shrink struct lock_class_key; we never store anything in subkeys[], we
      only use the addresses.
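      A minimal sketch of the resulting layout (illustrative; since the
      subkeys[] storage is only ever used for its addresses, it can safely
      overlap other members):

        struct lock_class_key {
                union {
                        struct hlist_node           hash_entry;
                        struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES];
                };
        };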
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kernel/workqueue: Use dynamic lockdep keys for workqueues · 669de8bd
      Authored by Bart Van Assche
      The following commit:
      
        87915adc ("workqueue: re-add lockdep dependencies for flushing")
      
      improved deadlock checking in the workqueue implementation. Unfortunately
      that patch also introduced a few false positive lockdep complaints.
      
      This patch suppresses these false positives by allocating the workqueue mutex
      lockdep key dynamically.
      
      An example of a false positive lockdep complaint suppressed by this
      patch can be found below. The root cause of the lockdep complaint shown
      below is that the direct I/O code can call alloc_workqueue() from inside
      a work item created by another alloc_workqueue() call, and that both
      workqueues share the same lockdep key. This patch prevents that lockdep
      complaint from being triggered by allocating the workqueue lockdep keys
      dynamically.
      
      In other words, this patch guarantees that a unique lockdep key is
      associated with each work queue mutex.
      
        ======================================================
        WARNING: possible circular locking dependency detected
        4.19.0-dbg+ #1 Not tainted
        fio/4129 is trying to acquire lock:
        00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970
      
        but task is already holding lock:
        00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
               down_write+0x3d/0x80
               __generic_file_fsync+0x77/0xf0
               ext4_sync_file+0x3c9/0x780
               vfs_fsync_range+0x66/0x100
               dio_complete+0x2f5/0x360
               dio_aio_complete_work+0x1c/0x20
               process_one_work+0x481/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
               process_one_work+0x447/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
               lock_acquire+0xc5/0x200
               flush_workqueue+0xf3/0x970
               drain_workqueue+0xec/0x220
               destroy_workqueue+0x23/0x350
               sb_init_dio_done_wq+0x6a/0x80
               do_blockdev_direct_IO+0x1f33/0x4be0
               __blockdev_direct_IO+0x79/0x86
               ext4_direct_IO+0x5df/0xbb0
               generic_file_direct_write+0x119/0x220
               __generic_file_write_iter+0x131/0x2d0
               ext4_file_write_iter+0x3fa/0x710
               aio_write+0x235/0x330
               io_submit_one+0x510/0xeb0
               __x64_sys_io_submit+0x122/0x340
               do_syscall_64+0x71/0x220
               entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        other info that might help us debug this:
      
        Chain exists of:
          (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&sb->s_type->i_mutex_key#14);
                                       lock((work_completion)(&dio->complete_work));
                                       lock(&sb->s_type->i_mutex_key#14);
          lock((wq_completion)"dio/%s"sb->s_id);
      
         *** DEADLOCK ***
      
        1 lock held by fio/4129:
         #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        stack backtrace:
        CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x86/0xc5
         print_circular_bug.isra.32+0x20a/0x218
         __lock_acquire+0x1c68/0x1cf0
         lock_acquire+0xc5/0x200
         flush_workqueue+0xf3/0x970
         drain_workqueue+0xec/0x220
         destroy_workqueue+0x23/0x350
         sb_init_dio_done_wq+0x6a/0x80
         do_blockdev_direct_IO+0x1f33/0x4be0
         __blockdev_direct_IO+0x79/0x86
         ext4_direct_IO+0x5df/0xbb0
         generic_file_direct_write+0x119/0x220
         __generic_file_write_iter+0x131/0x2d0
         ext4_file_write_iter+0x3fa/0x710
         aio_write+0x235/0x330
         io_submit_one+0x510/0xeb0
         __x64_sys_io_submit+0x122/0x340
         do_syscall_64+0x71/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
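      A rough sketch of the approach, using the lockdep_register_key() API
      added earlier in this series and assuming a lock_class_key member in
      struct workqueue_struct (illustrative, not the exact upstream code):

        /* Called from alloc_workqueue(): give each workqueue its own key. */
        static void wq_init_lockdep(struct workqueue_struct *wq)
        {
                lockdep_register_key(&wq->key);
                lockdep_init_map(&wq->lockdep_map, wq->name, &wq->key, 0);
        }

        /* Called once the workqueue is destroyed. */
        static void wq_unregister_lockdep(struct workqueue_struct *wq)
        {
                lockdep_unregister_key(&wq->key);
        }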
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
      [ Reworked the changelog a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Add support for dynamic keys · 108c1485
      Authored by Bart Van Assche
      A shortcoming of the current lockdep implementation is that it requires
      lock keys to be allocated statically. That forces all instances of lock
      objects that occur in a given data structure to share a lock key. Since
      lock dependency analysis groups lock objects per key, sharing lock keys
      can cause false positive lockdep reports. Make it possible to avoid
      such false positive reports by allowing lock keys to be allocated
      dynamically. Require that dynamically allocated lock keys are
      registered before use by calling lockdep_register_key(). Complain about
      attempts to register the same lock key pointer twice without calling
      lockdep_unregister_key() between successive registration calls.
      
      The purpose of the new lock_keys_hash[] data structure that keeps
      track of all dynamic keys is twofold:
      
        - Verify whether the lockdep_register_key() and lockdep_unregister_key()
          functions are used correctly.
      
        - Prevent lockdep_init_map() from complaining when it encounters a
          dynamically allocated key.
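      A minimal usage sketch of the new API (the struct name and init/destroy
      helpers are illustrative):

        struct foo {
                struct lock_class_key key;
                struct mutex          lock;
        };

        static void foo_init(struct foo *f)
        {
                lockdep_register_key(&f->key);   /* register before first use */
                mutex_init(&f->lock);
                lockdep_set_class(&f->lock, &f->key);
        }

        static void foo_destroy(struct foo *f)
        {
                mutex_destroy(&f->lock);
                lockdep_unregister_key(&f->key); /* required before freeing f */
        }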
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
      Link: https://lkml.kernel.org/r/20190214230058.196511-19-bvanassche@acm.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Free lock classes that are no longer in use · a0b0fd53
      Authored by Bart Van Assche
      Instead of leaving lock classes that are no longer in use in the
      lock_classes array, reuse entries from that array that are no longer in
      use. Maintain a linked list of free lock classes with list head
      'free_lock_classes'. Only add freed lock classes to that list after a
      grace period, to avoid a lock_classes[] element being reused while an
      RCU reader is still accessing it. Since the lockdep
      selftests run in a context where sleeping is not allowed and since the
      selftests require that lock resetting/zapping works with debug_locks
      off, make the behavior of lockdep_free_key_range() and
      lockdep_reset_lock() depend on whether or not these are called from
      the context of the lockdep selftests.
      
      Thanks to Peter for having shown how to modify get_pending_free()
      such that that function does not have to sleep.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
      Link: https://lkml.kernel.org/r/20190214230058.196511-12-bvanassche@acm.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Make it easy to detect whether or not inside a selftest · cdc84d79
      Authored by Bart Van Assche
      The patch that frees unused lock classes will modify the behavior of
      lockdep_free_key_range() and lockdep_reset_lock() depending on whether
      or not these functions are called from the context of the lockdep
      selftests. Hence make it easy to detect whether or not lockdep code
      is called from the context of a lockdep selftest.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
      Link: https://lkml.kernel.org/r/20190214230058.196511-10-bvanassche@acm.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Make zap_class() remove all matching lock order entries · 86cffb80
      Authored by Bart Van Assche
      Make sure that all lock order entries that refer to a class are removed
      from the list_entries[] array when a kernel module is unloaded.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
      Link: https://lkml.kernel.org/r/20190214230058.196511-7-bvanassche@acm.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Reorder struct lock_class members · 09329d1c
      Authored by Bart Van Assche
      This patch does not change any functionality but makes the patch that
      frees lock classes that are no longer in use easier to read.
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
      Link: https://lkml.kernel.org/r/20190214230058.196511-6-bvanassche@acm.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/percpu-rwsem: Remove preempt_disable variants · 02e525b2
      Authored by Peter Zijlstra
      Effectively revert commit:
      
        87709e28 ("fs/locks: Use percpu_down_read_preempt_disable()")
      
      This is causing major pain for PREEMPT_RT.
      
      Sebastian did a lot of lockperf runs on 2 and 4 node machines with all
      preemption modes (PREEMPT=n should be an obvious NOP for this patch
      and thus serves as a good control) and no results showed significance
      over 2-sigma (the PREEMPT=n results were almost empty at 1-sigma).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 26 Feb 2019 (1 commit)
    • Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" · 53a41cb7
      Authored by Linus Torvalds
      This reverts commit 9da3f2b7.
      
      It was well-intentioned, but wrong.  Overriding the exception tables for
      instructions for random reasons is just wrong, and that is what the new
      code did.
      
      It caused problems for tracing, and it caused problems for strncpy_from_user(),
      because the new checks made perfectly valid use cases break, rather than
      catch things that did bad things.
      
      Unchecked user space accesses are a problem, but that's not a reason to
      add invalid checks that then people have to work around with silly flags
      (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
      odd way to say "this commit was wrong" and was sprinkled into random
      places to hide the wrongness).
      
      The real fix to unchecked user space accesses is to get rid of the
      special "let's not check __get_user() and __put_user() at all" logic.
      Make __{get|put}_user() be just aliases to the regular {get|put}_user()
      functions, and make it impossible to access user space without having
      the proper checks in places.
      
      The raison d'être of the special double-underscore versions used to be
      that the range check was expensive, and if you did multiple user
      accesses, you'd do the range check up front (like the signal frame
      handling code, for example).  But SMAP (on x86) and PAN (on ARM) have
      made that optimization pointless, because the _real_ expense is the "set
      CPU flag to allow user space access".
      
      Do let's not break the valid cases to catch invalid cases that shouldn't
      even exist.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 24 Feb 2019 (1 commit)
    • net: phy: realtek: Dummy IRQ calls for RTL8366RB · 4c8e0459
      Authored by Linus Walleij
      This fixes a regression introduced by
      commit 0d2e778e
      "net: phy: replace PHY_HAS_INTERRUPT with a check for
      config_intr and ack_interrupt".
      
      This assumes that a PHY cannot trigger an interrupt unless
      it has .config_intr() or .ack_interrupt() implemented.
      A later patch makes the code assume both need to be
      implemented for interrupts to be present.
      
      But this PHY (which is inside a DSA) will happily
      fire interrupts without either callback.
      
      Implement dummy callbacks for .config_intr() and
      .ack_interrupt() in the phy header to fix this.
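      A sketch of what such dummy callbacks look like (function names are
      illustrative):

        static int rtl8366rb_config_intr(struct phy_device *phydev)
        {
                /* The IRQ is managed by the enclosing DSA switch. */
                return 0;
        }

        static int rtl8366rb_ack_interrupt(struct phy_device *phydev)
        {
                /* Nothing to acknowledge at the PHY level. */
                return 0;
        }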
      
      Tested on the RTL8366RB on D-Link DIR-685.
      
      Fixes: 0d2e778e ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 23 Feb 2019 (1 commit)
  5. 22 Feb 2019 (2 commits)
    • phonet: fix building with clang · 6321aa19
      Authored by Arnd Bergmann
      clang warns about overflowing the data[] member in the struct pnpipehdr:
      
      net/phonet/pep.c:295:8: warning: array index 4 is past the end of the array (which contains 1 element) [-Warray-bounds]
                              if (hdr->data[4] == PEP_IND_READY)
                                  ^         ~
      include/net/phonet/pep.h:66:3: note: array 'data' declared here
                      u8              data[1];
      
      Using a flexible array member at the end of the struct avoids the
      warning, but since we cannot have a flexible array member inside
      of the union, each index now has to be moved back by one, which
      makes it a little uglier.
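      Roughly, the layout becomes (fields abridged; illustrative):

        struct pnpipehdr {
                u8 utid;
                u8 message_id;
                u8 pipe_handle;
                union {
                        u8 state_after_connect;
                        u8 error_code;
                        /* ... other single-byte fields ... */
                };
                u8 data[];  /* flexible array; old data[i] is now data[i - 1] */
        };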
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Rémi Denis-Courmont <remi@remlab.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: avoid false positives in untrusted gso validation · 9e8db591
      Authored by Willem de Bruijn
      GSO packets with vnet_hdr must conform to a small set of gso_types.
      The below commit uses flow dissection to drop packets that do not.
      
      But it has false positives when the skb is not fully initialized.
      Dissection needs skb->protocol and skb->network_header.
      
      Infer skb->protocol from gso_type as the two must agree.
      SKB_GSO_UDP can use both ipv4 and ipv6, so try both.
      
      Exclude callers for which network header offset is not known.
      
      Fixes: d5be7f63 ("net: validate untrusted gso packets without csum offload")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 17 Feb 2019 (1 commit)
  7. 16 Feb 2019 (7 commits)
    • efi/arm: Revert "Defer persistent reservations until after paging_init()" · 582a32e7
      Authored by Ard Biesheuvel
      This reverts commit eff89628, which
      deferred the processing of persistent memory reservations to a point
      where the memory may have already been allocated and overwritten,
      defeating the purpose.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190215123333.21209-3-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • arm64, mm, efi: Account for GICv3 LPI tables in static memblock reserve table · 8a5b403d
      Authored by Ard Biesheuvel
      In the irqchip and EFI code, we have what basically amounts to a quirk
      to work around a peculiarity in the GICv3 architecture, which permits
      the system memory address of LPI tables to be programmable only once
      after a CPU reset. This means kexec kernels must use the same memory
      as the first kernel, and thus ensure that this memory has not been
      given out for other purposes by the time the ITS init code runs, which
      is not very early for secondary CPUs.
      
      On systems with many CPUs, these reservations could overflow the
      memblock reservation table, and this was addressed in commit:
      
        eff89628 ("efi/arm: Defer persistent reservations until after paging_init()")
      
      However, this turns out to have made things worse, since the allocation
      of page tables and heap space for the resized memblock reservation table
      itself may overwrite the regions we are attempting to reserve, which may
      cause all kinds of corruption, also considering that the ITS will still
      be poking bits into that memory in response to incoming MSIs.
      
      So instead, let's grow the static memblock reservation table on such
      systems so it can accommodate these reservations at an earlier time.
      This will permit us to revert the above commit in a subsequent patch.
      
      [ mingo: Minor cleanups. ]
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20190215123333.21209-2-ard.biesheuvel@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • net: validate untrusted gso packets without csum offload · d5be7f63
      Authored by Willem de Bruijn
      Syzkaller again found a path to a kernel crash through bad gso input,
      by building an excessively large packet to cause an skb field to wrap.
      
      If VIRTIO_NET_HDR_F_NEEDS_CSUM was set this would have been dropped in
      skb_partial_csum_set.
      
      GSO packets that do not set checksum offload are suspicious and rare.
      Most callers of virtio_net_hdr_to_skb already pass them to
      skb_probe_transport_header.
      
      Move that test forward, change it to detect parse failure and drop
      packets on failure, as those clearly are not one of the legitimate
      VIRTIO_NET_HDR_GSO types.
      
      Fixes: bfd5f4a3 ("packet: Add GSO/csum offload support.")
      Fixes: f43798c2 ("tun: Allow GSO using virtio_net_hdr")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Fix for_each_netdev_feature on Big endian · 3b89ea9c
      Authored by Hauke Mehrtens
      The features attribute is of type u64 and stored in the native
      endianness of the system. The for_each_set_bit() macro takes a pointer
      to a 32-bit array and goes over the bits in this area. On little-endian
      systems this also works with a u64, as the most significant bit is on
      the highest address, but on big-endian systems the words are swapped.
      When we expect bit 15 here we get bit 47 (15 + 32).
      
      This patch converts it more or less to its own for_each_set_bit()
      implementation which works on 64-bit integers directly. This is then
      completely in host endianness and should work as expected.
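      A standalone illustration of the idea (plain C, not the kernel macro
      itself): walking the set bits of a u64 directly gives the same result
      on little- and big-endian hosts:

        #include <stdint.h>
        #include <stdio.h>

        #define for_each_set_bit_u64(bit, mask)          \
                for ((bit) = 0; (bit) < 64; (bit)++)     \
                        if ((mask) & (1ULL << (bit)))

        int main(void)
        {
                uint64_t features = (1ULL << 15) | (1ULL << 47);
                int bit;

                /* Prints 15 and 47 regardless of host endianness. */
                for_each_set_bit_u64(bit, features)
                        printf("bit %d\n", bit);
                return 0;
        }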
      
      Fixes: fd867d51 ("net/core: generic support for disabling netdev features down stack")
      Signed-off-by: Hauke Mehrtens <hauke.mehrtens@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • keys: Fix dependency loop between construction record and auth key · 822ad64d
      Authored by David Howells
      In the request_key() upcall mechanism there's a dependency loop by which if
      a key type driver overrides the ->request_key hook and the userspace side
      manages to lose the authorisation key, the auth key and the internal
      construction record (struct key_construction) can keep each other pinned.
      
      Fix this by the following changes:
      
       (1) Killing off the construction record and using the auth key instead.
      
       (2) Including the operation name in the auth key payload and making the
           payload available outside of security/keys/.
      
       (3) The ->request_key hook is given the authkey instead of the cons
           record and operation name.
      
      Changes (2) and (3) allow the auth key to naturally be cleaned up if the
      keyring it is in is destroyed or cleared or the auth key is unlinked.
      
      Fixes: 7ee02a316600 ("keys: Fix dependency loop between construction record and auth key")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: James Morris <james.morris@microsoft.com>
    • include/linux/module.h: copy __init/__exit attrs to init/cleanup_module · a6e60d84
      Authored by Miguel Ojeda
      The upcoming GCC 9 release extends the -Wmissing-attributes warnings
      (enabled by -Wall) to C and aliases: it warns when particular function
      attributes are missing in the aliases but not in their target.
      
      In particular, it triggers for all the init/cleanup_module
      aliases in the kernel (defined by the module_init/exit macros),
      ending up being very noisy.
      
      These aliases point to the __init/__exit functions of a module,
      which are defined as __cold (among other attributes). However,
      the aliases themselves do not have the __cold attribute.
      
      Since the compiler behaves differently when compiling a __cold
      function as well as when compiling paths leading to calls
      to __cold functions, the warning is trying to point out
      the possibly-forgotten attribute in the alias.
      
      In order to keep the warning enabled, we decided to silence
      this case. Ideally, we would mark the aliases directly
      as __init/__exit. However, there are currently around 132 modules
      in the kernel which are missing __init/__exit in their init/cleanup
      functions (either because they are missing, or for other reasons,
      e.g. the functions being called from somewhere else); and
      a section mismatch is a hard error.
      
      A conservative alternative was to mark the aliases as __cold only.
      However, since we would like to eventually enforce __init/__exit
      to always be marked, we chose to use the new __copy function
      attribute (introduced by GCC 9 as well to deal with this).
      With it, we copy the attributes used by the target functions
      into the aliases. This way, functions that were not marked
      as __init/__exit won't have their aliases marked either,
      and therefore there won't be a section mismatch.
      
      Note that the warning would go away marking either the extern
      declaration, the definition, or both. However, we only mark
      the definition of the alias, since we do not want callers
      (which only see the declaration) to be compiled as if the function
      was __cold (and therefore the paths leading to those calls
      would be assumed to be unlikely).
      
      Link: https://lore.kernel.org/lkml/20190123173707.GA16603@gmail.com/
      Link: https://lore.kernel.org/lkml/20190206175627.GA20399@gmail.com/
      Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
    • Compiler Attributes: add support for __copy (gcc >= 9) · c0d9782f
      Authored by Miguel Ojeda
      From the GCC manual:
      
        copy
        copy(function)
      
          The copy attribute applies the set of attributes with which function
          has been declared to the declaration of the function to which
          the attribute is applied. The attribute is designed for libraries
          that define aliases or function resolvers that are expected
          to specify the same set of attributes as their targets. The copy
          attribute can be used with functions, variables, or types. However,
          the kind of symbol to which the attribute is applied (either
          function or variable) must match the kind of symbol to which
          the argument refers. The copy attribute copies only syntactic and
          semantic attributes but not attributes that affect a symbol’s
          linkage or visibility such as alias, visibility, or weak.
          The deprecated attribute is also not copied.
      
        https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
      
      The upcoming GCC 9 release extends the -Wmissing-attributes warnings
      (enabled by -Wall) to C and aliases: it warns when particular function
      attributes are missing in the aliases but not in their target, e.g.:
      
          void __cold f(void) {}
          void __alias("f") g(void);
      
      diagnoses:
      
          warning: 'g' specifies less restrictive attribute than
          its target 'f': 'cold' [-Wmissing-attributes]
      
      Using __copy(f) we can copy the __cold attribute from f to g:
      
          void __cold f(void) {}
          void __copy(f) __alias("f") g(void);
      
      This attribute is most useful to deal with situations where an alias
      is declared but we don't know the exact attributes the target has.
      
      For instance, in the kernel, the widely used module_init/exit macros
      define the init/cleanup_module aliases, but those cannot be marked
      always as __init/__exit since some modules do not have their
      functions marked as such.
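      The kernel-side definition is guarded on compiler support, roughly:

        /* include/linux/compiler_attributes.h (sketch) */
        #if __has_attribute(__copy__)
        # define __copy(symbol) __attribute__((__copy__(symbol)))
        #else
        # define __copy(symbol)
        #endif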
      Suggested-by: Martin Sebor <msebor@gcc.gnu.org>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
  8. 15 Feb 2019 (1 commit)
  9. 14 Feb 2019 (1 commit)
  10. 13 Feb 2019 (3 commits)
  11. 11 Feb 2019 (2 commits)
    • perf/x86: Add check_period PMU callback · 81ec3f3c
      Authored by Jiri Olsa
      Vince (and later on Ravi) reported crashes in the BTS code during
      fuzzing with the following backtrace:
      
        general protection fault: 0000 [#1] SMP PTI
        ...
        RIP: 0010:perf_prepare_sample+0x8f/0x510
        ...
        Call Trace:
         <IRQ>
         ? intel_pmu_drain_bts_buffer+0x194/0x230
         intel_pmu_drain_bts_buffer+0x160/0x230
         ? tick_nohz_irq_exit+0x31/0x40
         ? smp_call_function_single_interrupt+0x48/0xe0
         ? call_function_single_interrupt+0xf/0x20
         ? call_function_single_interrupt+0xa/0x20
         ? x86_schedule_events+0x1a0/0x2f0
         ? x86_pmu_commit_txn+0xb4/0x100
         ? find_busiest_group+0x47/0x5d0
         ? perf_event_set_state.part.42+0x12/0x50
         ? perf_mux_hrtimer_restart+0x40/0xb0
         intel_pmu_disable_event+0xae/0x100
         ? intel_pmu_disable_event+0xae/0x100
         x86_pmu_stop+0x7a/0xb0
         x86_pmu_del+0x57/0x120
         event_sched_out.isra.101+0x83/0x180
         group_sched_out.part.103+0x57/0xe0
         ctx_sched_out+0x188/0x240
         ctx_resched+0xa8/0xd0
         __perf_event_enable+0x193/0x1e0
         event_function+0x8e/0xc0
         remote_function+0x41/0x50
         flush_smp_call_function_queue+0x68/0x100
         generic_smp_call_function_single_interrupt+0x13/0x30
         smp_call_function_single_interrupt+0x3e/0xe0
         call_function_single_interrupt+0xf/0x20
         </IRQ>
      
      The reason is that while the event init code does several checks
      for BTS events and prevents several unwanted config bits for a
      BTS event (like precise_ip), PERF_EVENT_IOC_PERIOD allows one to
      create a BTS event without those checks being done.
      
      The following sequence will cause the crash:

      If we create an 'almost' BTS event with precise_ip and callchains,
      and then change it into a BTS event, it will crash the
      perf_prepare_sample() function, because precise_ip events are
      expected to come in with callchain data initialized, but that is not
      the case for the intel_pmu_drain_bts_buffer() caller.
      
      Add a check_period callback that is called before the period is
      changed via PERF_EVENT_IOC_PERIOD; it denies the change if the event
      would become a BTS event. Also add the limit_period check.
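      A simplified sketch of the wiring on the generic side (illustrative;
      perf_event_period() rejects the ioctl when this returns non-zero):

        static int perf_event_check_period(struct perf_event *event, u64 value)
        {
                /* PMUs that don't care install a callback that returns 0. */
                return event->pmu->check_period(event, value);
        }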
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20190204123532.GA4794@krava
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • bpf: only adjust gso_size on bytestream protocols · b90efd22
      Authored by Willem de Bruijn
      bpf_skb_change_proto and bpf_skb_adjust_room change skb header length.
      For GSO packets they adjust gso_size to maintain the same MTU.
      
      The gso size can only be safely adjusted on bytestream protocols.
      Commit d02f51cb ("bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat
      to deal with gso sctp skbs") excluded SKB_GSO_SCTP.
      
      Since then type SKB_GSO_UDP_L4 has been added, whose contents are one
      gso_size unit per datagram. Also exclude these.
      
      Move from a blacklist to a whitelist check to future-proof against
      additional such new GSO types, e.g., for fraglist based GRO.
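      A sketch of the whitelist idea (the helper name is illustrative):

        static inline int bpf_gso_size_adjustable(const struct sk_buff *skb)
        {
                /* Only byte-stream GSO may have gso_size rewritten. */
                if (skb_is_gso(skb) &&
                    !(skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6)))
                        return -ENOTSUPP;
                return 0;
        }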
      
      Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT")
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  12. 09 Feb 2019 (1 commit)
    • net: ipv4: use a dedicated counter for icmp_v4 redirect packets · c09551c6
      Authored by Lorenzo Bianconi
      According to the algorithm described in the comment block at the
      beginning of ip_rt_send_redirect, the host should try to send
      'ip_rt_redirect_number' ICMP redirect packets with an exponential
      backoff and then stop sending them at all assuming that the destination
      ignores redirects.
      If the device has previously sent some ICMP error packets that are
      rate-limited (e.g. TTL expired) and continues to receive traffic,
      the redirect packets will never be transmitted. This happens since
      peer->rate_tokens will typically be greater than 'ip_rt_redirect_number'
      and so it will never be reset even if the redirect silence timeout
      (ip_rt_redirect_silence) has elapsed without receiving any packet
      requiring redirects.
      
      Fix it by using a dedicated counter for the number of ICMP redirect
      packets that have been sent by the host.
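      Sketched against ip_rt_send_redirect() (the counter name follows the
      fix; the surrounding code is illustrative):

        static void send_redirect_sketch(struct sk_buff *skb,
                                         struct inet_peer *peer, __be32 gw)
        {
                if (peer->n_redirects >= ip_rt_redirect_number)
                        return;          /* destination ignores redirects */

                icmp_send(skb, ICMP_REDIRECT, ICMP_REDIR_HOST, gw);
                peer->n_redirects++;     /* dedicated counter, not rate_tokens */
        }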
      
      I have not been able to identify a given commit that introduced the
      issue since ip_rt_send_redirect implements the same rate-limiting
      algorithm from commit 1da177e4 ("Linux-2.6.12-rc2").
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 08 Feb 2019 (2 commits)
    • mmc: block: handle complete_work on separate workqueue · dcf6e2e3
      Authored by Zachary Hays
      The kblockd workqueue is created with the WQ_MEM_RECLAIM flag set.
      This generates a rescuer thread for that queue that will trigger when
      the CPU is under heavy load and collect the uncompleted work.
      
      In the case of mmc, this creates the possibility of a deadlock when
      there are multiple partitions on the device as other blk-mq work is
      also run on the same queue. For example:
      
      - worker 0 claims the mmc host to work on partition 1
      - worker 1 attempts to claim the host for partition 2 but has to wait
        for worker 0 to finish
      - worker 0 schedules complete_work to release the host
      - rescuer thread is triggered after time-out and collects the dangling
        work
      - rescuer thread attempts to complete the work in order starting with
        claim host
      - the task to release host is now blocked by a task to claim it and
        will never be called
      
      The above results in multiple hung tasks that lead to failures to
      mount partitions.
      
      Handling complete_work on a separate workqueue avoids this by keeping
      the work completion tasks separate from the other blk-mq work. This
      allows the host to be released without getting blocked by other tasks
      attempting to claim the host.
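      A sketch of the change (queue name and flags are illustrative):

        /* At card init: a queue dedicated to mmc completion work. */
        card->complete_wq = alloc_workqueue("mmc_complete",
                                            WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);

        /* Instead of kblockd_schedule_work(&mq->complete_work): */
        queue_work(mq->card->complete_wq, &mq->complete_work);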
      Signed-off-by: Zachary Hays <zhays@lexmark.com>
      Fixes: 81196976 ("mmc: block: Add blk-mq support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • blktrace: Show requests without sector · 0803de78
      Authored by Jan Kara
      Currently, blktrace will not show requests that don't have any data as
      rq->__sector is initialized to -1 which is out of device range and thus
      discarded by act_log_check(). This is most notably the case for cache
      flush requests sent to the device. Fix the problem by making
      blk_rq_trace_sector() return 0 for requests without initialized sector.
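      The fixed helper then looks roughly like:

        static inline sector_t blk_rq_trace_sector(struct request *rq)
        {
                /*
                 * Ignore passthrough requests and requests whose starting
                 * sector was never set (__sector stays at -1).
                 */
                if (blk_rq_is_passthrough(rq) || blk_rq_pos(rq) == (sector_t)-1)
                        return 0;
                return blk_rq_pos(rq);
        }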
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  14. 06 Feb 2019 (2 commits)
    • ALSA: compress: Fix stop handling on compressed capture streams · 4f2ab5e1
      Authored by Charles Keepax
      It is normal user behaviour to start, stop, then start a stream
      again without closing it. Currently this works for compressed
      playback streams but not capture ones.
      
      The states on a compressed capture stream go directly from OPEN to
      PREPARED, unlike a playback stream which moves to SETUP and waits
      for a write of data before moving to PREPARED. Currently however,
      when a stop is sent the state is set to SETUP for both types of
      streams. This leaves a capture stream in the situation where a new
      start can't be sent as that requires the state to be PREPARED and
      a new set_params can't be sent as that requires the state to be
      OPEN. The only option being to close the stream, and then reopen.
      
      Correct this issue by allowing snd_compr_drain_notify to set the
      state depending on the stream direction, as we already do in
      set_params.
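      A sketch of the direction-aware transition (illustrative):

        static inline void snd_compr_drain_notify(struct snd_compr_stream *stream)
        {
                if (stream->direction == SND_COMPRESS_PLAYBACK)
                        stream->runtime->state = SNDRV_PCM_STATE_SETUP;
                else
                        stream->runtime->state = SNDRV_PCM_STATE_PREPARED;

                wake_up(&stream->runtime->sleep);
        }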
      
      Fixes: 49bb6402 ("ALSA: compress_core: Add support for capture streams")
      Signed-off-by: Charles Keepax <ckeepax@opensource.cirrus.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • virtio: drop internal struct from UAPI · 9c0644ee
      Authored by Michael S. Tsirkin
      There's no reason to expose struct vring_packed in UAPI - if we do we
      won't be able to change or drop it, and it's not part of any interface.
      
      Let's move it to virtio_ring.c
      
      Cc: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  15. 05 Feb 2019 (2 commits)
    • xfrm: destroy xfrm_state synchronously on net exit path · f75a2804
      Authored by Cong Wang
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor() callbacks,
      ipcomp_destroy(), schedules the same GC work again from inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync(), which simply bypasses
      the GC and lets its callers decide whether to use this synchronous
      version. On the net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • netfilter: nf_tables: unbind set in rule from commit path · f6ac8585
      Authored by Pablo Neira Ayuso
      Anonymous sets that are bound to rules from the same transaction trigger
      a kernel splat from the abort path due to double set list removal and
      double free.
      
      This patch updates the logic to search for the transaction that is
      responsible for creating the set and disable the set list removal and
      release, given the rule is now responsible for this. Lookup is reverse
      since the transaction that adds the set is likely to be at the tail of
      the list.
      
      Moreover, this patch adds the unbind step to deliver the event from the
      commit path.  This should not be done from the worker thread, since we
      have no guarantees of in-order delivery to the listener.
      
      This patch removes the assumption that both activate and deactivate
      callbacks need to be provided.
      
      Fixes: cd5125d8 ("netfilter: nf_tables: split set destruction in deactivate and destroy phase")
      Reported-by: Mikhail Morfikov <mmorfikov@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  16. 04 Feb 2019 (1 commit)
    • sched/wake_q: Reduce reference counting for special users · 07879c6a
      Authored by Davidlohr Bueso
      Some users, specifically futexes and rwsems, required fixes
      that allowed the callers to be safe when wakeups occur before
      they are expected by wake_up_q(). Such scenarios also play
      games and rely on reference counting, and until now were
      pivoting on wake_q doing it. With the wake_q_add() call being
      moved down, this can no longer be the case. As such we end up
      with a double task refcounting overhead; and these callers
      care enough about this (being rather core-ish).
      
      This patch introduces a wake_q_add_safe() call that serves for callers
      that have already done the refcounting and for which the task is
      therefore 'safe' from wake_q's point of view (in that it requires a
      reference throughout the entire queue/wakeup cycle). In the one case
      it has internal reference counting, in the other case it consumes the
      reference counting.
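      A minimal sketch of the calling convention (the wrapper is illustrative):

        static void queue_wakeup(struct wake_q_head *head, struct task_struct *p)
        {
                get_task_struct(p);       /* caller-side reference */
                wake_q_add_safe(head, p); /* consumes it; no second get/put pair */
        }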
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Xie Yongji <xieyongji@baidu.com>
      Cc: Yongji Xie <elohimes@gmail.com>
      Cc: andrea.parri@amarulasolutions.com
      Cc: lilin24@baidu.com
      Cc: liuqi16@baidu.com
      Cc: nixun@baidu.com
      Cc: yuanlinsi01@baidu.com
      Cc: zhangyu31@baidu.com
      Link: https://lkml.kernel.org/r/20181218195352.7orq3upiwfdbrdne@linux-r8p5
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  17. 02 Feb 2019 (3 commits)
    • x86/resctrl: Avoid confusion over the new X86_RESCTRL config · e6d42931
      Authored by Johannes Weiner
      "Resource Control" is a very broad term for this CPU feature, and a term
      that is also associated with containers, cgroups etc. This can easily
      cause confusion.
      
      Make the user prompt more specific. Match the config symbol name.
      
       [ bp: In the future, the corresponding ARM arch-specific code will be
         under ARM_CPU_RESCTRL and the arch-agnostic bits will be carved out
         under the CPU_RESCTRL umbrella symbol. ]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: linux-doc@vger.kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pu Wen <puwen@hygon.cn>
      Cc: Reinette Chatre <reinette.chatre@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190130195621.GA30653@cmpxchg.org
    • mm/hotplug: invalid PFNs from pfn_to_online_page() · b13bc351
      Authored by Qian Cai
      On an arm64 ThunderX2 server, the first kmemleak scan would crash [1]
      with CONFIG_DEBUG_VM_PGFLAGS=y because page_to_nid() found a pfn that
      is not directly mapped (MEMBLOCK_NOMAP). Hence, the page->flags is
      uninitialized.
      
      This is due to commit 9f1eb38e ("mm, kmemleak: little
      optimization while scanning"), which started to use pfn_to_online_page()
      instead of pfn_valid(). However, in the CONFIG_MEMORY_HOTPLUG=y case,
      pfn_to_online_page() does not call memblock_is_map_memory() while
      pfn_valid() does.
      
      Historically, commit 68709f45 ("arm64: only consider memblocks
      with NOMAP cleared for linear mapping") caused pages marked as nomap
      to no longer be reassigned to the new zone in memmap_init_zone() by
      calling __init_single_page().
      
      Commit 2d070eab ("mm: consider zone which is not fully
      populated to have holes") then introduced pfn_to_online_page(), which
      was designed to return a valid pfn only, but it is clearly broken on
      arm64.
      
      Therefore, let pfn_to_online_page() call pfn_valid_within(), so it can
      handle nomap thanks to commit f52bb98f ("arm64: mm: always
      enable CONFIG_HOLES_IN_ZONE"), while it will be optimized away on
      architectures that have no HOLES_IN_ZONE.
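      The fixed helper, simplified:

        #define pfn_to_online_page(pfn)                                     \
        ({                                                                  \
                struct page *___page = NULL;                                \
                unsigned long ___pfn = pfn;                                 \
                unsigned long ___nr = pfn_to_section_nr(___pfn);            \
                                                                            \
                if (___nr < NR_MEM_SECTIONS && online_section_nr(___nr) &&  \
                    pfn_valid_within(___pfn))                               \
                        ___page = pfn_to_page(___pfn);                      \
                ___page;                                                    \
        })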
      
      [1]
        Unable to handle kernel NULL pointer dereference at virtual address 0000000000000006
        Mem abort info:
          ESR = 0x96000005
          Exception class = DABT (current EL), IL = 32 bits
          SET = 0, FnV = 0
          EA = 0, S1PTW = 0
        Data abort info:
          ISV = 0, ISS = 0x00000005
          CM = 0, WnR = 0
        Internal error: Oops: 96000005 [#1] SMP
        CPU: 60 PID: 1408 Comm: kmemleak Not tainted 5.0.0-rc2+ #8
        pstate: 60400009 (nZCv daif +PAN -UAO)
        pc : page_mapping+0x24/0x144
        lr : __dump_page+0x34/0x3dc
        sp : ffff00003a5cfd10
        x29: ffff00003a5cfd10 x28: 000000000000802f
        x27: 0000000000000000 x26: 0000000000277d00
        x25: ffff000010791f56 x24: ffff7fe000000000
        x23: ffff000010772f8b x22: ffff00001125f670
        x21: ffff000011311000 x20: ffff000010772f8b
        x19: fffffffffffffffe x18: 0000000000000000
        x17: 0000000000000000 x16: 0000000000000000
        x15: 0000000000000000 x14: ffff802698b19600
        x13: ffff802698b1a200 x12: ffff802698b16f00
        x11: ffff802698b1a400 x10: 0000000000001400
        x9 : 0000000000000001 x8 : ffff00001121a000
        x7 : 0000000000000000 x6 : ffff0000102c53b8
        x5 : 0000000000000000 x4 : 0000000000000003
        x3 : 0000000000000100 x2 : 0000000000000000
        x1 : ffff000010772f8b x0 : ffffffffffffffff
        Process kmemleak (pid: 1408, stack limit = 0x(____ptrval____))
        Call trace:
         page_mapping+0x24/0x144
         __dump_page+0x34/0x3dc
         dump_page+0x28/0x4c
         kmemleak_scan+0x4ac/0x680
         kmemleak_scan_thread+0xb4/0xdc
         kthread+0x12c/0x13c
         ret_from_fork+0x10/0x18
        Code: d503201f f9400660 36000040 d1000413 (f9400661)
        ---[ end trace 4d4bd7f573490c8e ]---
        Kernel panic - not syncing: Fatal exception
        SMP: stopping secondary CPUs
        Kernel Offset: disabled
        CPU features: 0x002,20000c38
        Memory Limit: none
        ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Link: http://lkml.kernel.org/r/20190122132916.28360-1-cai@lca.pw
      Fixes: 9f1eb38e ("mm, kmemleak: little optimization while scanning")
      Signed-off-by: Qian Cai <cai@lca.pw>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom, oom_reaper: do not enqueue same task twice · 9bcdeb51
      Authored by Tetsuo Handa
      Arkadiusz reported that enabling memcg's group oom killing causes
      strange memcg statistics where there is no task in a memcg even though
      the number of tasks in that memcg is not 0. It turned out that there
      is a bug in wake_oom_reaper() which allows enqueuing the same task
      twice, which makes it impossible to decrease the number of tasks in
      that memcg due to a refcount leak.
      
      This bug has existed since the OOM reaper became invokable from the
      task_will_free_mem(current) path in out_of_memory() in Linux 4.7:
      
        T1@P1     |T2@P1     |T3@P1     |OOM reaper
        ----------+----------+----------+------------
                                         # Processing an OOM victim in a different memcg domain.
                              try_charge()
                                mem_cgroup_out_of_memory()
                                  mutex_lock(&oom_lock)
                   try_charge()
                     mem_cgroup_out_of_memory()
                       mutex_lock(&oom_lock)
        try_charge()
          mem_cgroup_out_of_memory()
            mutex_lock(&oom_lock)
                                  out_of_memory()
                                    oom_kill_process(P1)
                                      do_send_sig_info(SIGKILL, @P1)
                                      mark_oom_victim(T1@P1)
                                      wake_oom_reaper(T1@P1) # T1@P1 is enqueued.
                                  mutex_unlock(&oom_lock)
                       out_of_memory()
                         mark_oom_victim(T2@P1)
                         wake_oom_reaper(T2@P1) # T2@P1 is enqueued.
                       mutex_unlock(&oom_lock)
            out_of_memory()
              mark_oom_victim(T1@P1)
              wake_oom_reaper(T1@P1) # T1@P1 is enqueued again due to oom_reaper_list == T2@P1 && T1@P1->oom_reaper_list == NULL.
            mutex_unlock(&oom_lock)
                                         # Completed processing an OOM victim in a different memcg domain.
                                         spin_lock(&oom_reaper_lock)
                                         # T1@P1 is dequeued.
                                         spin_unlock(&oom_reaper_lock)
      
      but memcg's group oom killing made it easier to trigger this bug by
      calling wake_oom_reaper() on the same task from one out_of_memory()
      request.
      
      Fix this bug using the approach of commit 855b0183 ("oom,
      oom_reaper: disable oom_reaper for oom_kill_allocating_task"). As a
      side effect, this patch also avoids enqueuing multiple threads sharing
      memory via the task_will_free_mem(current) path.
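      A sketch of the dedup test (the flag name follows the fix; the rest is
      illustrative):

        static void wake_oom_reaper(struct task_struct *tsk)
        {
                /* Queue each victim's mm exactly once. */
                if (test_and_set_bit(MMF_OOM_REAP_QUEUED,
                                     &tsk->signal->oom_mm->flags))
                        return;

                get_task_struct(tsk);
                /* ... enqueue tsk on oom_reaper_list and wake the reaper ... */
        }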
      
      Link: http://lkml.kernel.org/r/e865a044-2c10-9858-f4ef-254bc71d6cc2@i-love.sakura.ne.jp
      Link: http://lkml.kernel.org/r/5ee34fc6-1485-34f8-8790-903ddabaa809@i-love.sakura.ne.jp
      Fixes: af8e15cc ("oom, oom_reaper: do not enqueue task if it is on the oom_reaper_list head")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-by: Arkadiusz Miskiewicz <arekm@maven.pl>
      Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Aleksa Sarai <asarai@suse.de>
      Cc: Jay Kamat <jgkamat@fb.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 01 Feb 2019 (1 commit)
    • ALSA: hda - Serialize codec registrations · 305a0ade
      Authored by Takashi Iwai
      In the current code, the codec registration may happen both at the
      codec bind time and the end of the controller probe time. In rare
      cases they race with each other, leading to an Oops due to the
      still-uninitialized card device.
      
      This patch introduces a simple flag to prevent the codec registration
      at the codec bind time as long as the controller probe is going on.
      The controller probe invokes snd_card_register() that does the whole
      registration task, and we don't need to register each piece
      beforehand.
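      A sketch of the flag check at codec bind time (names are illustrative):

        /* The controller probe sets bus->bus_probing around its own probing;
         * codec bind then skips early registration while it is set, since the
         * probe's final snd_card_register() covers everything.
         */
        if (!codec->bus->bus_probing)
                snd_card_register(codec->card);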
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>