1. 05 Mar 2019 (1 commit)
    • aio: simplify - and fix - fget/fput for io_submit() · 84c4e1f8
      Authored by Linus Torvalds
      Al Viro root-caused a race where the IOCB_CMD_POLL handling of
      fget/fput() could cause us to access the file pointer after it had
      already been freed:
      
       "In more details - normally IOCB_CMD_POLL handling looks so:
      
         1) io_submit(2) allocates aio_kiocb instance and passes it to
            aio_poll()
      
         2) aio_poll() resolves the descriptor to struct file by req->file =
            fget(iocb->aio_fildes)
      
         3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
            aio_kiocb to 2 (bumps by 1, that is).
      
         4) aio_poll() calls vfs_poll(). After sanity checks (basically,
            "poll_wait() had been called and only once") it locks the queue.
            That's what the extra reference to iocb had been for - we know we
            can safely access it.
      
         5) With queue locked, we check if ->woken has already been set to
            true (by aio_poll_wake()) and, if it had been, we unlock the
             queue, drop a reference to aio_kiocb and bugger off - at that
             point it's the responsibility of aio_poll_wake() and the stuff
            called/scheduled by it. That code will drop the reference to file
            in req->file, along with the other reference to our aio_kiocb.
      
         6) otherwise, we see whether we need to wait. If we do, we unlock the
            queue, drop one reference to aio_kiocb and go away - eventual
            wakeup (or cancel) will deal with the reference to file and with
            the other reference to aio_kiocb
      
         7) otherwise we remove ourselves from waitqueue (still under the
            queue lock), so that wakeup won't get us. No async activity will
            be happening, so we can safely drop req->file and iocb ourselves.
      
        If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
        won't get freed under us, so we can do all the checks and locking
        safely. And we don't touch ->file if we detect that case.
      
        However, vfs_poll() most certainly *does* touch the file it had been
        given. So wakeup coming while we are still in ->poll() might end up
        doing fput() on that file. That case is not too rare, and usually we
        are saved by the still present reference from descriptor table - that
        fput() is not the final one.
      
        But if another thread closes that descriptor right after our fget()
        and wakeup does happen before ->poll() returns, we are in trouble -
        final fput() done while we are in the middle of a method:
      
      Al also wrote a patch to take an extra reference to the file descriptor
      to fix this, but I instead suggested we just streamline the whole file
       pointer handling by io_submit() so that the generic aio submission code
      simply keeps the file pointer around until the aio has completed.
      
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
       Acked-by: Al Viro <viro@zeniv.linux.org.uk>
       Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 04 Mar 2019 (8 commits)
    • net: phy: remove gen10g_no_soft_reset · 7be3ad84
      Authored by Heiner Kallweit
       genphy_no_soft_reset and gen10g_no_soft_reset are the same no-op;
       one is enough.
       Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
       Reviewed-by: Andrew Lunn <andrew@lunn.ch>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: don't export gen10g_read_status · d81210c2
      Authored by Heiner Kallweit
      gen10g_read_status is deprecated, therefore stop exporting it.
      We don't want to encourage anybody to use it.
       Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
       Reviewed-by: Andrew Lunn <andrew@lunn.ch>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove gen10g_config_init · c5e91d39
      Authored by Heiner Kallweit
      ETHTOOL_LINK_MODE_10000baseT_Full_BIT is set anyway in the supported
      and advertising bitmap because it's part of PHY_10GBIT_FEATURES.
      And all users of gen10g_config_init use PHY_10GBIT_FEATURES.
       Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
       Reviewed-by: Andrew Lunn <andrew@lunn.ch>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: remove gen10g_suspend and gen10g_resume · a6d0aa97
      Authored by Heiner Kallweit
      phy_suspend() and phy_resume() are no-ops anyway if no callback is
      defined. Therefore we don't need these stubs.
       Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
       Reviewed-by: Andrew Lunn <andrew@lunn.ch>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: add socket option IPV6_ROUTER_ALERT_ISOLATE · 9036b2fe
      Authored by Francesco Ruggeri
       By default, an IPv6 socket with the IPV6_ROUTER_ALERT socket option set
       will receive all IPv6 RA packets from all namespaces.
       The IPV6_ROUTER_ALERT_ISOLATE socket option restricts the packets
       received by the socket to those from the socket's own namespace.
       Signed-off-by: Maxim Martynov <maxim@arista.com>
       Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
       Reviewed-by: David Ahern <dsahern@gmail.com>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • regulator: core: Add set/get_current_limit helpers for regmap users · a32e0c77
      Authored by Axel Lin
      By setting curr_table, n_current_limits, csel_reg and csel_mask, the
      regmap users can use regulator_set_current_limit_regmap and
      regulator_get_current_limit_regmap for set/get_current_limit callbacks.
       Signed-off-by: Axel Lin <axel.lin@ingics.com>
       Signed-off-by: Mark Brown <broonie@kernel.org>
    • regulator: Fix comment for csel_reg and csel_mask · 35d838ff
      Authored by Axel Lin
       The csel_reg and csel_mask fields in struct regulator_desc need to
       be generic for all drivers, not just for TPS65218.
       Signed-off-by: Axel Lin <axel.lin@ingics.com>
       Signed-off-by: Mark Brown <broonie@kernel.org>
    • appletalk: Fix use-after-free in atalk_proc_exit · 6377f787
      Authored by YueHaibing
       KASAN reports this:
      
      BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
      Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806
      
      CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0xfa/0x1ce lib/dump_stack.c:113
       print_address_description+0x65/0x270 mm/kasan/report.c:187
       kasan_report+0x149/0x18d mm/kasan/report.c:317
       pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71
       remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667
       atalk_proc_exit+0x18/0x820 [appletalk]
       atalk_exit+0xf/0x5a [appletalk]
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc
      R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff
      
      Allocated by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496
       slab_post_alloc_hook mm/slab.h:444 [inline]
       slab_alloc_node mm/slub.c:2739 [inline]
       slab_alloc mm/slub.c:2747 [inline]
       kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752
       kmem_cache_zalloc include/linux/slab.h:730 [inline]
       __proc_create+0x30f/0xa20 fs/proc/generic.c:408
       proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469
       0xffffffffc10c01bb
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 2806:
       set_track mm/kasan/common.c:85 [inline]
       __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458
       slab_free_hook mm/slub.c:1409 [inline]
       slab_free_freelist_hook mm/slub.c:1436 [inline]
       slab_free mm/slub.c:2986 [inline]
       kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002
       pde_put+0x6e/0x80 fs/proc/generic.c:647
       remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684
       0xffffffffc10c031c
       0xffffffffc10c0166
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8881f41fe500
       which belongs to the cache proc_dir_entry of size 256
      The buggy address is located 176 bytes inside of
       256-byte region [ffff8881f41fe500, ffff8881f41fe600)
      The buggy address belongs to the page:
      page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0
      flags: 0x2fffc0000000200(slab)
      raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00
      raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
       ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                           ^
       ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
       atalk_init() should check the return value of atalk_proc_init();
       otherwise atalk_exit() will trigger a use-after-free in
       pde_subdir_find() when the module is unloaded. This patch fixes the
       error cleanup path of atalk_init().
       Reported-by: Hulk Robot <hulkci@huawei.com>
       Signed-off-by: YueHaibing <yuehaibing@huawei.com>
       Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 02 Mar 2019 (2 commits)
  4. 01 Mar 2019 (1 commit)
  5. 28 Feb 2019 (14 commits)
    • kthread: Do not use TIMER_IRQSAFE · ad01423a
      Authored by Sebastian Andrzej Siewior
      The TIMER_IRQSAFE usage was introduced in commit 22597dc3 ("kthread:
      initial support for delayed kthread work") which modelled the delayed
      kthread code after workqueue's code. The workqueue code requires the flag
       TIMER_IRQSAFE for synchronisation purposes. This is not true for
       kthread's delay timer, since all operations occur under a lock.
      
       Remove TIMER_IRQSAFE from the timer initialisation and use
       timer_setup(), the official initialisation function.
       Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Reviewed-by: Petr Mladek <pmladek@suse.com>
       Link: https://lkml.kernel.org/r/20190212162554.19779-2-bigeasy@linutronix.de
    • kthread: Convert worker lock to raw spinlock · fe99a4f4
      Authored by Julia Cartwright
      In order to enable the queuing of kthread work items from hardirq context
      even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
      raw_spin_lock.
      
      This is only acceptable to do because the work performed under the lock is
      well-bounded and minimal.
       Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
       Reported-by: Tim Sander <tim@krieglstein.org>
       Signed-off-by: Julia Cartwright <julia@ni.com>
       Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
       Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
       Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
       Reviewed-by: Petr Mladek <pmladek@suse.com>
       Cc: Guenter Roeck <linux@roeck-us.net>
       Link: https://lkml.kernel.org/r/20190212162554.19779-1-bigeasy@linutronix.de
    • mmc: core: Add discard support to sd · bc47e2f6
      Authored by Avri Altman
      SD spec v5.1 adds discard support. The flows and commands are similar to
      mmc, so just set the discard arg in CMD38.
      
       A host which supports DISCARD shall check whether the DISCARD_SUPPORT
       bit (b313) is set in the SD_STATUS register. If the card does not
       support discard, the host shall not issue the DISCARD command, but the
       ERASE command instead.
      
       After the DISCARD operation, the card may de-allocate the discarded
       blocks partially or completely, so the host mustn't make any
       assumptions about the content of the discarded region. This is unlike
       the ERASE command, where the region is guaranteed to contain either
       '0's or '1's, depending on the content of DATA_STAT_AFTER_ERASE (b55)
       in the SCR register.
       
       One more important difference compared to ERASE is the busy timeout,
       which we will address in the next patch.
       Signed-off-by: Avri Altman <avri.altman@wdc.com>
       Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    • locking/lockdep: Shrink struct lock_class_key · 28d49e28
      Authored by Peter Zijlstra
      Shrink struct lock_class_key; we never store anything in subkeys[], we
      only use the addresses.
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kernel/workqueue: Use dynamic lockdep keys for workqueues · 669de8bd
      Authored by Bart Van Assche
      The following commit:
      
        87915adc ("workqueue: re-add lockdep dependencies for flushing")
      
      improved deadlock checking in the workqueue implementation. Unfortunately
      that patch also introduced a few false positive lockdep complaints.
      
      This patch suppresses these false positives by allocating the workqueue mutex
      lockdep key dynamically.
      
      An example of a false positive lockdep complaint suppressed by this patch
       can be found below. The root cause of the lockdep complaint shown below
       is that the direct I/O code can call alloc_workqueue() from inside a
       work item created by another alloc_workqueue() call, and that both
       workqueues share the same lockdep key. This patch avoids triggering
       that lockdep complaint by allocating the workqueue lockdep keys
       dynamically.
      
      In other words, this patch guarantees that a unique lockdep key is
      associated with each work queue mutex.
      
        ======================================================
        WARNING: possible circular locking dependency detected
        4.19.0-dbg+ #1 Not tainted
        fio/4129 is trying to acquire lock:
        00000000a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970
      
        but task is already holding lock:
        00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #2 (&sb->s_type->i_mutex_key#14){+.+.}:
               down_write+0x3d/0x80
               __generic_file_fsync+0x77/0xf0
               ext4_sync_file+0x3c9/0x780
               vfs_fsync_range+0x66/0x100
               dio_complete+0x2f5/0x360
               dio_aio_complete_work+0x1c/0x20
               process_one_work+0x481/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #1 ((work_completion)(&dio->complete_work)){+.+.}:
               process_one_work+0x447/0x9f0
               worker_thread+0x63/0x5a0
               kthread+0x1cf/0x1f0
               ret_from_fork+0x24/0x30
      
        -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}:
               lock_acquire+0xc5/0x200
               flush_workqueue+0xf3/0x970
               drain_workqueue+0xec/0x220
               destroy_workqueue+0x23/0x350
               sb_init_dio_done_wq+0x6a/0x80
               do_blockdev_direct_IO+0x1f33/0x4be0
               __blockdev_direct_IO+0x79/0x86
               ext4_direct_IO+0x5df/0xbb0
               generic_file_direct_write+0x119/0x220
               __generic_file_write_iter+0x131/0x2d0
               ext4_file_write_iter+0x3fa/0x710
               aio_write+0x235/0x330
               io_submit_one+0x510/0xeb0
               __x64_sys_io_submit+0x122/0x340
               do_syscall_64+0x71/0x220
               entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
        other info that might help us debug this:
      
        Chain exists of:
          (wq_completion)"dio/%s"sb->s_id --> (work_completion)(&dio->complete_work) --> &sb->s_type->i_mutex_key#14
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&sb->s_type->i_mutex_key#14);
                                       lock((work_completion)(&dio->complete_work));
                                       lock(&sb->s_type->i_mutex_key#14);
          lock((wq_completion)"dio/%s"sb->s_id);
      
         *** DEADLOCK ***
      
        1 lock held by fio/4129:
         #0: 00000000a0acecf9 (&sb->s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710
      
        stack backtrace:
        CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         dump_stack+0x86/0xc5
         print_circular_bug.isra.32+0x20a/0x218
         __lock_acquire+0x1c68/0x1cf0
         lock_acquire+0xc5/0x200
         flush_workqueue+0xf3/0x970
         drain_workqueue+0xec/0x220
         destroy_workqueue+0x23/0x350
         sb_init_dio_done_wq+0x6a/0x80
         do_blockdev_direct_IO+0x1f33/0x4be0
         __blockdev_direct_IO+0x79/0x86
         ext4_direct_IO+0x5df/0xbb0
         generic_file_direct_write+0x119/0x220
         __generic_file_write_iter+0x131/0x2d0
         ext4_file_write_iter+0x3fa/0x710
         aio_write+0x235/0x330
         io_submit_one+0x510/0xeb0
         __x64_sys_io_submit+0x122/0x340
         do_syscall_64+0x71/0x220
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: https://lkml.kernel.org/r/20190214230058.196511-20-bvanassche@acm.org
       [ Reworked the changelog a bit. ]
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Add support for dynamic keys · 108c1485
      Authored by Bart Van Assche
      A shortcoming of the current lockdep implementation is that it requires
      lock keys to be allocated statically. That forces all instances of lock
      objects that occur in a given data structure to share a lock key. Since
      lock dependency analysis groups lock objects per key sharing lock keys
      can cause false positive lockdep reports. Make it possible to avoid
      such false positive reports by allowing lock keys to be allocated
      dynamically. Require that dynamically allocated lock keys are
      registered before use by calling lockdep_register_key(). Complain about
      attempts to register the same lock key pointer twice without calling
      lockdep_unregister_key() between successive registration calls.
      
      The purpose of the new lock_keys_hash[] data structure that keeps
      track of all dynamic keys is twofold:
      
        - Verify whether the lockdep_register_key() and lockdep_unregister_key()
          functions are used correctly.
      
         - Ensure that lockdep_init_map() does not complain when encountering
           a dynamically allocated key.
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
       Link: https://lkml.kernel.org/r/20190214230058.196511-19-bvanassche@acm.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Free lock classes that are no longer in use · a0b0fd53
      Authored by Bart Van Assche
      Instead of leaving lock classes that are no longer in use in the
      lock_classes array, reuse entries from that array that are no longer in
      use. Maintain a linked list of free lock classes with list head
       'free_lock_classes'. Only add freed lock classes to the free_lock_classes
       list after a grace period, to avoid a lock_classes[] element being
       reused while an RCU reader is still accessing it. Since the lockdep
      selftests run in a context where sleeping is not allowed and since the
      selftests require that lock resetting/zapping works with debug_locks
      off, make the behavior of lockdep_free_key_range() and
      lockdep_reset_lock() depend on whether or not these are called from
      the context of the lockdep selftests.
      
      Thanks to Peter for having shown how to modify get_pending_free()
      such that that function does not have to sleep.
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
       Link: https://lkml.kernel.org/r/20190214230058.196511-12-bvanassche@acm.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Make it easy to detect whether or not inside a selftest · cdc84d79
      Authored by Bart Van Assche
      The patch that frees unused lock classes will modify the behavior of
      lockdep_free_key_range() and lockdep_reset_lock() depending on whether
      or not these functions are called from the context of the lockdep
      selftests. Hence make it easy to detect whether or not lockdep code
      is called from the context of a lockdep selftest.
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
       Link: https://lkml.kernel.org/r/20190214230058.196511-10-bvanassche@acm.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Make zap_class() remove all matching lock order entries · 86cffb80
      Authored by Bart Van Assche
      Make sure that all lock order entries that refer to a class are removed
      from the list_entries[] array when a kernel module is unloaded.
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
       Link: https://lkml.kernel.org/r/20190214230058.196511-7-bvanassche@acm.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/lockdep: Reorder struct lock_class members · 09329d1c
      Authored by Bart Van Assche
      This patch does not change any functionality but makes the patch that
      frees lock classes that are no longer in use easier to read.
       Signed-off-by: Bart Van Assche <bvanassche@acm.org>
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: johannes.berg@intel.com
      Cc: tj@kernel.org
       Link: https://lkml.kernel.org/r/20190214230058.196511-6-bvanassche@acm.org
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • locking/percpu-rwsem: Remove preempt_disable variants · 02e525b2
      Authored by Peter Zijlstra
       Effectively revert commit:
      
        87709e28 ("fs/locks: Use percpu_down_read_preempt_disable()")
      
      This is causing major pain for PREEMPT_RT.
      
      Sebastian did a lot of lockperf runs on 2 and 4 node machines with all
      preemption modes (PREEMPT=n should be an obvious NOP for this patch
      and thus serves as a good control) and no results showed significance
      over 2-sigma (the PREEMPT=n results were almost empty at 1-sigma).
       Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
       Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • net: Remove switchdev_ops · 3d705f07
      Authored by Florian Fainelli
       Now that we have converted all possible callers to using a switchdev
       notifier for attributes, we no longer need to implement switchdev_ops,
       and it can be removed from all drivers and from the net_device
       structure.
       Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
       Reviewed-by: Ido Schimmel <idosch@mellanox.com>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dev: Use unsigned integer as an argument to left-shift · f4d7b3e2
      Authored by Andy Shevchenko
       1 << 31 is undefined behaviour according to the C standard, because
       the shift overflows the sign bit of a 32-bit int. Use the U suffix so
       the shift is performed on an unsigned value.
       Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
       Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: enable program stats · 492ecee8
      Authored by Alexei Starovoitov
       JITed BPF programs are indistinguishable from kernel functions, but
       unlike kernel code, BPF code can change often.
       The typical "perf record" + "perf report" approach to profiling and
       tuning kernel code works just as well for BPF programs, but kernel
       code doesn't need to be monitored whereas BPF programs do.
       Users load and run a large number of BPF programs.
       These BPF stats allow tools to monitor the usage of BPF on the server.
       The monitoring tools will turn the sysctl kernel.bpf_stats_enabled on
       and off for a few seconds to sample the average cost of the programs.
       Data aggregated over hours and days will provide insight into the cost
       of BPF, and alarms can trigger in case a given program suddenly gets
       more expensive.
      
      The cost of two sched_clock() per program invocation adds ~20 nsec.
      Fast BPF progs (like selftests/bpf/progs/test_pkt_access.c) will slow down
      from ~10 nsec to ~30 nsec.
      static_key minimizes the cost of the stats collection.
      There is no measurable difference before/after this patch
       with kernel.bpf_stats_enabled=0.
       Signed-off-by: Alexei Starovoitov <ast@kernel.org>
       Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  6. 27 Feb 2019 (1 commit)
  7. 26 Feb 2019 (1 commit)
    • Revert "x86/fault: BUG() when uaccess helpers fault on kernel addresses" · 53a41cb7
      Authored by Linus Torvalds
      This reverts commit 9da3f2b7.
      
      It was well-intentioned, but wrong.  Overriding the exception tables for
      instructions for random reasons is just wrong, and that is what the new
      code did.
      
      It caused problems for tracing, and it caused problems for strncpy_from_user(),
      because the new checks made perfectly valid use cases break, rather than
      catch things that did bad things.
      
      Unchecked user space accesses are a problem, but that's not a reason to
      add invalid checks that then people have to work around with silly flags
      (in this case, that 'kernel_uaccess_faults_ok' flag, which is just an
       odd way to say "this commit was wrong" and was sprinkled into random
       places to hide the wrongness).
      
      The real fix to unchecked user space accesses is to get rid of the
      special "let's not check __get_user() and __put_user() at all" logic.
      Make __{get|put}_user() be just aliases to the regular {get|put}_user()
      functions, and make it impossible to access user space without having
      the proper checks in places.
      
      The raison d'être of the special double-underscore versions used to be
      that the range check was expensive, and if you did multiple user
      accesses, you'd do the range check up front (like the signal frame
      handling code, for example).  But SMAP (on x86) and PAN (on ARM) have
      made that optimization pointless, because the _real_ expense is the "set
      CPU flag to allow user space access".
      
       So let's not break the valid cases just to catch invalid cases that
       shouldn't even exist.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tobin C. Harding <tobin@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jann Horn <jannh@google.com>
       Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 25 Feb 2019 (11 commits)
  9. 24 Feb 2019 (1 commit)
    • net: phy: realtek: Dummy IRQ calls for RTL8366RB · 4c8e0459
      Authored by Linus Walleij
      This fixes a regression introduced by
      commit 0d2e778e
      "net: phy: replace PHY_HAS_INTERRUPT with a check for
      config_intr and ack_interrupt".
      
       This assumes that a PHY cannot trigger an interrupt unless
      it has .config_intr() or .ack_interrupt() implemented.
      A later patch makes the code assume both need to be
      implemented for interrupts to be present.
      
      But this PHY (which is inside a DSA) will happily
      fire interrupts without either callback.
      
      Implement dummy callbacks for .config_intr() and
      .ack_interrupt() in the phy header to fix this.
      
      Tested on the RTL8366RB on D-Link DIR-685.
      
      Fixes: 0d2e778e ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt")
      Cc: Heiner Kallweit <hkallweit1@gmail.com>
       Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
       Reviewed-by: Andrew Lunn <andrew@lunn.ch>
       Signed-off-by: David S. Miller <davem@davemloft.net>