1. 03 Nov, 2020 1 commit
    • bpf: Fix error path in htab_map_alloc() · 8aaeed81
      Committed by Eric Dumazet
      syzbot was able to trigger a use-after-free in htab_map_alloc() [1]
      
      htab_map_alloc() lacks a call to lockdep_unregister_key() in its error path.
      
      lockdep_register_key() and lockdep_unregister_key() cannot fail, so
      it seems better to call them right after htab allocation and right
      before htab freeing, avoiding more goto/labels in htab_map_alloc().
      
      [1]
      BUG: KASAN: use-after-free in lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
      Read of size 8 at addr ffff88805fa67ad8 by task syz-executor.3/2356
      
      CPU: 1 PID: 2356 Comm: syz-executor.3 Not tainted 5.9.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x107/0x163 lib/dump_stack.c:118
       print_address_description.constprop.0.cold+0xae/0x4c8 mm/kasan/report.c:385
       __kasan_report mm/kasan/report.c:545 [inline]
       kasan_report.cold+0x1f/0x37 mm/kasan/report.c:562
       lockdep_register_key+0x356/0x3e0 kernel/locking/lockdep.c:1182
       htab_init_buckets kernel/bpf/hashtab.c:144 [inline]
       htab_map_alloc+0x6c5/0x14a0 kernel/bpf/hashtab.c:521
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45deb9
      Code: 0d b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f0eafee1c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
      RAX: ffffffffffffffda RBX: 0000000000001a00 RCX: 000000000045deb9
      RDX: 0000000000000040 RSI: 0000000020000040 RDI: 405a020000000000
      RBP: 000000000118bf60 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000118bf2c
      R13: 00007ffd3cf9eabf R14: 00007f0eafee29c0 R15: 000000000118bf2c
      
      Allocated by task 2053:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track mm/kasan/common.c:56 [inline]
       __kasan_kmalloc.constprop.0+0xc2/0xd0 mm/kasan/common.c:461
       kmalloc include/linux/slab.h:554 [inline]
       kzalloc include/linux/slab.h:666 [inline]
       htab_map_alloc+0xdf/0x14a0 kernel/bpf/hashtab.c:454
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 2053:
       kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
       kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
       kasan_set_free_info+0x1b/0x30 mm/kasan/generic.c:355
       __kasan_slab_free+0x102/0x140 mm/kasan/common.c:422
       slab_free_hook mm/slub.c:1544 [inline]
       slab_free_freelist_hook+0x5d/0x150 mm/slub.c:1577
       slab_free mm/slub.c:3142 [inline]
       kfree+0xdb/0x360 mm/slub.c:4124
       htab_map_alloc+0x3f9/0x14a0 kernel/bpf/hashtab.c:549
       find_and_alloc_map kernel/bpf/syscall.c:122 [inline]
       map_create kernel/bpf/syscall.c:825 [inline]
       __do_sys_bpf+0xa80/0x5180 kernel/bpf/syscall.c:4381
       do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The buggy address belongs to the object at ffff88805fa67800
       which belongs to the cache kmalloc-1k of size 1024
      The buggy address is located 728 bytes inside of
       1024-byte region [ffff88805fa67800, ffff88805fa67c00)
      The buggy address belongs to the page:
      page:000000003c5582c4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fa60
      head:000000003c5582c4 order:3 compound_mapcount:0 compound_pincount:0
      flags: 0xfff00000010200(slab|head)
      raw: 00fff00000010200 ffffea0000bc1200 0000000200000002 ffff888010041140
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88805fa67980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88805fa67a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                          ^
       ffff88805fa67b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88805fa67b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: c50eb518 ("bpf: Use separate lockdep class for each hashtab")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20201102114100.3103180-1-eric.dumazet@gmail.com
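The pairing described in this commit message can be illustrated with a minimal userspace sketch. It is not the kernel patch: `register_key()`, `unregister_key()`, `struct table`, and the `fail_later` switch are hypothetical stand-ins for `lockdep_register_key()`, `lockdep_unregister_key()`, the hashtab, and a late allocation failure. The point is only that registering immediately after allocation and unregistering immediately before freeing lets every error path share one cleanup label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for lockdep_register_key() and
 * lockdep_unregister_key(); they only count live registrations. */
static int live_keys;

struct table {
	int key;       /* stands in for the per-table lockdep key */
	int *buckets;
};

static void register_key(int *key)   { *key = 1; live_keys++; }
static void unregister_key(int *key) { *key = 0; live_keys--; }

/* Allocate a table. The key is registered right after the table itself
 * is allocated and unregistered right before it is freed, so every later
 * failure can jump to the same cleanup label. */
static struct table *table_alloc(size_t n_buckets, bool fail_later)
{
	struct table *t = calloc(1, sizeof(*t));

	if (!t)
		return NULL;
	register_key(&t->key);

	t->buckets = calloc(n_buckets, sizeof(*t->buckets));
	if (!t->buckets || fail_later)  /* fail_later simulates a late error */
		goto free_table;

	return t;

free_table:
	free(t->buckets);               /* free(NULL) is a no-op */
	unregister_key(&t->key);
	free(t);
	return NULL;
}

static void table_free(struct table *t)
{
	free(t->buckets);
	unregister_key(&t->key);
	free(t);
}
```

With this layout, a caller that hits the simulated late failure gets NULL back and no key stays registered against freed memory, which is the condition the KASAN report above flags.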
  2. 31 Oct, 2020 3 commits
    • Merge branch 'bpf: safeguard hashtab locking in NMI context' · cb5dc5b0
      Committed by Alexei Starovoitov
      Song Liu says:
      
      ====================
      LOCKDEP NMI warning highlighted potential deadlock of hashtab in NMI
      context:
      
      [   74.828971] ================================
      [   74.828972] WARNING: inconsistent lock state
      [   74.828973] 5.9.0-rc8+ #275 Not tainted
      [   74.828974] --------------------------------
      [   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
      [   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
      [...]
      [   74.828999]  Possible unsafe locking scenario:
      [   74.828999]
      [   74.829000]        CPU0
      [   74.829001]        ----
      [   74.829001]   lock(&htab->buckets[i].raw_lock);
      [   74.829003]   <Interrupt>
      [   74.829004]     lock(&htab->buckets[i].raw_lock);
      
      Please refer to patch 1/2 for full trace.
      
      This warning is a false alert, as "INITIAL USE" and "IN-NMI" in the tests
      are from different hashtabs. On the other hand, in theory, it is possible
      to deadlock when a hashtab is accessed from both non-NMI and NMI context.
      Patch 1/2 fixes this false alert by assigning a separate lockdep class to
      each hashtab. Patch 2/2 introduces map_locked counters, which are similar to
      the bpf_prog_active counter, to avoid hashtab deadlock in NMI context.
      ====================
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: Avoid hashtab deadlock with map_locked · 20b6cc34
      Committed by Song Liu
      If a hashtab is accessed in both non-NMI and NMI context, the system may
      deadlock on bucket->lock. Fix this issue with the percpu counter map_locked.
      map_locked rejects concurrent access to the same bucket from the same CPU.
      To reduce memory overhead, map_locked is not added per bucket. Instead,
      8 percpu counters are added to each hashtab. Buckets are assigned to these
      counters based on the lower bits of their hash.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201029071925.3103400-3-songliubraving@fb.com
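The map_locked scheme described in this commit message can be sketched in userspace as follows. This is an illustration under assumptions, not the kernel code: the names `MAP_LOCK_COUNT`, `NR_CPUS_SIM`, `bucket_trylock()`, and `bucket_unlock()` are hypothetical, a plain two-dimensional array stands in for per-CPU variables, and returning `-EBUSY` on re-entry is an assumed choice of error code. It shows the two ideas from the text: a bucket maps to one of eight counters through the low bits of its hash, and a second entry on the same CPU for the same counter is rejected instead of spinning.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAP_LOCK_COUNT 8                     /* eight counters per table */
#define MAP_LOCK_MASK  (MAP_LOCK_COUNT - 1)
#define NR_CPUS_SIM    4                     /* assumed CPU count for the simulation */

/* One row per simulated CPU stands in for a per-CPU variable. */
static int map_locked[NR_CPUS_SIM][MAP_LOCK_COUNT];

/* Try to enter the critical section for `hash` on `cpu`.
 * Returns 0 on success, -EBUSY if this CPU is already inside a
 * section that maps to the same counter. */
static int bucket_trylock(unsigned cpu, uint32_t hash)
{
	unsigned slot = hash & MAP_LOCK_MASK;    /* low bits pick the counter */

	if (++map_locked[cpu][slot] != 1) {
		--map_locked[cpu][slot];             /* undo and refuse re-entry */
		return -EBUSY;
	}
	/* the real bucket spinlock would be taken here */
	return 0;
}

static void bucket_unlock(unsigned cpu, uint32_t hash)
{
	/* the real bucket spinlock would be released here */
	--map_locked[cpu][hash & MAP_LOCK_MASK];
}
```

Sharing eight counters means two different buckets whose hashes agree in the low three bits also exclude each other on the same CPU; that is the memory-versus-precision trade-off the message mentions.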
    • bpf: Use separate lockdep class for each hashtab · c50eb518
      Committed by Song Liu
      If a hashtab is accessed in both NMI and non-NMI contexts, it may cause
      deadlock in bucket->lock. LOCKDEP NMI warning highlighted this issue:
      
      ./test_progs -t stacktrace
      
      [   74.828970]
      [   74.828971] ================================
      [   74.828972] WARNING: inconsistent lock state
      [   74.828973] 5.9.0-rc8+ #275 Not tainted
      [   74.828974] --------------------------------
      [   74.828975] inconsistent {INITIAL USE} -> {IN-NMI} usage.
      [   74.828976] taskset/1174 [HC2[2]:SC0[0]:HE0:SE1] takes:
      [   74.828977] ffffc90000ee96b0 (&htab->buckets[i].raw_lock){....}-{2:2}, at: htab_map_update_elem+0x271/0x5a0
      [   74.828981] {INITIAL USE} state was registered at:
      [   74.828982]   lock_acquire+0x137/0x510
      [   74.828983]   _raw_spin_lock_irqsave+0x43/0x90
      [   74.828984]   htab_map_update_elem+0x271/0x5a0
      [   74.828984]   0xffffffffa0040b34
      [   74.828985]   trace_call_bpf+0x159/0x310
      [   74.828986]   perf_trace_run_bpf_submit+0x5f/0xd0
      [   74.828987]   perf_trace_urandom_read+0x1be/0x220
      [   74.828988]   urandom_read_nowarn.isra.0+0x26f/0x380
      [   74.828989]   vfs_read+0xf8/0x280
      [   74.828989]   ksys_read+0xc9/0x160
      [   74.828990]   do_syscall_64+0x33/0x40
      [   74.828991]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   74.828992] irq event stamp: 1766
      [   74.828993] hardirqs last  enabled at (1765): [<ffffffff82800ace>] asm_exc_page_fault+0x1e/0x30
      [   74.828994] hardirqs last disabled at (1766): [<ffffffff8267df87>] irqentry_enter+0x37/0x60
      [   74.828995] softirqs last  enabled at (856): [<ffffffff81043e7c>] fpu__clear+0xac/0x120
      [   74.828996] softirqs last disabled at (854): [<ffffffff81043df0>] fpu__clear+0x20/0x120
      [   74.828997]
      [   74.828998] other info that might help us debug this:
      [   74.828999]  Possible unsafe locking scenario:
      [   74.828999]
      [   74.829000]        CPU0
      [   74.829001]        ----
      [   74.829001]   lock(&htab->buckets[i].raw_lock);
      [   74.829003]   <Interrupt>
      [   74.829004]     lock(&htab->buckets[i].raw_lock);
      [   74.829006]
      [   74.829006]  *** DEADLOCK ***
      [   74.829007]
      [   74.829008] 1 lock held by taskset/1174:
      [   74.829008]  #0: ffff8883ec3fd020 (&cpuctx_lock){-...}-{2:2}, at: perf_event_task_tick+0x101/0x650
      [   74.829012]
      [   74.829013] stack backtrace:
      [   74.829014] CPU: 0 PID: 1174 Comm: taskset Not tainted 5.9.0-rc8+ #275
      [   74.829015] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
      [   74.829016] Call Trace:
      [   74.829016]  <NMI>
      [   74.829017]  dump_stack+0x9a/0xd0
      [   74.829018]  lock_acquire+0x461/0x510
      [   74.829019]  ? lock_release+0x6b0/0x6b0
      [   74.829020]  ? stack_map_get_build_id_offset+0x45e/0x800
      [   74.829021]  ? htab_map_update_elem+0x271/0x5a0
      [   74.829022]  ? rcu_read_lock_held_common+0x1a/0x50
      [   74.829022]  ? rcu_read_lock_held+0x5f/0xb0
      [   74.829023]  _raw_spin_lock_irqsave+0x43/0x90
      [   74.829024]  ? htab_map_update_elem+0x271/0x5a0
      [   74.829025]  htab_map_update_elem+0x271/0x5a0
      [   74.829026]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x9c/0xe88
      [   74.829027]  bpf_overflow_handler+0x127/0x320
      [   74.829028]  ? perf_event_text_poke_output+0x4d0/0x4d0
      [   74.829029]  ? sched_clock_cpu+0x18/0x130
      [   74.829030]  __perf_event_overflow+0xae/0x190
      [   74.829030]  handle_pmi_common+0x34c/0x470
      [   74.829031]  ? intel_pmu_save_and_restart+0x90/0x90
      [   74.829032]  ? lock_acquire+0x3f8/0x510
      [   74.829033]  ? lock_release+0x6b0/0x6b0
      [   74.829034]  intel_pmu_handle_irq+0x11e/0x240
      [   74.829034]  perf_event_nmi_handler+0x40/0x60
      [   74.829035]  nmi_handle+0x110/0x360
      [   74.829036]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829037]  default_do_nmi+0x6b/0x170
      [   74.829038]  exc_nmi+0x106/0x130
      [   74.829038]  end_repeat_nmi+0x16/0x55
      [   74.829039] RIP: 0010:__intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829042] Code: 2f 1f 03 48 8d bb b8 0c 00 00 e8 29 09 41 00 48 ...
      [   74.829043] RSP: 0000:ffff8880a604fc90 EFLAGS: 00000002
      [   74.829044] RAX: 000000070000000f RBX: ffff8883ec2195a0 RCX: 000000000000038f
      [   74.829045] RDX: 0000000000000007 RSI: ffffffff82e72c20 RDI: ffff8883ec21a258
      [   74.829046] RBP: 000000070000000f R08: ffffffff8101b013 R09: fffffbfff0a7982d
      [   74.829047] R10: ffffffff853cc167 R11: fffffbfff0a7982c R12: 0000000000000000
      [   74.829049] R13: ffff8883ec3f0af0 R14: ffff8883ec3fd120 R15: ffff8883e9c92098
      [   74.829049]  ? intel_pmu_lbr_enable_all+0x43/0x240
      [   74.829050]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829051]  ? __intel_pmu_enable_all.constprop.0+0x72/0xf0
      [   74.829052]  </NMI>
      [   74.829053]  perf_event_task_tick+0x48d/0x650
      [   74.829054]  scheduler_tick+0x129/0x210
      [   74.829054]  update_process_times+0x37/0x70
      [   74.829055]  tick_sched_handle.isra.0+0x35/0x90
      [   74.829056]  tick_sched_timer+0x8f/0xb0
      [   74.829057]  __hrtimer_run_queues+0x364/0x7d0
      [   74.829058]  ? tick_sched_do_timer+0xa0/0xa0
      [   74.829058]  ? enqueue_hrtimer+0x1e0/0x1e0
      [   74.829059]  ? recalibrate_cpu_khz+0x10/0x10
      [   74.829060]  ? ktime_get_update_offsets_now+0x1a3/0x360
      [   74.829061]  hrtimer_interrupt+0x1bb/0x360
      [   74.829062]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [   74.829063]  __sysvec_apic_timer_interrupt+0xed/0x3d0
      [   74.829064]  sysvec_apic_timer_interrupt+0x3f/0xd0
      [   74.829064]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
      [   74.829065]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      [   74.829066] RIP: 0033:0x7fba18d579b4
      [   74.829068] Code: 74 54 44 0f b6 4a 04 41 83 e1 0f 41 80 f9 ...
      [   74.829069] RSP: 002b:00007ffc9ba69570 EFLAGS: 00000206
      [   74.829071] RAX: 00007fba192084c0 RBX: 00007fba18c24d28 RCX: 00000000000007a4
      [   74.829072] RDX: 00007fba18c30488 RSI: 0000000000000000 RDI: 000000000000037b
      [   74.829073] RBP: 00007fba18ca5760 R08: 00007fba18c248fc R09: 00007fba18c94c30
      [   74.829074] R10: 000000000000002f R11: 0000000000073c30 R12: 00007ffc9ba695e0
      [   74.829075] R13: 00000000000003f3 R14: 00007fba18c21ac8 R15: 00000000000058d6
      
      However, such a warning should not apply across multiple hashtabs. The
      system will not deadlock if one hashtab is used in NMI, while another
      hashtab is used in non-NMI.
      
      Use a separate lockdep class for each hashtab, so that we don't get this
      false alert.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201029071925.3103400-2-songliubraving@fb.com
  3. 29 Oct, 2020 1 commit
    • bpf: Permit cond_resched for some iterators · cf83b2d2
      Committed by Yonghong Song
      Commit e679654a ("bpf: Fix a rcu_sched stall issue with
      bpf task/task_file iterator") tries to fix an RCU stall warning
      caused by the bpf task_file iterator when running
      "bpftool prog".
      
            rcu: INFO: rcu_sched self-detected stall on CPU
            rcu:     7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
                (t=21031 jiffies g=2534773 q=179750)
            NMI backtrace for cpu 7
            CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G        W         5.8.0-00004-g68bfc7f8c1b4 #6
            Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
            Call Trace:
            <IRQ>
            dump_stack+0x57/0x70
            nmi_cpu_backtrace.cold+0x14/0x53
            ? lapic_can_unplug_cpu.cold+0x39/0x39
            nmi_trigger_cpumask_backtrace+0xb7/0xc7
            rcu_dump_cpu_stacks+0xa2/0xd0
            rcu_sched_clock_irq.cold+0x1ff/0x3d9
            ? tick_nohz_handler+0x100/0x100
            update_process_times+0x5b/0x90
            tick_sched_timer+0x5e/0xf0
            __hrtimer_run_queues+0x12a/0x2a0
            hrtimer_interrupt+0x10e/0x280
            __sysvec_apic_timer_interrupt+0x51/0xe0
            asm_call_on_stack+0xf/0x20
            </IRQ>
            sysvec_apic_timer_interrupt+0x6f/0x80
            ...
            task_file_seq_next+0x52/0xa0
            bpf_seq_read+0xb9/0x320
            vfs_read+0x9d/0x180
            ksys_read+0x5f/0xe0
            do_syscall_64+0x38/0x60
            entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      The fix was to limit the number of bpf program runs to
      one million. This fixed the problem in most cases. But
      we also found that under heavy load, which can increase the wallclock
      time of bpf_seq_read(), the warning may still appear.
      
      For example, calling bpf_delay() in the "while" loop of
      bpf_seq_read() introduces an artificial delay, and
      the warning shows up in my qemu run.
      
        static unsigned q;
        volatile unsigned *p = &q;
        volatile unsigned long long ll;
        static void bpf_delay(void)
        {
               int i, j;
      
               for (i = 0; i < 10000; i++)
                       for (j = 0; j < 10000; j++)
                               ll += *p;
        }
      
      There are two ways to fix this issue. One is to reduce the above
      one million threshold to, say, 100,000 and hope the rcu warning will
      not show up any more. Another is to introduce a target feature
      which enables bpf_seq_read() to call cond_resched().
      
      This patch takes the second approach, as the first approach may cause
      more -EAGAIN failures for read() syscalls. Note that not all bpf_iter
      targets can permit cond_resched() in bpf_seq_read(): for some, e.g.,
      the netlink seq iterator, the rcu read lock critical section spans
      seq_ops->next() -> seq_ops->show() -> seq_ops->next().
      
      For the kernel code with the above hack, "bpftool p" roughly takes
      38 seconds to finish on my VM with 184 bpf program runs.
      Using the following command, I am able to collect the number of
      context switches:
         perf stat -e context-switches -- ./bpftool p >& log
      Without this patch,
         69      context-switches
      With this patch,
         75      context-switches
      This patch added 6 additional context switches, roughly one reschedule
      every 6 seconds, to avoid lengthy no-rescheduling periods which may cause the
      above RCU warnings.
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com
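The opt-in approach chosen in this commit message can be pictured with a small userspace sketch. Everything named here is hypothetical: `struct iter_target`, its `can_resched` flag, `yield_point()` (a counter standing in for `cond_resched()`), and `iterate()` are illustrative and are not the kernel's actual structures or signatures. The sketch only shows the shape of the idea: the read loop offers to yield after each object, but only for targets that declare it safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-target description; the flag name is illustrative. */
struct iter_target {
	bool can_resched;   /* target tolerates yielding between objects */
};

static unsigned long resched_calls;

/* Stand-in for cond_resched(); it only counts invocations. */
static void yield_point(void)
{
	resched_calls++;
}

/* Visit n_objs objects, offering to yield after each one if the target
 * allows it. Returns the number of objects visited. */
static unsigned long iterate(const struct iter_target *t, unsigned long n_objs)
{
	unsigned long i;

	for (i = 0; i < n_objs; i++) {
		/* the per-object show() callback would run here */
		if (t->can_resched)
			yield_point();
	}
	return i;
}
```

A target whose iteration holds an RCU read lock across next()/show() calls would leave the flag clear, so the loop never reaches a point where it could sleep inside the read-side critical section.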
  4. 24 Oct, 2020 10 commits
    • Merge tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 3cb12d27
      Committed by Linus Torvalds
      Pull networking fixes from Jakub Kicinski:
       "Cross-tree/merge window issues:
      
         - rtl8150: don't incorrectly assign random MAC addresses; fix late in
           the 5.9 cycle started depending on a return code from a function
           which changed with the 5.10 PR from the usb subsystem
      
        Current release regressions:
      
         - Revert "virtio-net: ethtool configurable RXCSUM", it was causing
           crashes at probe when control vq was not negotiated/available
      
        Previous release regressions:
      
         - ixgbe: fix probing of multi-port 10 Gigabit Intel NICs with an MDIO
           bus, only first device would be probed correctly
      
         - nexthop: Fix performance regression in nexthop deletion by
           effectively switching from recently added synchronize_rcu() to
           synchronize_rcu_expedited()
      
         - netsec: ignore 'phy-mode' device property on ACPI systems; the
           property is not populated correctly by the firmware, but firmware
           configures the PHY so just keep boot settings
      
        Previous releases - always broken:
      
         - tcp: fix to update snd_wl1 in bulk receiver fast path, addressing
           bulk transfers getting "stuck"
      
         - icmp: randomize the global rate limiter to prevent attackers from
           getting useful signal
      
         - r8169: fix operation under forced interrupt threading, make the
           driver always use hard irqs, even on RT, given the handler is light
           and only wants to schedule napi (and do so through a _irqoff()
           variant, preferably)
      
         - bpf: Enforce pointer id generation for all may-be-null register
           type to avoid pointers erroneously getting marked as null-checked
      
         - tipc: re-configure queue limit for broadcast link
      
         - net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN
           tunnels
      
         - fix various issues in chelsio inline tls driver
      
        Misc:
      
         - bpf: improve just-added bpf_redirect_neigh() helper api to support
           supplying nexthop by the caller - in case BPF program has already
           done a lookup we can avoid doing another one
      
         - remove unnecessary break statements
      
         - make MPTCP not select IPV6, but rather depend on it"
      
      * tag 'net-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (62 commits)
        tcp: fix to update snd_wl1 in bulk receiver fast path
        net: Properly typecast int values to set sk_max_pacing_rate
        netfilter: nf_fwd_netdev: clear timestamp in forwarding path
        ibmvnic: save changed mac address to adapter->mac_addr
        selftests: mptcp: depends on built-in IPv6
        Revert "virtio-net: ethtool configurable RXCSUM"
        rtnetlink: fix data overflow in rtnl_calcit()
        net: ethernet: mtk-star-emac: select REGMAP_MMIO
        net: hdlc_raw_eth: Clear the IFF_TX_SKB_SHARING flag after calling ether_setup
        net: hdlc: In hdlc_rcv, check to make sure dev is an HDLC device
        bpf, libbpf: Guard bpf inline asm from bpf_tail_call_static
        bpf, selftests: Extend test_tc_redirect to use modified bpf_redirect_neigh()
        bpf: Fix bpf_redirect_neigh helper api to support supplying nexthop
        mptcp: depends on IPV6 but not as a module
        sfc: move initialisation of efx->filter_sem to efx_init_struct()
        mpls: load mpls_gso after mpls_iptunnel
        net/sched: act_tunnel_key: fix OOB write in case of IPv6 ERSPAN tunnels
        net/sched: act_gate: Unlock ->tcfa_lock in tc_setup_flow_action()
        net: dsa: bcm_sf2: make const array static, makes object smaller
        mptcp: MPTCP_IPV6 should depend on IPV6 instead of selecting it
        ...
    • Merge tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 · 0adc313c
      Committed by Linus Torvalds
      Pull gfs2 updates from Andreas Gruenbacher:
      
       - Use iomap for non-journaled buffered I/O. This largely eliminates
         buffer heads on filesystems where the block size matches the page
         size. Many thanks to Christoph Hellwig for this patch!
      
       - Fixes for some more journaled data filesystem bugs, found by running
         xfstests with data journaling on for all files (chattr +j $MNT) (Bob
         Peterson)
      
       - gfs2_evict_inode refactoring (Bob Peterson)
      
       - Use the statfs data in the journal during recovery instead of reading
         it in from the local statfs inodes (Abhi Das)
      
       - Several other minor fixes by various people
      
      * tag 'gfs2-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (30 commits)
        gfs2: Recover statfs info in journal head
        gfs2: lookup local statfs inodes prior to journal recovery
        gfs2: Add fields for statfs info in struct gfs2_log_header_host
        gfs2: Ignore subsequent errors after withdraw in rgrp_go_sync
        gfs2: Eliminate gl_vm
        gfs2: Only access gl_delete for iopen glocks
        gfs2: Fix comments to glock_hash_walk
        gfs2: eliminate GLF_QUEUED flag in favor of list_empty(gl_holders)
        gfs2: Ignore journal log writes for jdata holes
        gfs2: simplify gfs2_block_map
        gfs2: Only set PageChecked if we have a transaction
        gfs2: don't lock sd_ail_lock in gfs2_releasepage
        gfs2: make gfs2_ail1_empty_one return the count of active items
        gfs2: Wipe jdata and ail1 in gfs2_journal_wipe, formerly gfs2_meta_wipe
        gfs2: enhance log_blocks trace point to show log blocks free
        gfs2: add missing log_blocks trace points in gfs2_write_revokes
        gfs2: rename gfs2_write_full_page to gfs2_write_jdata_page, remove parm
        gfs2: add validation checks for size of superblock
        gfs2: use-after-free in sysfs deregistration
        gfs2: Fix NULL pointer dereference in gfs2_rgrp_dump
        ...
    • Merge tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6 · 0613ed91
      Committed by Linus Torvalds
      Pull cifs updates from Steve French:
      
       - add support for recognizing special file types (char/block/fifo/
         symlink) for files created by Linux on WSL (a format we plan to move
         to as the default for creating special files on Linux, as it has
         advantages over the other current option, the SFU format) in readdir.
      
       - fix double queries to root directory when directory leases not
         supported (e.g. Samba)
      
       - fix querying mode bits (modefromsid mount option) for special file
         types
      
       - stronger encryption (gcm256), disabled by default until tested more
         broadly
      
       - allow querying owner when server reports 'well known SID' on query
         dir with SMB3.1.1 POSIX extensions
      
      * tag '5.10-rc-smb3-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6: (30 commits)
        SMB3: add support for recognizing WSL reparse tags
        cifs: remove bogus debug code
        smb3.1.1: fix typo in compression flag
        cifs: move smb version mount options into fs_context.c
        cifs: move cache mount options to fs_context.ch
        cifs: move security mount options into fs_context.ch
        cifs: add files to host new mount api
        smb3: do not try to cache root directory if dir leases not supported
        smb3: fix stat when special device file and mounted with modefromsid
        cifs: Print the address and port we are connecting to in generic_ip_connect()
        SMB3: Resolve data corruption of TCP server info fields
        cifs: make const array static, makes object smaller
        SMB3.1.1: Fix ids returned in POSIX query dir
        smb3: add dynamic trace point to trace when credits obtained
        smb3.1.1: do not fail if no encryption required but server doesn't support it
        cifs: Return the error from crypt_message when enc/dec key not found.
        smb3.1.1: set gcm256 when requested
        smb3.1.1: rename nonces used for GCM and CCM encryption
        smb3.1.1: print warning if server does not support requested encryption type
        smb3.1.1: add new module load parm enable_gcm_256
        ...
    • Merge tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · c4728cfb
      Committed by Linus Torvalds
      Pull clone/dedupe/remap code refactoring from Darrick Wong:
       "Move the generic file range remap (aka reflink and dedupe) functions
        out of mm/filemap.c and fs/read_write.c and into fs/remap_range.c to
        reduce clutter in the first two files"
      
      * tag 'vfs-5.10-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        vfs: move the generic write and copy checks out of mm
        vfs: move the remap range helpers to remap_range.c
        vfs: move generic_remap_checks out of mm
    • Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · f9a705ad
      Committed by Linus Torvalds
      Pull KVM updates from Paolo Bonzini:
       "For x86, there is a new alternative and (in the future) more scalable
        implementation of extended page tables that does not need a reverse
        map from guest physical addresses to host physical addresses.
      
        For now it is disabled by default because it is still lacking a few of
        the existing MMU's bells and whistles. However it is a very solid
        piece of work and it is already available for people to hammer on it.
      
        Other updates:
      
        ARM:
         - New page table code for both hypervisor and guest stage-2
         - Introduction of a new EL2-private host context
         - Allow EL2 to have its own private per-CPU variables
         - Support of PMU event filtering
         - Complete rework of the Spectre mitigation
      
        PPC:
         - Fix for running nested guests with in-kernel IRQ chip
         - Fix race condition causing occasional host hard lockup
         - Minor cleanups and bugfixes
      
        x86:
         - allow trapping unknown MSRs to userspace
         - allow userspace to force #GP on specific MSRs
         - INVPCID support on AMD
         - nested AMD cleanup, on demand allocation of nested SVM state
         - hide PV MSRs and hypercalls for features not enabled in CPUID
         - new test for MSR_IA32_TSC writes from host and guest
         - cleanups: MMU, CPUID, shared MSRs
         - LAPIC latency optimizations and bugfixes"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (232 commits)
        kvm: x86/mmu: NX largepage recovery for TDP MMU
        kvm: x86/mmu: Don't clear write flooding count for direct roots
        kvm: x86/mmu: Support MMIO in the TDP MMU
        kvm: x86/mmu: Support write protection for nesting in tdp MMU
        kvm: x86/mmu: Support disabling dirty logging for the tdp MMU
        kvm: x86/mmu: Support dirty logging for the TDP MMU
        kvm: x86/mmu: Support changed pte notifier in tdp MMU
        kvm: x86/mmu: Add access tracking for tdp_mmu
        kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU
        kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU
        kvm: x86/mmu: Add TDP MMU PF handler
        kvm: x86/mmu: Remove disallowed_hugepage_adjust shadow_walk_iterator arg
        kvm: x86/mmu: Support zapping SPTEs in the TDP MMU
        KVM: Cache as_id in kvm_memory_slot
        kvm: x86/mmu: Add functions to handle changed TDP SPTEs
        kvm: x86/mmu: Allocate and free TDP MMU roots
        kvm: x86/mmu: Init / Uninit the TDP MMU
        kvm: x86/mmu: Introduce tdp_iter
        KVM: mmu: extract spte.h and spte.c
        KVM: mmu: Separate updating a PTE from kvm_set_pte_rmapp
        ...
    • Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 9313f802
      Committed by Linus Torvalds
      Pull virtio updates from Michael Tsirkin:
       "vhost, vdpa, and virtio cleanups and fixes
      
        A very quiet cycle, no new features"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        MAINTAINERS: add URL for virtio-mem
        vhost_vdpa: remove unnecessary spin_lock in vhost_vring_call
        vringh: fix __vringh_iov() when riov and wiov are different
        vdpa/mlx5: Setup driver only if VIRTIO_CONFIG_S_DRIVER_OK
        s390: virtio: PV needs VIRTIO I/O device protection
        virtio: let arch advertise guest's memory access restrictions
        vhost_vdpa: Fix duplicate included kernel.h
        vhost: reduce stack usage in log_used
        virtio-mem: Constify mem_id_table
        virtio_input: Constify id_table
        virtio-balloon: Constify id_table
        vdpa/mlx5: Fix failure to bring link up
        vdpa/mlx5: Make use of a specific 16 bit endianness API
    • L
      Merge tag 'tag-chrome-platform-for-v5.10' of... · 090a7d04
      Linus Torvalds committed
      Merge tag 'tag-chrome-platform-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux
      
      Pull chrome platform updates from Benson Leung:
       "cros-ec:
         - Error code cleanup across cros-ec by Guenter
         - Remove cros_ec_cmd_xfer in favor of cros_ec_cmd_xfer_status
      
        cros_ec_typec:
         - Landed initial USB4 support in typec connector class driver for
           cros_ec
         - Role switch bugfix on disconnect, and reordering configuration
           steps
      
        cros_ec_lightbar:
         - Fix buffer outsize and result for get_lightbar_version
      
        misc:
         - Remove config MFD_CROS_EC, now that transition from MFD is complete
         - Enable KEY_LEFTMETA in new location on arm based cros-ec-keyboard
           keymap"
      
      * tag 'tag-chrome-platform-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux:
        ARM: dts: cros-ec-keyboard: Add alternate keymap for KEY_LEFTMETA
        platform/chrome: Use kobj_to_dev() instead of container_of()
        platform/chrome: cros_ec_proto: Drop cros_ec_cmd_xfer()
        platform/chrome: cros_ec_proto: Update cros_ec_cmd_xfer() call-sites
        platform/chrome: Kconfig: Remove the transitional MFD_CROS_EC config
        platform/chrome: cros_ec_lightbar: Reduce ligthbar get version command
        platform/chrome: cros_ec_trace: Add fields to command traces
        platform/chrome: cros_ec_typec: Re-order connector configuration steps
        platform/chrome: cros_ec_typec: Avoid setting usb role twice during disconnect
        platform/chrome: cros_ec_typec: Send enum values to usb_role_switch_set_role()
        platform/chrome: cros_ec_typec: USB4 support
        pwm: cros-ec: Simplify EC error handling
        platform/chrome: cros_ec_proto: Convert EC error codes to Linux error codes
        platform/input: cros_ec: Replace -ENOTSUPP with -ENOPROTOOPT
        pwm: cros-ec: Accept more error codes from cros_ec_cmd_xfer_status
        platform/chrome: cros_ec_sysfs: Report range of error codes from EC
        cros_ec_lightbar: Accept more error codes from cros_ec_cmd_xfer_status
        iio: cros_ec: Accept -EOPNOTSUPP as 'not supported' error code
      090a7d04
    • L
      Merge tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block · 4a22709e
      Linus Torvalds committed
      Pull arch task_work cleanups from Jens Axboe:
       "Two cleanups that don't fit other categories:
      
         - Finally get the task_work_add() cleanup done properly, so we don't
           have random 0/1/false/true/TWA_SIGNAL confusing use cases. Updates
           all callers, and also fixes up the documentation for
           task_work_add().
      
         - While working on some TIF related changes for 5.11, this
           TIF_NOTIFY_RESUME cleanup fell out of that. Remove some arch
           duplication for how that is handled"
      
      * tag 'arch-cleanup-2020-10-22' of git://git.kernel.dk/linux-block:
        task_work: cleanup notification modes
        tracehook: clear TIF_NOTIFY_RESUME in tracehook_notify_resume()
      4a22709e
    • L
      Merge tag 'arc-5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 0a14d764
      Linus Torvalds committed
      Pull ARC fix from Vineet Gupta:
       "I found a snafu in the perf driver which made it into 5.9-rc4, and the
        fix should go in now rather than wait"
      
      * tag 'arc-5.10-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        ARC: perf: redo the pct irq missing in device-tree handling
      0a14d764
    • L
      Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 032c7ed9
      Linus Torvalds committed
      Pull more arm64 updates from Will Deacon:
       "A small selection of further arm64 fixes and updates. Most of these
        are fixes that came in during the merge window, with the exception of
        the HAVE_MOVE_PMD mremap() speed-up which we discussed back in 2018
        and somehow forgot to enable upstream.
      
         - Improve performance of Spectre-v2 mitigation on Falkor CPUs (if
           you're lucky enough to have one)
      
         - Select HAVE_MOVE_PMD. This has been shown to improve mremap()
           performance, which is used heavily by the Android runtime GC, and
           it seems we forgot to enable this upstream back in 2018.
      
         - Ensure linker flags are consistent between LLVM and BFD
      
         - Fix stale comment in Spectre mitigation rework
      
         - Fix broken copyright header
      
         - Fix KASLR randomisation of the linear map
      
         - Prevent arm64-specific prctl()s from compat tasks (return -EINVAL)"
      
      Link: https://lore.kernel.org/kvmarm/20181108181201.88826-3-joelaf@google.com/
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: proton-pack: Update comment to reflect new function name
        arm64: spectre-v2: Favour CPU-specific mitigation at EL2
        arm64: link with -z norelro regardless of CONFIG_RELOCATABLE
        arm64: Fix a broken copyright header in gen_vdso_offsets.sh
        arm64: mremap speedup - Enable HAVE_MOVE_PMD
        arm64: mm: use single quantity to represent the PA to VA translation
        arm64: reject prctl(PR_PAC_RESET_KEYS) on compat tasks
      032c7ed9
  5. 23 Oct, 2020 25 commits
    • A
      gfs2: Recover statfs info in journal head · bedb0f05
      Abhi Das committed
      Apply the outstanding statfs changes in the journal head to the
      master statfs file. Zero out the local statfs file for good measure.
      
      Previously, statfs updates would be read in from the local statfs inode and
      synced to the master statfs inode during recovery.
      
      We now use the statfs updates in the journal head to update the master statfs
      inode instead of reading in from the local statfs inode. To preserve backward
      compatibility with kernels that can't do this, we still need to keep the
      local statfs inode up to date by writing changes to it. At some point in the
      future, we can do away with the local statfs inodes altogether and keep the
      statfs changes solely in the journal.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      bedb0f05
    • A
      gfs2: lookup local statfs inodes prior to journal recovery · 97fd734b
      Abhi Das committed
      We need to lookup the master statfs inode and the local statfs
      inodes earlier in the mount process (in init_journal) so journal
      recovery can use them when it attempts to recover the statfs info.
      We lookup all the local statfs inodes and store them in a linked
      list to allow a node to recover statfs info for other nodes in the
      cluster.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      97fd734b
    • B
      kvm: x86/mmu: NX largepage recovery for TDP MMU · 29cf0f50
      Ben Gardon committed
      When KVM maps a largepage backed region at a lower level in order to
      make it executable (i.e. NX large page shattering), it reduces the TLB
      performance of that region. In order to avoid making this degradation
      permanent, KVM must periodically reclaim shattered NX largepages by
      zapping them and allowing them to be rebuilt in the page fault handler.
      
      With this patch, the TDP MMU does not respect KVM's rate limiting on
      reclaim. It traverses the entire TDP structure every time. This will be
      addressed in a future patch.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-21-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      29cf0f50
    • B
      kvm: x86/mmu: Don't clear write flooding count for direct roots · daa5b6c1
      Ben Gardon committed
      Direct roots don't have a write flooding count because the guest can't
      affect that paging structure. Thus there's no need to clear the write
      flooding count on a fast CR3 switch for direct roots.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-20-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      daa5b6c1
    • B
      kvm: x86/mmu: Support MMIO in the TDP MMU · 95fb5b02
      Ben Gardon committed
      In order to support MMIO, KVM must be able to walk the TDP paging
      structures to find mappings for a given GFN. Support this walk for
      the TDP MMU.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      
      v2: Thanks to Dan Carpenter and kernel test robot for finding that root
      was used uninitialized in get_mmio_spte.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Message-Id: <20201014182700.2888246-19-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      95fb5b02
    • B
      kvm: x86/mmu: Support write protection for nesting in tdp MMU · 46044f72
      Ben Gardon committed
      To support nested virtualization, KVM will sometimes need to write
      protect pages which are part of a shadowed paging structure or are not
      writable in the shadowed paging structure. Add a function to write
      protect GFN mappings for this purpose.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-18-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      46044f72
    • B
      kvm: x86/mmu: Support disabling dirty logging for the tdp MMU · 14881998
      Ben Gardon committed
      Dirty logging ultimately breaks down MMU mappings to 4k granularity.
      When dirty logging is no longer needed, these granular mappings
      represent a useless performance penalty. When dirty logging is disabled,
      search the paging structure for mappings that could be re-constituted
      into a large page mapping. Zap those mappings so that they can be
      faulted in again at a higher mapping level.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-17-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      14881998
    • B
      kvm: x86/mmu: Support dirty logging for the TDP MMU · a6a0b05d
      Ben Gardon committed
      Dirty logging is a key feature of the KVM MMU and must be supported by
      the TDP MMU. Add support for both the write protection and PML dirty
      logging modes.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-16-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a6a0b05d
    • B
      kvm: x86/mmu: Support changed pte notifier in tdp MMU · 1d8dd6b3
      Ben Gardon committed
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
      a hook and handle the change_pte MMU notifier.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-15-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1d8dd6b3
    • B
      kvm: x86/mmu: Add access tracking for tdp_mmu · f8e14497
      Ben Gardon committed
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. The
      main Linux MM uses the access tracking MMU notifiers for swap and other
      features. Add hooks to handle the test/flush HVA (range) family of
      MMU notifiers.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f8e14497
    • B
      kvm: x86/mmu: Support invalidate range MMU notifier for TDP MMU · 063afacd
      Ben Gardon committed
      In order to interoperate correctly with the rest of KVM and other Linux
      subsystems, the TDP MMU must correctly handle various MMU notifiers. Add
      hooks to handle the invalidate range family of MMU notifiers.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-13-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      063afacd
    • B
      kvm: x86/mmu: Allocate struct kvm_mmu_pages for all pages in TDP MMU · 89c0fd49
      Ben Gardon committed
      Attach struct kvm_mmu_pages to every page in the TDP MMU to track
      metadata, facilitate NX reclaim, and enable improved parallelism of MMU
      operations in future patches.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-12-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      89c0fd49
    • B
      kvm: x86/mmu: Add TDP MMU PF handler · bb18842e
      Ben Gardon committed
      Add functions to handle page faults in the TDP MMU. These page faults
      are currently handled in much the same way as the x86 shadow paging
      based MMU, however the ordering of some operations is slightly
      different. Future patches will add eager NX splitting, a fast page fault
      handler, and parallel page faults.
      
      Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
      machine. This series introduced no new failures.
      
      This series can be viewed in Gerrit at:
      	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20201014182700.2888246-11-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      bb18842e
    • L
      Merge tag 'kconfig-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · f9893351
      Linus Torvalds committed
      Pull Kconfig updates from Masahiro Yamada:
      
       - Remove unused or useless code from qconf
      
       - Allow to edit "int", "hex", "string" options in place, and remove the
         separate edit box from qconf
      
      * tag 'kconfig-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
        kconfig: qconf: create QApplication after option checks
        kconfig: qconf: remove Y, M, N columns
        kconfig: qconf: remove ConfigView class
        kconfig: qconf: move setShowName/Range() to ConfigList from ConfigView
        kconfig: qconf: remove ConfigLineEdit class
        kconfig: qconf: allow to edit "int", "hex", "string" menus in-place
        kconfig: qconf: show data column all the time
        kconfig: qconf: move ConfigView::updateList(All) to ConfigList class
        kconfig: qconf: remove unused ConfigItem::okRename()
        kconfig: qconf: update the intro message to match to the current code
        kconfig: qconf: reformat the intro message
      f9893351
    • L
      Merge tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild · 746b25b1
      Linus Torvalds committed
      Pull Kbuild updates from Masahiro Yamada:
      
       - Support 'make compile_commands.json' to generate the compilation
         database more easily, avoiding stale entries
      
       - Support 'make clang-analyzer' and 'make clang-tidy' for static checks
         using clang-tidy
      
       - Preprocess scripts/modules.lds.S to allow CONFIG options in the
         module linker script
      
       - Drop cc-option tests from compiler flags supported by our minimal
         GCC/Clang versions
      
       - Always use a 12-digit commit hash for CONFIG_LOCALVERSION_AUTO=y
      
       - Use sha1 build id for both BFD linker and LLD
      
       - Improve deb-pkg for reproducible builds and rootless builds
      
       - Remove stale, useless scripts/namespace.pl
      
       - Turn -Wreturn-type warning into error
      
       - Fix build error of deb-pkg when CONFIG_MODULES=n
      
       - Replace 'hostname' command with more portable 'uname -n'
      
       - Various Makefile cleanups
      
      * tag 'kbuild-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (34 commits)
        kbuild: Use uname for LINUX_COMPILE_HOST detection
        kbuild: Only add -fno-var-tracking-assignments for old GCC versions
        kbuild: remove leftover comment for filechk utility
        treewide: remove DISABLE_LTO
        kbuild: deb-pkg: clean up package name variables
        kbuild: deb-pkg: do not build linux-headers package if CONFIG_MODULES=n
        kbuild: enforce -Werror=return-type
        scripts: remove namespace.pl
        builddeb: Add support for all required debian/rules targets
        builddeb: Enable rootless builds
        builddeb: Pass -n to gzip for reproducible packages
        kbuild: split the build log of kallsyms
        kbuild: explicitly specify the build id style
        scripts/setlocalversion: make git describe output more reliable
        kbuild: remove cc-option test of -Werror=date-time
        kbuild: remove cc-option test of -fno-stack-check
        kbuild: remove cc-option test of -fno-strict-overflow
        kbuild: move CFLAGS_{KASAN,UBSAN,KCSAN} exports to relevant Makefiles
        kbuild: remove redundant CONFIG_KASAN check from scripts/Makefile.kasan
        kbuild: do not create built-in objects for external module builds
        ...
      746b25b1
    • L
      Merge tag 'modules-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux · 2b714820
      Linus Torvalds committed
      Pull modules updates from Jessica Yu:
       "Code cleanups: more informative error messages and statically
        initialize init_free_wq to avoid a workqueue warning"
      
      * tag 'modules-for-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
        module: statically initialize init section freeing data
        module: Add more error message for failed kernel module loading
      2b714820
    • L
      Merge tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio · fc996db9
      Linus Torvalds committed
      Pull VFIO updates from Alex Williamson:
      
       - New fsl-mc vfio bus driver supporting userspace drivers of objects
         within NXP's DPAA2 architecture (Diana Craciun)
      
       - Support for exposing zPCI information on s390 (Matthew Rosato)
      
       - Fixes for "detached" VFs on s390 (Matthew Rosato)
      
       - Fixes for pin-pages and dma-rw accesses (Yan Zhao)
      
       - Cleanups and optimize vconfig regen (Zenghui Yu)
      
       - Fix duplicate irq-bypass token registration (Alex Williamson)
      
      * tag 'vfio-v5.10-rc1' of git://github.com/awilliam/linux-vfio: (30 commits)
        vfio iommu type1: Fix memory leak in vfio_iommu_type1_pin_pages
        vfio/pci: Clear token on bypass registration failure
        vfio/fsl-mc: fix the return of the uninitialized variable ret
        vfio/fsl-mc: Fix the dead code in vfio_fsl_mc_set_irq_trigger
        vfio/fsl-mc: Fixed vfio-fsl-mc driver compilation on 32 bit
        MAINTAINERS: Add entry for s390 vfio-pci
        vfio-pci/zdev: Add zPCI capabilities to VFIO_DEVICE_GET_INFO
        vfio/fsl-mc: Add support for device reset
        vfio/fsl-mc: Add read/write support for fsl-mc devices
        vfio/fsl-mc: trigger an interrupt via eventfd
        vfio/fsl-mc: Add irq infrastructure for fsl-mc devices
        vfio/fsl-mc: Added lock support in preparation for interrupt handling
        vfio/fsl-mc: Allow userspace to MMAP fsl-mc device MMIO regions
        vfio/fsl-mc: Implement VFIO_DEVICE_GET_REGION_INFO ioctl call
        vfio/fsl-mc: Implement VFIO_DEVICE_GET_INFO ioctl
        vfio/fsl-mc: Scan DPRC objects on vfio-fsl-mc driver bind
        vfio: Introduce capability definitions for VFIO_DEVICE_GET_INFO
        s390/pci: track whether util_str is valid in the zpci_dev
        s390/pci: stash version in the zpci_dev
        vfio/fsl-mc: Add VFIO framework skeleton for fsl-mc devices
        ...
      fc996db9
    • L
      Merge tag 'rpmsg-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 60573c29
      Linus Torvalds committed
      Pull rpmsg updates from Bjorn Andersson:
       "This introduces rpmsg_char support for GLINK and fixes a few issues"
      
      * tag 'rpmsg-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        rpmsg: glink: Expose rpmsg name attr for glink
        rpmsg: glink: Add support for rpmsg glink chrdev
        rpmsg: Guard against null endpoint ops in destroy
        rpmsg: glink: Use complete_all for open states
        rpmsg: virtio: fix compilation warning for virtio_rpmsg_channel description
        rpmsg: Avoid double-free in mtk_rpmsg_register_device
        rpmsg: smd: Fix a kobj leak in in qcom_smd_parse_edge()
      60573c29
    • L
      Merge tag 'rproc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc · 1553d968
      Linus Torvalds committed
      Pull remoteproc updates from Bjorn Andersson:
       "This introduces support for the Mediatek MT8192 SCP and controlling
        the Cortex R5F processors found in TI K3 platforms. It clones the
        longstanding debugfs interface for controlling crash handling to
        sysfs. Lastly it solves a bug where after a warm reset of Qualcomm
        platforms the modem would crash upon first boot"
      
      * tag 'rproc-v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/andersson/remoteproc:
        remoteproc/mediatek: Remove non-standard dsb()
        remoteproc: Add recovery configuration to the sysfs interface
        remoteproc: Add coredump as part of sysfs interface
        remoteproc: Change default dump configuration to "disabled"
        remoteproc: k3-r5: Add loading support for on-chip SRAM regions
        remoteproc: k3-r5: Initialize TCM memories for ECC
        remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem
        dt-bindings: remoteproc: Add bindings for R5F subsystem on TI K3 SoCs
        remoteproc/mediatek: Add support for mt8192 SCP
        remoteproc: Fixup coredump debugfs disable request
        remoteproc: qcom_q6v5: Assign mpss region to Q6 before MBA boot
        remoteproc/mediatek: fix null pointer dereference on null scp pointer
        remoteproc: stm32: Fix pointer assignement
        remoteproc: scp: add COMPILE_TEST dependency
      1553d968
    • L
      Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux · 3fec0eaa
      Linus Torvalds committed
      Pull clk updates from Stephen Boyd:
       "This contains no changes to the core framework. It is a collection of
        various clk driver updates.
      
        The biggest driver update in terms of lines of code is the Allwinner
        driver, closely followed by the Qualcomm and Mediatek drivers. All of
        those hit high because we add so many lines of clk data. Coming in
        fourth place is i.MX which also adds a bunch of clk data. This
        accounts for the new driver additions this time around.
      
        Otherwise the patches are lots of little cleanups and fixes for
        various clk drivers that have baked in linux-next for a while. I
        suppose one highlight or theme is that more clk drivers are being
        updated to work as modules, which is interesting to see such critical
        SoC infrastructure work as a loadable module.
      
        New Drivers:
         - Support qcom SM8150/SM8250 video and display clks
         - Support Mediatek MT8167 clks
         - Add clock for CRC block found on vf610 SoCs
         - Add support for the Renesas R-Car V3U (R8A779A0) SoC
         - Add support for the VSP for Resizing clock on Renesas RZ/G1H
         - Support Allwinner A100 SoC clks
      
        Removed Drivers:
         - Remove i.MX21 clock driver, as i.MX21 platform support is being
           dropped
      
        Updates:
         - Change how qcom's display port clks work
         - Small non-critical fixes for TI clk driver
         - Remove various unused variables in clk drivers
         - Allow Rockchip clk driver to be a module
         - Remove most __clk_lookup() calls in Samsung drivers (yay!)
         - Support building i.MX ARMv8 platforms clock driver as module
         - Some kerneldoc fixes here and there
         - A couple of minor i.MX clk data corrections
         - Update audio clock inverter and fdiv2 flag on Amlogic g12
         - Make amlogic clk drivers configurable in Kconfig
         - Fix Renesas VSP clock names to match corrected hardware
           documentation
         - Sigma-delta modulation on Allwinner R40
         - Various fixes for at91 clk driver
         - Use semicolons instead of commas in some places
         - Mark some variables const so they can move to RO memory"
      
      * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (102 commits)
        clk: imx8mq: Fix usdhc parents order
        clk: qcom: gdsc: Keep RETAIN_FF bit set if gdsc is already on
        clk: Restrict CLK_HSDK to ARC_SOC_HSDK
        clk: at91: sam9x60: support only two programmable clocks
        clk: ingenic: Respect CLK_SET_RATE_PARENT in .round_rate
        clk: ingenic: Don't tag custom clocks with CLK_SET_RATE_PARENT
        clk: ingenic: Don't use CLK_SET_RATE_GATE for PLL
        clk: ingenic: Use readl_poll_timeout instead of custom loop
        clk: ingenic: Use to_clk_info() macro for all clocks
        clk: bcm2835: add missing release if devm_clk_hw_register fails
        clk: at91: clk-sam9x60-pll: remove unused variable
        clk: at91: clk-main: update key before writing AT91_CKGR_MOR
        clk: at91: remove the checking of parent_name
        clk: clk-prima2: fix return value check in prima2_clk_init()
        clk: mmp2: Fix the display clock divider base
        clk: pxa: Constify static struct clk_ops
        clk: baikal-t1: Mark Ethernet PLL as critical
        clk: qoriq: modify MAX_PLL_DIV to 32
        clk: axi-clkgen: Set power bits for fractional mode
        clk: axi-clkgen: Add support for fractional dividers
        ...
      3fec0eaa
    • L
      Merge tag 'pwm/for-5.10-rc1' of... · ceae608a
      Linus Torvalds committed
      Merge tag 'pwm/for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm updates from Thierry Reding:
       "This release cycle's updates are mostly cleanup and some minor fixes"
      
      * tag 'pwm/for-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        dt-bindings: pwm: renesas,pwm-rcar: Add r8a7742 support
        dt-bindings: pwm: renesas,tpu-pwm: Document r8a7742 support
        pwm: Allow store 64-bit duty cycle from sysfs interface
        pwm: img: Fix null pointer access in probe
        pwm: pca9685: Disable unused alternative addresses
        pwm: pca9685: Use BIT() macro instead of shift
        pwm: pca9685: Make comments more consistent
        pwm: sun4i: Simplify with dev_err_probe()
        pwm: sprd: Simplify with dev_err_probe()
        pwm: sifive: Simplify with dev_err_probe()
        pwm: rockchip: Simplify with dev_err_probe()
        pwm: jz4740: Simplify with dev_err_probe()
        pwm: bcm2835: Simplify with dev_err_probe()
        pwm: Convert to use DEFINE_SEQ_ATTRIBUTE macro
        pwm: rockchip: Keep enabled PWMs running while probing
        dt-bindings: pwm: renesas,pwm-rcar: Add r8a774e1 support
      ceae608a
    • L
      Merge tag 'pci-v5.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 00937f36
      Linus Torvalds committed
      Pull PCI updates from Bjorn Helgaas:
       "Enumeration:
         - Print IRQ number used by PCIe Link Bandwidth Notification (Dongdong
           Liu)
         - Add schedule point in pci_read_config() to reduce max latency
           (Jiang Biao)
         - Add Kconfig options for MPS/MRRS strategy (Jim Quinlan)
      
        Resource management:
         - Fix pci_iounmap() memory leak when !CONFIG_GENERIC_IOMAP (Lorenzo
           Pieralisi)
      
        PCIe native device hotplug:
         - Reduce noisiness on hot removal (Lukas Wunner)
      
        Power management:
         - Revert "PCI/PM: Apply D2 delay as milliseconds, not microseconds"
           that was done on the basis of spec typo (Bjorn Helgaas)
         - Rename pci_dev.d3_delay to d3hot_delay to remove D3hot/D3cold
           ambiguity (Krzysztof Wilczyński)
         - Remove unused pcibios_pm_ops (Vaibhav Gupta)
      
        IOMMU:
         - Enable Translation Blocking for external devices to harden against
           DMA attacks (Rajat Jain)
      
        Error handling:
         - Add an ACPI APEI notifier chain for vendor CPER records to enable
           device-specific error handling (Shiju Jose)
      
        ASPM:
         - Remove struct aspm_register_info to simplify code (Saheed O.
           Bolarinwa)
      
        Amlogic Meson PCIe controller driver:
         - Build as module by default (Kevin Hilman)
      
        Ampere Altra PCIe controller driver:
         - Add MCFG quirk to work around non-standard ECAM implementation
           (Tuan Phan)
      
        Broadcom iProc PCIe controller driver:
         - Set affinity mask on MSI interrupts (Mark Tomlinson)
      
        Broadcom STB PCIe controller driver:
         - Make PCIE_BRCMSTB depend on ARCH_BRCMSTB (Jim Quinlan)
         - Add DT bindings for more Brcmstb chips (Jim Quinlan)
         - Add bcm7278 register info (Jim Quinlan)
         - Add bcm7278 PERST# support (Jim Quinlan)
         - Add suspend and resume pm_ops (Jim Quinlan)
         - Add control of rescal reset (Jim Quinlan)
         - Set additional internal memory DMA viewport sizes (Jim Quinlan)
         - Accommodate MSI for older chips (Jim Quinlan)
         - Set bus max burst size by chip type (Jim Quinlan)
         - Add support for bcm7211, bcm7216, bcm7445, bcm7278 (Jim Quinlan)
      
        Freescale i.MX6 PCIe controller driver:
         - Use dev_err_probe() to reduce redundant messages (Anson Huang)
      
        Freescale Layerscape PCIe controller driver:
         - Enforce 4K DMA buffer alignment in endpoint test (Hou Zhiqiang)
         - Add DT compatible strings for ls1088a, ls2088a (Xiaowei Bao)
         - Add endpoint support for ls1088a, ls2088a (Xiaowei Bao)
         - Add endpoint test support for ls1088a (Xiaowei Bao)
         - Add MSI-X support for ls1088a (Xiaowei Bao)
      
        HiSilicon HIP PCIe controller driver:
         - Handle HIP-specific errors via ACPI APEI (Yicong Yang)
      
        HiSilicon Kirin PCIe controller driver:
         - Return -EPROBE_DEFER if the GPIO isn't ready (Bean Huo)
      
        Intel VMD host bridge driver:
         - Factor out physical offset, bus offset, IRQ domain, IRQ allocation
           (Jon Derrick)
         - Use generic PCI PM correctly (Jon Derrick)
      
        Marvell Aardvark PCIe controller driver:
         - Fix compilation on s390 (Pali Rohár)
         - Implement driver 'remove' function and allow to build it as module
           (Pali Rohár)
         - Move PCIe reset card code to advk_pcie_train_link() (Pali Rohár)
         - Convert mvebu a3700 internal SMCC firmware return codes to errno
           (Pali Rohár)
         - Fix initialization with old Marvell's Arm Trusted Firmware (Pali
           Rohár)
      
        Microsoft Hyper-V host bridge driver:
         - Fix hibernation in case interrupts are not re-created (Dexuan Cui)
      
        NVIDIA Tegra PCIe controller driver:
         - Stop checking return value of debugfs_create() functions (Greg
           Kroah-Hartman)
         - Convert to use DEFINE_SEQ_ATTRIBUTE macro (Liu Shixin)
      
        Qualcomm PCIe controller driver:
         - Reset PCIe to work around Qsdk U-Boot issue (Ansuel Smith)
      
        Renesas R-Car PCIe controller driver:
         - Add DT documentation for r8a774a1, r8a774b1, r8a774e1 endpoints
           (Lad Prabhakar)
         - Add RZ/G2M, RZ/G2N, RZ/G2H IDs to endpoint test (Lad Prabhakar)
         - Add DT support for r8a7742 (Lad Prabhakar)
      
        Socionext UniPhier Pro5 controller driver:
         - Add DT descriptions of iATU register (host and endpoint) (Kunihiko
           Hayashi)
      
        Synopsys DesignWare PCIe controller driver:
         - Add link up check in dw_child_pcie_ops.map_bus() (racy, but seems
           unavoidable) (Hou Zhiqiang)
         - Fix endpoint Header Type check so multi-function devices work (Hou
           Zhiqiang)
         - Skip PCIE_MSI_INTR0* programming if MSI is disabled (Jisheng Zhang)
         - Stop leaking MSI page in suspend/resume (Jisheng Zhang)
         - Add common iATU register support instead of keystone-specific code
           (Kunihiko Hayashi)
         - Major config space access and other cleanups in dwc core and
           drivers that use it (al, exynos, histb, imx6, intel-gw, keystone,
           kirin, meson, qcom, tegra) (Rob Herring)
         - Add multiple PFs support for endpoint (Xiaowei Bao)
         - Add MSI-X doorbell mode in endpoint mode (Xiaowei Bao)
      
        Miscellaneous:
         - Use fallthrough pseudo-keyword (Gustavo A. R. Silva)
         - Fix "0 used as NULL pointer" warnings (Gustavo Pimentel)
         - Fix "cast truncates bits from constant value" warnings (Gustavo
           Pimentel)
         - Remove redundant zeroing for sg_init_table() (Julia Lawall)
         - Use scnprintf(), not snprintf(), in sysfs "show" functions
           (Krzysztof Wilczyński)
         - Remove unused assignments (Krzysztof Wilczyński)
         - Fix "0 used as NULL pointer" warning (Krzysztof Wilczyński)
         - Simplify bool comparisons (Krzysztof Wilczyński)
         - Use for_each_child_of_node() and for_each_node_by_name() (Qinglang
           Miao)
         - Simplify return expressions (Qinglang Miao)"
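      The scnprintf() item in the Miscellaneous list rests on a difference in
      return values: snprintf() returns the length the output would have had,
      while scnprintf() returns the number of characters actually stored. A
      minimal user-space sketch of that difference follows; bounded_printf()
      is a hypothetical stand-in written for illustration, not the kernel's
      implementation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical user-space stand-in for the kernel's scnprintf():
 * returns the number of characters actually stored in buf,
 * excluding the trailing NUL, even when the output was truncated.
 */
static int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0)
		return 0;
	/* vsnprintf() reports the untruncated length; cap it. */
	if ((size_t)n >= size)
		return (int)(size - 1);
	return n;
}
```

      With an 8-byte buffer and a 10-character string, snprintf() reports 10
      while the stand-in reports 7, which is the value a sysfs "show"
      function should return as the byte count.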
      
      * tag 'pci-v5.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (147 commits)
        PCI: vmd: Update VMD PM to correctly use generic PCI PM
        PCI: vmd: Create IRQ allocation helper
        PCI: vmd: Create IRQ Domain configuration helper
        PCI: vmd: Create bus offset configuration helper
        PCI: vmd: Create physical offset helper
        PCI: v3-semi: Remove unneeded break
        PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()
        PCI/ASPM: Remove struct pcie_link_state.l1ss
        PCI/ASPM: Remove struct aspm_register_info.l1ss_cap
        PCI/ASPM: Pass L1SS Capabilities value, not struct aspm_register_info
        PCI/ASPM: Remove struct aspm_register_info.l1ss_ctl1
        PCI/ASPM: Remove struct aspm_register_info.l1ss_ctl2 (unused)
        PCI/ASPM: Remove struct aspm_register_info.l1ss_cap_ptr
        PCI/ASPM: Remove struct aspm_register_info.latency_encoding
        PCI/ASPM: Remove struct aspm_register_info.enabled
        PCI/ASPM: Remove struct aspm_register_info.support
        PCI/ASPM: Use 'parent' and 'child' for readability
        PCI/ASPM: Move LTR path check to where it's used
        PCI/ASPM: Move pci_clear_and_set_dword() earlier
        PCI: dwc: Fix MSI page leakage in suspend/resume
        ...
      00937f36
    • N
      tcp: fix to update snd_wl1 in bulk receiver fast path · 18ded910
      Neal Cardwell committed
      In the header prediction fast path for a bulk data receiver, if no
      data is newly acknowledged then we do not call tcp_ack() and do not
      call tcp_ack_update_window(). This means that a bulk receiver that
      receives large amounts of data can have the incoming sequence numbers
      wrap, so that the check in tcp_may_update_window fails:
         after(ack_seq, tp->snd_wl1)
      
      If the incoming receive windows are zero in this state, and then the
      connection that was a bulk data receiver later wants to send data,
      that connection can find itself persistently rejecting the window
      updates in incoming ACKs. This means the connection can persistently
      fail to discover that the receive window has opened, which in turn
      means that the connection is unable to send anything, and the
      connection's sending process can get permanently "stuck".
      
      The fix is to update snd_wl1 in the header prediction fast path for a
      bulk data receiver, so that it keeps up and does not see wrapping
      problems.
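
      The wrapping problem described above can be reproduced with a small
      user-space model. This is a sketch under stated assumptions, not the
      kernel code: struct conn_model, window_update_accepted() and
      fast_path_receive() are hypothetical names, and only the snd_wl1
      comparison quoted above is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" on 32-bit sequence numbers. */
static bool seq_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Hypothetical connection model; only snd_wl1 matters for this sketch. */
struct conn_model {
	uint32_t snd_wl1; /* sequence number used for the last window update */
};

/* Simplified check: just the after(ack_seq, snd_wl1) test from the text. */
static bool window_update_accepted(const struct conn_model *c, uint32_t ack_seq)
{
	return seq_after(ack_seq, c->snd_wl1);
}

/*
 * Models the bulk-receiver fast path. Without the fix, snd_wl1 is left
 * untouched; with the fix, it follows the incoming sequence number.
 */
static void fast_path_receive(struct conn_model *c, uint32_t seq, bool apply_fix)
{
	if (apply_fix)
		c->snd_wl1 = seq;
}
```

      Once the incoming sequence number has advanced more than 2^31 past a
      stale snd_wl1, the signed difference changes sign and the comparison
      starts returning false, so window updates are rejected. Keeping
      snd_wl1 current in the fast path keeps the difference small.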
      
      This fix is based on a very nice and thorough analysis and diagnosis
      by Apollon Oikonomopoulos (see link below).
      
      This is a stable candidate but there is no Fixes tag here since the
      bug predates current git history. Just for fun: looks like the bug
      dates back to when header prediction was added in Linux v2.1.8 in Nov
      1996. In that version tcp_rcv_established() was added, and the code
      only updates snd_wl1 in tcp_ack(), and in the new "Bulk data transfer:
      receiver" code path it does not call tcp_ack(). This fix seems to
      apply cleanly at least as far back as v3.2.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Reported-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
      Tested-by: Apollon Oikonomopoulos <apoikos@dmesg.gr>
      Link: https://www.spinics.net/lists/netdev/msg692430.html
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20201022143331.1887495-1-ncardwell.kernel@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      18ded910
    • K
      net: Properly typecast int values to set sk_max_pacing_rate · 700465fd
      Ke Li committed
      In setsockopt(SO_MAX_PACING_RATE) on 64-bit systems, sk_max_pacing_rate
      was extended from 'u32' to 'unsigned long'. Since then it takes an
      unintentionally inflated value whenever it is assigned from an 'int'
      value with MSB=1, because promoting s32 to u64 sign-extends, e.g.
      0x80000000 becomes 0xFFFFFFFF80000000.
      
      The inflated sk_max_pacing_rate causes a subsequent getsockopt to
      return ~0U unexpectedly. It may also result in an increased pacing rate.
      
      Fix by explicitly casting the 'int' value to 'unsigned int' before
      assigning it to sk_max_pacing_rate, for zero extension to happen.
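
      The difference between the two conversions can be shown in a few
      lines of standalone C. This is an illustrative sketch, not the kernel
      patch: the function names are hypothetical, uint64_t stands in for
      'unsigned long' on a 64-bit system, and report_clamped() is an
      assumed model of how a value above the u32 range would be reported
      as ~0U.

```c
#include <stdint.h>

/* Direct widening: a negative int is sign-extended into the upper bits. */
static uint64_t widen_sign_extended(int32_t val)
{
	return (uint64_t)val;
}

/* Cast to unsigned 32-bit first, so the widening zero-extends. */
static uint64_t widen_zero_extended(int32_t val)
{
	return (uint64_t)(uint32_t)val;
}

/* Assumed reporting model: values above the u32 range saturate to ~0U. */
static uint32_t report_clamped(uint64_t rate)
{
	return rate > UINT32_MAX ? UINT32_MAX : (uint32_t)rate;
}
```

      For the bit pattern 0x80000000 (INT32_MIN), the first helper yields
      0xFFFFFFFF80000000 and the second yields 0x0000000080000000, matching
      the example in the commit message.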
      
      Fixes: 76a9ebe8 ("net: extend sk_pacing_rate to unsigned long")
      Signed-off-by: Ji Li <jli@akamai.com>
      Signed-off-by: Ke Li <keli@akamai.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20201022064146.79873-1-keli@akamai.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      700465fd
    • J
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 594850ca
      Jakub Kicinski committed
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      1) Update debugging in IPVS tcp protocol handler to make it easier
         to understand, from longguang.yue
      
      2) Update TCP tracker to deal with keepalive packet after
         re-registration, from Francesco Ruggeri.
      
      3) Missing IP6SKB_FRAGMENTED from netfilter fragment reassembly,
         from Georg Kohmann.
      
      4) Fix bogus packet drop in ebtables nat extensions, from
         Thimothee Cocault.
      
      5) Fix typo in flowtable documentation.
      
      6) Reset skb timestamp in nft_fwd_netdev.
      ====================
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      594850ca