1. 21 4月, 2018 1 次提交
  2. 19 4月, 2018 1 次提交
    • Y
      tools/bpf: fix test_sock and test_sock_addr.sh failure · 0a0a7e00
      Yonghong Song 提交于
      The bpf selftests test_sock and test_sock_addr.sh failed
      in my test machine. The failure looks like:
          $ ./test_sock
          Test case: bind4 load with invalid access: src_ip6 .. [PASS]
          Test case: bind4 load with invalid access: mark .. [PASS]
          Test case: bind6 load with invalid access: src_ip4 .. [PASS]
          Test case: sock_create load with invalid access: src_port .. [PASS]
          Test case: sock_create load w/o expected_attach_type (compat mode) .. [FAIL]
          Test case: sock_create load w/ expected_attach_type .. [FAIL]
          Test case: attach type mismatch bind4 vs bind6 .. [FAIL]
          ...
          Summary: 4 PASSED, 12 FAILED
          $ ./test_sock_addr.sh
          Wait for testing IPv4/IPv6 to become available .....
          ERROR: Timeout waiting for test IP to become available.
      
      In test_sock, bpf program loads failed due to hitting memlock limits.
      In test_sock_addr.sh, my test machine is a ipv6 only test box and using
      "ping" without specifying address family for an ipv6 address does not work.
      
      This patch fixed the issue by including header bpf_rlimit.h in test_sock.c
      and test_sock_addr.c, and specifying address family for ping command.
      
      Cc: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      0a0a7e00
  3. 11 4月, 2018 2 次提交
    • Y
      bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog · 3a38bb98
      Yonghong Song 提交于
      syzbot reported a possible deadlock in perf_event_detach_bpf_prog.
      The error details:
        ======================================================
        WARNING: possible circular locking dependency detected
        4.16.0-rc7+ #3 Not tainted
        ------------------------------------------------------
        syz-executor7/24531 is trying to acquire lock:
         (bpf_event_mutex){+.+.}, at: [<000000008a849b07>] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
      
        but task is already holding lock:
         (&mm->mmap_sem){++++}, at: [<0000000038768f87>] vm_mmap_pgoff+0x198/0x280 mm/util.c:353
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&mm->mmap_sem){++++}:
             __might_fault+0x13a/0x1d0 mm/memory.c:4571
             _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
             copy_to_user include/linux/uaccess.h:155 [inline]
             bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
             perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
             _perf_ioctl kernel/events/core.c:4750 [inline]
             perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        -> #0 (bpf_event_mutex){+.+.}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __mutex_lock_common kernel/locking/mutex.c:756 [inline]
             __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
             perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
             perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
             _free_event+0xbdb/0x10f0 kernel/events/core.c:4116
             put_event+0x24/0x30 kernel/events/core.c:4204
             perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
             remove_vma+0xb4/0x1b0 mm/mmap.c:172
             remove_vma_list mm/mmap.c:2490 [inline]
             do_munmap+0x82a/0xdf0 mm/mmap.c:2731
             mmap_region+0x59e/0x15a0 mm/mmap.c:1646
             do_mmap+0x6c0/0xe00 mm/mmap.c:1483
             do_mmap_pgoff include/linux/mm.h:2223 [inline]
             vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
             SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
             SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
             SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
             SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&mm->mmap_sem);
                                       lock(bpf_event_mutex);
                                       lock(&mm->mmap_sem);
          lock(bpf_event_mutex);
      
         *** DEADLOCK ***
        ======================================================
      
      The bug is introduced by Commit f371b304 ("bpf/tracing: allow
      user space to query prog array on the same tp") where copy_to_user,
      which requires mm->mmap_sem, is called inside bpf_event_mutex lock.
      At the same time, during perf_event file descriptor close,
      mm->mmap_sem is held first and then subsequent
      perf_event_detach_bpf_prog needs bpf_event_mutex lock.
      Such a senario caused a deadlock.
      
      As suggested by Daniel, moving copy_to_user out of the
      bpf_event_mutex lock should fix the problem.
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      3a38bb98
    • A
  4. 10 4月, 2018 13 次提交
    • S
      ip_gre: clear feature flags when incompatible o_flags are set · 1cc5954f
      Sabrina Dubroca 提交于
      Commit dd9d598c ("ip_gre: add the support for i/o_flags update via
      netlink") added the ability to change o_flags, but missed that the
      GSO/LLTX features are disabled by default, and only enabled some gre
      features are unused. Thus we also need to disable the GSO/LLTX features
      on the device when the TUNNEL_SEQ or TUNNEL_CSUM flags are set.
      
      These two examples should result in the same features being set:
      
          ip link add gre_none type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 0
      
          ip link set gre_none type gre seq
          ip link add gre_seq type gre local 192.168.0.10 remote 192.168.0.20 ttl 255 key 1 seq
      
      Fixes: dd9d598c ("ip_gre: add the support for i/o_flags update via netlink")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cc5954f
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c18bb396
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) The sockmap code has to free socket memory on close if there is
          corked data, from John Fastabend.
      
       2) Tunnel names coming from userspace need to be length validated. From
          Eric Dumazet.
      
       3) arp_filter() has to take VRFs properly into account, from Miguel
          Fadon Perlines.
      
       4) Fix oops in error path of tcf_bpf_init(), from Davide Caratti.
      
       5) Missing idr_remove() in u32_delete_key(), from Cong Wang.
      
       6) More syzbot stuff. Several use of uninitialized value fixes all
          over, from Eric Dumazet.
      
       7) Do not leak kernel memory to userspace in sctp, also from Eric
          Dumazet.
      
       8) Discard frames from unused ports in DSA, from Andrew Lunn.
      
       9) Fix DMA mapping and reset/failover problems in ibmvnic, from Thomas
          Falcon.
      
      10) Do not access dp83640 PHY registers prematurely after reset, from
          Esben Haabendal.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (46 commits)
        vhost-net: set packet weight of tx polling to 2 * vq size
        net: thunderx: rework mac addresses list to u64 array
        inetpeer: fix uninit-value in inet_getpeer
        dp83640: Ensure against premature access to PHY registers after reset
        devlink: convert occ_get op to separate registration
        ARM: dts: ls1021a: Specify TBIPA register address
        net/fsl_pq_mdio: Allow explicit speficition of TBIPA address
        ibmvnic: Do not reset CRQ for Mobility driver resets
        ibmvnic: Fix failover case for non-redundant configuration
        ibmvnic: Fix reset scheduler error handling
        ibmvnic: Zero used TX descriptor counter on reset
        ibmvnic: Fix DMA mapping mistakes
        tipc: use the right skb in tipc_sk_fill_sock_diag()
        sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
        net: dsa: Discard frames from unused ports
        sctp: do not leak kernel memory to user space
        soreuseport: initialise timewait reuseport field
        ipv4: fix uninit-value in ip_route_output_key_hash_rcu()
        dccp: initialize ireq->ir_mark
        net: fix uninit-value in __hw_addr_add_ex()
        ...
      c18bb396
    • L
      Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · fd3b36d2
      Linus Torvalds 提交于
      Pull vfs namei updates from Al Viro:
      
       - make lookup_one_len() safe with parent locked only shared(incoming
         afs series wants that)
      
       - fix of getname_kernel() regression from 2015 (-stable fodder, that
         one).
      
      * 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        getname_kernel() needs to make sure that ->name != ->iname in long case
        make lookup_one_len() safe to use with directory locked shared
        new helper: __lookup_slow()
        merge common parts of lookup_one_len{,_unlocked} into common helper
      fd3b36d2
    • L
      Merge tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux · 8ea4a5d8
      Linus Torvalds 提交于
      Pull orangefs updates from Mike Marshall:
       "Fixes and cleanups:
      
         - Documentation cleanups
      
         - removal of unused code
      
         - make some structs static
      
         - implement Orangefs vm_operations fault callout
      
         - eliminate two single-use functions and put their cleaned up code in
           line.
      
         - replace a vmalloc/memset instance with vzalloc
      
         - fix a race condition bug in wait code"
      
      * tag 'for-linus-4.17-ofs' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
        Orangefs: documentation updates
        orangefs: document package install and xfstests procedure
        orangefs: remove unused code
        orangefs: make several *_operations structs static
        orangefs: implement vm_ops->fault
        orangefs: open code short single-use functions
        orangefs: replace vmalloc and memset with vzalloc
        orangefs: bug fix for a race condition when getting a slot
      8ea4a5d8
    • L
      Merge tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 190f2ace
      Linus Torvalds 提交于
      Pull pstore fix from Kees Cook:
       "Fix another compression Kconfig combination missed in testing (Tobias
        Regnery)"
      
      * tag 'pstore-v4.17-rc1-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        pstore: fix crypto dependencies without compression
      190f2ace
    • S
      selinux: fix missing dput() before selinuxfs unmount · fd40ffc7
      Stephen Smalley 提交于
      Commit 0619f0f5 ("selinux: wrap selinuxfs state") triggers a BUG
      when SELinux is runtime-disabled (i.e. systemd or equivalent disables
      SELinux before initial policy load via /sys/fs/selinux/disable based on
      /etc/selinux/config SELINUX=disabled).
      
      This does not manifest if SELinux is disabled via kernel command line
      argument or if SELinux is enabled (permissive or enforcing).
      
      Before:
        SELinux:  Disabled at runtime.
        BUG: Dentry 000000006d77e5c7{i=17,n=null}  still in use (1) [unmount of selinuxfs selinuxfs]
      
      After:
        SELinux:  Disabled at runtime.
      
      Fixes: 0619f0f5 ("selinux: wrap selinuxfs state")
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fd40ffc7
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · d8312a3f
      Linus Torvalds 提交于
      Pull kvm updates from Paolo Bonzini:
       "ARM:
         - VHE optimizations
      
         - EL2 address space randomization
      
         - speculative execution mitigations ("variant 3a", aka execution past
           invalid privilege register access)
      
         - bugfixes and cleanups
      
        PPC:
         - improvements for the radix page fault handler for HV KVM on POWER9
      
        s390:
         - more kvm stat counters
      
         - virtio gpu plumbing
      
         - documentation
      
         - facilities improvements
      
        x86:
         - support for VMware magic I/O port and pseudo-PMCs
      
         - AMD pause loop exiting
      
         - support for AMD core performance extensions
      
         - support for synchronous register access
      
         - expose nVMX capabilities to userspace
      
         - support for Hyper-V signaling via eventfd
      
         - use Enlightened VMCS when running on Hyper-V
      
         - allow userspace to disable MWAIT/HLT/PAUSE vmexits
      
         - usual roundup of optimizations and nested virtualization bugfixes
      
        Generic:
         - API selftest infrastructure (though the only tests are for x86 as
           of now)"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (174 commits)
        kvm: x86: fix a prototype warning
        kvm: selftests: add sync_regs_test
        kvm: selftests: add API testing infrastructure
        kvm: x86: fix a compile warning
        KVM: X86: Add Force Emulation Prefix for "emulate the next instruction"
        KVM: X86: Introduce handle_ud()
        KVM: vmx: unify adjacent #ifdefs
        x86: kvm: hide the unused 'cpu' variable
        KVM: VMX: remove bogus WARN_ON in handle_ept_misconfig
        Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown"
        kvm: Add emulation for movups/movupd
        KVM: VMX: raise internal error for exception during invalid protected mode state
        KVM: nVMX: Optimization: Dont set KVM_REQ_EVENT when VMExit with nested_run_pending
        KVM: nVMX: Require immediate-exit when event reinjected to L2 and L1 event pending
        KVM: x86: Fix misleading comments on handling pending exceptions
        KVM: x86: Rename interrupt.pending to interrupt.injected
        KVM: VMX: No need to clear pending NMI/interrupt on inject realmode interrupt
        x86/kvm: use Enlightened VMCS when running on Hyper-V
        x86/hyper-v: detect nested features
        x86/hyper-v: define struct hv_enlightened_vmcs and clean field bits
        ...
      d8312a3f
    • L
      Fix subtle macro variable shadowing in min_not_zero() · e9092d0d
      Linus Torvalds 提交于
      Commit 3c8ba0d6 ("kernel.h: Retain constant expression output for
      max()/min()") rewrote our min/max macros to be very clever, but in the
      meantime resurrected a variable name shadow issue that we had had
      previously fixed in commit 589a9785 ("min/max: remove sparse
      warnings when they're nested").
      
      That commit talks about the sparse warnings that this shadowing causes,
      which we ignored as just a minor annoyance.  But it turns out that the
      sparse warning is the least of our problems.  We actually have a real
      bug due to the shadowing through the interaction with "min_not_zero()",
      which ends up doing
      
         min(__x, __y)
      
      internally, and then the new declaration of "__x" and "__y" as new
      variables in __cmp_once() results in a complete mess of an expression,
      and "min_not_zero()" doesn't work at all.
      
      For some odd reason, this only ever caused (reported) problems on s390,
      even though it is a generic issue and most of the (obviously successful)
      testing of the problematic commit had happened on other architectures.
      
      Quoting Sebastian Ott:
       "What happened is that the bio build by the partition detection code
        was attempted to be split by the block layer because the block queue
        had a max_sector setting of 0. blk_queue_max_hw_sectors uses
        min_not_zero."
      
      So re-introduce the use of __UNIQUE_ID() to make sure that the min/max
      macros do not have these kinds of clashes.
      
      [ That said, __UNIQUE_ID() itself has several issues that make it less
        than wonderful.
      
        In particular, the "uniqueness" has a fallback on the line number,
        which means that it's not actually unique in more complex cases if you
        don't build with gcc or clang (which have working unique counters that
        aren't tied to line numbers).
      
        That historical broken fallback also means that we have that pointless
        "prefix" argument that doesn't actually make much sense _except_ for
        the known-broken case. Oh well. ]
      
      Fixes: 3c8ba0d6 ("kernel.h: Retain constant expression output for max()/min()")
      Reported-and-tested-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e9092d0d
    • L
      Merge branch 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm · 7886e8aa
      Linus Torvalds 提交于
      Pull ARM SA1100 updates from Russell King:
       "We have support for arbitary MMIO registers providing platform GPIOs,
        which allows us to abstract some of the SA11x0 CF support.
      
        This set of updates makes that change"
      
      * 'for-linus-sa1100' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: sa1100/simpad: switch simpad CF to use gpiod APIs
        ARM: sa1100/shannon: convert to generic CF sockets
        ARM: sa1100/nanoengine: convert to generic CF sockets
        ARM: sa1100/h3xxx: switch h3xxx PCMCIA to use gpiod APIs
        ARM: sa1100/cerf: convert to generic CF sockets
        ARM: sa1100/assabet: convert to generic CF sockets
        ARM: sa1100: provide infrastructure to support generic CF sockets
        pcmcia: sa1100: provide generic CF support
      7886e8aa
    • L
      Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm · 4a1e0052
      Linus Torvalds 提交于
      Pull ARM updates from Russell King:
       "A number of core ARM changes:
      
         - Refactoring linker script by Nicolas Pitre
      
         - Enable source fortification
      
         - Add support for Cortex R8"
      
      * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: decompressor: fix warning introduced in fortify patch
        ARM: 8751/1: Add support for Cortex-R8 processor
        ARM: 8749/1: Kconfig: Add ARCH_HAS_FORTIFY_SOURCE
        ARM: simplify and fix linker script for TCM
        ARM: linker script: factor out TCM bits
        ARM: linker script: factor out vectors and stubs
        ARM: linker script: factor out unwinding table sections
        ARM: linker script: factor out stuff for the .text section
        ARM: linker script: factor out stuff for the DISCARD section
        ARM: linker script: factor out some common definitions between XIP and non-XIP
      4a1e0052
    • L
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 2025fef0
      Linus Torvalds 提交于
      Pull m68knommu update from Greg Ungerer:
       "Only a single fix to set the DMA masks in the ColdFire FEC platform
        data structure.
      
        This stops the warning from dma-mapping.h at boot time"
      
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: set dma and coherent masks for platform FEC ethernets
      2025fef0
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · 5148408a
      Linus Torvalds 提交于
      Pull alpha updates from Matt Turner:
       "A few small changes for alpha"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
        alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering
        alpha: Implement CPU vulnerabilities sysfs functions.
        alpha: rtc: stop validating rtc_time in .read_time
        alpha: rtc: remove unused set_mmss ops
      5148408a
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux · becdce1c
      Linus Torvalds 提交于
      Pull s390 updates from Martin Schwidefsky:
      
       - Improvements for the spectre defense:
          * The spectre related code is consolidated to a single file
            nospec-branch.c
          * Automatic enable/disable for the spectre v2 defenses (expoline vs.
            nobp)
          * Syslog messages for specve v2 are added
          * Enable CONFIG_GENERIC_CPU_VULNERABILITIES and define the attribute
            functions for spectre v1 and v2
      
       - Add helper macros for assembler alternatives and use them to shorten
         the code in entry.S.
      
       - Add support for persistent configuration data via the SCLP Store Data
         interface. The H/W interface requires a page table that uses 4K pages
         only, the code to setup such an address space is added as well.
      
       - Enable virtio GPU emulation in QEMU. To do this the depends
         statements for a few common Kconfig options are modified.
      
       - Add support for format-3 channel path descriptors and add a binary
         sysfs interface to export the associated utility strings.
      
       - Add a sysfs attribute to control the IFCC handling in case of
         constant channel errors.
      
       - The vfio-ccw changes from Cornelia.
      
       - Bug fixes and cleanups.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (40 commits)
        s390/kvm: improve stack frame constants in entry.S
        s390/lpp: use assembler alternatives for the LPP instruction
        s390/entry.S: use assembler alternatives
        s390: add assembler macros for CPU alternatives
        s390: add sysfs attributes for spectre
        s390: report spectre mitigation via syslog
        s390: add automatic detection of the spectre defense
        s390: move nobp parameter functions to nospec-branch.c
        s390/cio: add util_string sysfs attribute
        s390/chsc: query utility strings via fmt3 channel path descriptor
        s390/cio: rename struct channel_path_desc
        s390/cio: fix unbind of io_subchannel_driver
        s390/qdio: split up CCQ handling for EQBS / SQBS
        s390/qdio: don't retry EQBS after CCQ 96
        s390/qdio: restrict buffer merging to eligible devices
        s390/qdio: don't merge ERROR output buffers
        s390/qdio: simplify math in get_*_buffer_frontier()
        s390/decompressor: trim uncompressed image head during the build
        s390/crypto: Fix kernel crash on aes_s390 module remove.
        s390/defkeymap: fix global init to zero
        ...
      becdce1c
  5. 09 4月, 2018 18 次提交
    • H
      vhost-net: set packet weight of tx polling to 2 * vq size · a2ac9990
      haibinzhang(张海斌) 提交于
      handle_tx will delay rx for tens or even hundreds of milliseconds when tx busy
      polling udp packets with small length(e.g. 1byte udp payload), because setting
      VHOST_NET_WEIGHT takes into account only sent-bytes but no single packet length.
      
      Ping-Latencies shown below were tested between two Virtual Machines using
      netperf (UDP_STREAM, len=1), and then another machine pinged the client:
      
      vq size=256
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           3.319   18.489    57.303
      64               1.643    2.021     2.552
      128              1.825    2.600     3.224
      256              1.997    2.710     4.295
      512              1.860    3.171     4.631
      1024             2.002    4.173     9.056
      2048             2.257    5.650     9.688
      4096             2.093    8.508    15.943
      
      vq size=512
      Packet-Weight   Ping-Latencies(millisecond)
                         min      avg       max
      Origin           6.537   29.177    66.245
      64               2.798    3.614     4.403
      128              2.861    3.820     4.775
      256              3.008    4.018     4.807
      512              3.254    4.523     5.824
      1024             3.079    5.335     7.747
      2048             3.944    8.201    12.762
      4096             4.158   11.057    19.985
      
      Seems pretty consistent, a small dip at 2 VQ sizes.
      Ring size is a hint from device about a burst size it can tolerate. Based on
      benchmarks, set the weight to 2 * vq size.
      
      To evaluate this change, another tests were done using netperf(RR, TX) between
      two machines with Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz, and vq size was
      tweaked through qemu. Results shown below does not show obvious changes.
      
      vq size=256 TCP_RR                vq size=512 TCP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -7%/        -2%      1/       1/   0%/        -2%
         1/       4/  +1%/         0%      1/       4/  +1%/         0%
         1/       8/  +1%/        -2%      1/       8/   0%/        +1%
        64/       1/  -6%/         0%     64/       1/  +7%/        +3%
        64/       4/   0%/        +2%     64/       4/  -1%/        +1%
        64/       8/   0%/         0%     64/       8/  -1%/        -2%
       256/       1/  -3%/        -4%    256/       1/  -4%/        -2%
       256/       4/  +3%/        +4%    256/       4/  +1%/        +2%
       256/       8/  +2%/         0%    256/       8/  +1%/        -1%
      
      vq size=256 UDP_RR                vq size=512 UDP_RR
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
         1/       1/  -5%/        +1%      1/       1/  -3%/        -2%
         1/       4/  +4%/        +1%      1/       4/  -2%/        +2%
         1/       8/  -1%/        -1%      1/       8/  -1%/         0%
        64/       1/  -2%/        -3%     64/       1/  +1%/        +1%
        64/       4/  -5%/        -1%     64/       4/  +2%/         0%
        64/       8/   0%/        -1%     64/       8/  -2%/        +1%
       256/       1/  +7%/        +1%    256/       1/  -7%/         0%
       256/       4/  +1%/        +1%    256/       4/  -3%/        -4%
       256/       8/  +2%/        +2%    256/       8/  +1%/        +1%
      
      vq size=256 TCP_STREAM            vq size=512 TCP_STREAM
      size/sessions/+thu%/+normalize%   size/sessions/+thu%/+normalize%
        64/       1/   0%/        -3%     64/       1/   0%/         0%
        64/       4/  +3%/        -1%     64/       4/  -2%/        +4%
        64/       8/  +9%/        -4%     64/       8/  -1%/        +2%
       256/       1/  +1%/        -4%    256/       1/  +1%/        +1%
       256/       4/  -1%/        -1%    256/       4/  -3%/         0%
       256/       8/  +7%/        +5%    256/       8/  -3%/         0%
       512/       1/  +1%/         0%    512/       1/  -1%/        -1%
       512/       4/  +1%/        -1%    512/       4/   0%/         0%
       512/       8/  +7%/        -5%    512/       8/  +6%/        -1%
      1024/       1/   0%/        -1%   1024/       1/   0%/        +1%
      1024/       4/  +3%/         0%   1024/       4/  +1%/         0%
      1024/       8/  +8%/        +5%   1024/       8/  -1%/         0%
      2048/       1/  +2%/        +2%   2048/       1/  -1%/         0%
      2048/       4/  +1%/         0%   2048/       4/   0%/        -1%
      2048/       8/  -2%/         0%   2048/       8/   5%/        -1%
      4096/       1/  -2%/         0%   4096/       1/  -2%/         0%
      4096/       4/  +2%/         0%   4096/       4/   0%/         0%
      4096/       8/  +9%/        -2%   4096/       8/  -5%/        -1%
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NHaibin Zhang <haibinzhang@tencent.com>
      Signed-off-by: NYunfang Tai <yunfangtai@tencent.com>
      Signed-off-by: NLidong Chen <lidongchen@tencent.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2ac9990
    • V
      net: thunderx: rework mac addresses list to u64 array · 9b5c4dfb
      Vadim Lomovtsev 提交于
      It is too expensive to pass u64 values via linked list, instead
      allocate array for them by overall number of mac addresses from netdev.
      
      This eventually removes multiple kmalloc() calls, aviod memory
      fragmentation and allow to put single null check on kmalloc
      return value in order to prevent a potential null pointer dereference.
      
      Addresses-Coverity-ID: 1467429 ("Dereference null return value")
      Fixes: 37c3347e ("net: thunderx: add ndo_set_rx_mode callback implementation for VF")
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NVadim Lomovtsev <Vadim.Lomovtsev@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b5c4dfb
    • E
      inetpeer: fix uninit-value in inet_getpeer · b6a37e5e
      Eric Dumazet 提交于
      syzbot/KMSAN reported that p->dtime was read while it was
      not yet initialized in :
      
      	delta = (__u32)jiffies - p->dtime;
      	if (delta < ttl || !refcount_dec_if_one(&p->refcnt))
      		gc_stack[i] = NULL;
      
      This is a false positive, because the inetpeer wont be erased
      from rb-tree if the refcount_dec_if_one(&p->refcnt) does not
      succeed. And this wont happen before first inet_putpeer() call
      for this inetpeer has been done, and ->dtime field is written
      exactly before the refcount_dec_and_test(&p->refcnt).
      
      The KMSAN report was :
      
      BUG: KMSAN: uninit-value in inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
      BUG: KMSAN: uninit-value in inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
      CPU: 0 PID: 9494 Comm: syz-executor5 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       inet_peer_gc net/ipv4/inetpeer.c:163 [inline]
       inet_getpeer+0x1567/0x1e70 net/ipv4/inetpeer.c:228
       inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
       icmpv4_xrlim_allow net/ipv4/icmp.c:330 [inline]
       icmp_send+0x2b44/0x3050 net/ipv4/icmp.c:725
       ip_options_compile+0x237c/0x29f0 net/ipv4/ip_options.c:472
       ip_rcv_options net/ipv4/ip_input.c:284 [inline]
       ip_rcv_finish+0xda8/0x16d0 net/ipv4/ip_input.c:365
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ip_rcv+0x119d/0x16f0 net/ipv4/ip_input.c:493
       __netif_receive_skb_core+0x47cf/0x4a80 net/core/dev.c:4562
       __netif_receive_skb net/core/dev.c:4627 [inline]
       netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
       netif_receive_skb+0x230/0x240 net/core/dev.c:4725
       tun_rx_batched drivers/net/tun.c:1555 [inline]
       tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
       tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
       do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
       do_iter_write+0x30d/0xd40 fs/read_write.c:932
       vfs_writev fs/read_write.c:977 [inline]
       do_writev+0x3c9/0x830 fs/read_write.c:1012
       SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
       SyS_writev+0x56/0x80 fs/read_write.c:1082
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x455111
      RSP: 002b:00007fae0365cba0 EFLAGS: 00000293 ORIG_RAX: 0000000000000014
      RAX: ffffffffffffffda RBX: 000000000000002e RCX: 0000000000455111
      RDX: 0000000000000001 RSI: 00007fae0365cbf0 RDI: 00000000000000fc
      RBP: 0000000020000040 R08: 00000000000000fc R09: 0000000000000000
      R10: 000000000000002e R11: 0000000000000293 R12: 00000000ffffffff
      R13: 0000000000000658 R14: 00000000006fc8e0 R15: 0000000000000000
      
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
       inet_getpeer+0xed8/0x1e70 net/ipv4/inetpeer.c:210
       inet_getpeer_v4 include/net/inetpeer.h:110 [inline]
       ip4_frag_init+0x4d1/0x740 net/ipv4/ip_fragment.c:153
       inet_frag_alloc net/ipv4/inet_fragment.c:369 [inline]
       inet_frag_create net/ipv4/inet_fragment.c:385 [inline]
       inet_frag_find+0x7da/0x1610 net/ipv4/inet_fragment.c:418
       ip_find net/ipv4/ip_fragment.c:275 [inline]
       ip_defrag+0x448/0x67a0 net/ipv4/ip_fragment.c:676
       ip_check_defrag+0x775/0xda0 net/ipv4/ip_fragment.c:724
       packet_rcv_fanout+0x2a8/0x8d0 net/packet/af_packet.c:1447
       deliver_skb net/core/dev.c:1897 [inline]
       deliver_ptype_list_skb net/core/dev.c:1912 [inline]
       __netif_receive_skb_core+0x314a/0x4a80 net/core/dev.c:4545
       __netif_receive_skb net/core/dev.c:4627 [inline]
       netif_receive_skb_internal+0x49d/0x630 net/core/dev.c:4701
       netif_receive_skb+0x230/0x240 net/core/dev.c:4725
       tun_rx_batched drivers/net/tun.c:1555 [inline]
       tun_get_user+0x6d88/0x7580 drivers/net/tun.c:1962
       tun_chr_write_iter+0x1d4/0x330 drivers/net/tun.c:1990
       do_iter_readv_writev+0x7bb/0x970 include/linux/fs.h:1776
       do_iter_write+0x30d/0xd40 fs/read_write.c:932
       vfs_writev fs/read_write.c:977 [inline]
       do_writev+0x3c9/0x830 fs/read_write.c:1012
       SYSC_writev+0x9b/0xb0 fs/read_write.c:1085
       SyS_writev+0x56/0x80 fs/read_write.c:1082
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6a37e5e
    • R
      9178caf9
    • E
      dp83640: Ensure against premature access to PHY registers after reset · 76327a35
      Esben Haabendal 提交于
      The datasheet specifies a 3uS pause after performing a software
      reset. The default implementation of genphy_soft_reset() does not
      provide this, so implement soft_reset with the needed pause.
      Signed-off-by: NEsben Haabendal <eha@deif.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76327a35
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · 1f1cba78
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-04-09
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) Two sockmap fixes: i) fix a potential warning when a socket with
         pending cork data is closed by freeing the memory right when the
         socket is closed instead of seeing still outstanding memory at
         garbage collector time, ii) fix a NULL pointer deref in case of
         duplicates release calls, so make sure to only reset the sk_prot
         pointer when it's in a valid state to do so, both from John.
      
      2) Fix a compilation warning in bpf_prog_attach_check_attach_type()
         by moving the function under CONFIG_CGROUP_BPF ifdef since only
         used there, from Anders.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f1cba78
    • D
      Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth · 4c7c12e0
      David S. Miller 提交于
      Johan Hedberg says:
      
      ====================
      pull request: bluetooth 2018-04-08
      
      Here's one important Bluetooth fix for the 4.17-rc series that's needed
      to pass several Bluetooth qualification test cases.
      
      Let me know if there are any issues pulling. Thanks.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c7c12e0
    • J
      devlink: convert occ_get op to separate registration · fc56be47
      Jiri Pirko 提交于
      This resolves race during initialization where the resources with
      ops are registered before driver and the structures used by occ_get
      op is initialized. So keep occ_get callbacks registered only when
      all structs are initialized.
      
      The example flows, as it is in mlxsw:
      1) driver load/asic probe:
         mlxsw_core
            -> mlxsw_sp_resources_register
              -> mlxsw_sp_kvdl_resources_register
                -> devlink_resource_register IDX
         mlxsw_spectrum
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      2) reload triggered by devlink command:
        -> mlxsw_devlink_core_bus_device_reload
          -> mlxsw_sp_fini
            -> mlxsw_sp_kvdl_fini
      	-> devlink_resource_occ_get_unregister IDX
          (struct mlxsw_sp *mlxsw_sp is freed at this point, call to occ get
           which is using mlxsw_sp would cause use-after free)
          -> mlxsw_sp_init
            -> mlxsw_sp_kvdl_init
              -> mlxsw_sp_kvdl_parts_init
                -> mlxsw_sp_kvdl_part_init
                  -> devlink_resource_size_get IDX (to get the current setup
                                                    size from devlink)
              -> devlink_resource_occ_get_register IDX (register current
                                                        occupancy getter)
      
      Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc56be47
    • E
      ARM: dts: ls1021a: Specify TBIPA register address · 55711961
      Esben Haabendal 提交于
      The current (mildly evil) fsl_pq_mdio code uses an undocumented shadow of
      the TBIPA register on LS1021A, which happens to be read-only.
      Changing TBI PHY address therefore does not work on LS1021A.
      
      The real (and documented) address of the TBIPA registere lies in the eTSEC
      block and not in MDIO/MII, which is read/write, so using that fixes
      the problem.
      Signed-off-by: NEsben Haabendal <eha@deif.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      55711961
    • E
      net/fsl_pq_mdio: Allow explicit speficition of TBIPA address · 21481189
      Esben Haabendal 提交于
      This introduces a simpler and generic method for for finding (and mapping)
      the TBIPA register.
      
      Instead of relying of complicated logic for finding the TBIPA register
      address based on the MDIO or MII register block base
      address, which even in some cases relies on undocumented shadow registers,
      a second "reg" entry for the mdio bus devicetree node specifies the TBIPA
      register.
      
      Backwards compatibility is kept, as the existing logic is applied when
      only a single "reg" mapping is specified.
      Signed-off-by: NEsben Haabendal <eha@deif.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      21481189
    • D
      Merge branch 'ibmvnic-Fix-driver-reset-and-DMA-bugs' · 4e31a684
      David S. Miller 提交于
      Thomas Falcon says:
      
      ====================
      ibmvnic: Fix driver reset and DMA bugs
      
      This patch series introduces some fixes to the driver reset
      routines and a patch that fixes mistakes caught by the kernel
      DMA debugger.
      
      The reset fixes include a fix to reset TX queue counters properly
      after a reset as well as updates to driver reset error-handling code.
      It also provides updates to the reset handling routine for redundant
      backing VF failover and partition migration cases.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e31a684
    • N
      ibmvnic: Do not reset CRQ for Mobility driver resets · 30f79625
      Nathan Fontenot 提交于
      When resetting the ibmvnic driver after a partition migration occurs
      there is no requirement to do a reset of the main CRQ. The current
      driver code does the required re-enable of the main CRQ, then does
      a reset of the main CRQ later.
      
      What we should be doing for a driver reset after a migration is to
      re-enable the main CRQ, release all the sub-CRQs, and then allocate
      new sub-CRQs after capability negotiation.
      
      This patch updates the handling of mobility resets to do the proper
      work and not reset the main CRQ. To do this the initialization/reset
      of the main CRQ had to be moved out of the ibmvnic_init routine
      and in to the ibmvnic_probe and do_reset routines.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30f79625
    • T
      ibmvnic: Fix failover case for non-redundant configuration · 5a18e1e0
      Thomas Falcon 提交于
      There is a failover case for a non-redundant pseries VNIC
      configuration that was not being handled properly. The current
      implementation assumes that the driver will always have a redandant
      device to communicate with following a failover notification. There
      are cases, however, when a non-redundant configuration can receive
      a failover request. If that happens, the driver should wait until
      it receives a signal that the device is ready for operation.
      
      The driver is agnostic of its backing hardware configuration,
      so this fix necessarily affects all device failover management.
      The driver needs to wait until it receives a signal that the device
      is ready for resetting. A flag is introduced to track this intermediary
      state where the driver is waiting for an active device.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a18e1e0
    • T
      ibmvnic: Fix reset scheduler error handling · af894d23
      Thomas Falcon 提交于
      In some cases, if the driver is waiting for a reset following
      a device parameter change, failure to schedule a reset can result
      in a hang since a completion signal is never sent.
      
      If the device configuration is being altered by a tool such
      as ethtool or ifconfig, it could cause the console to hang
      if the reset request does not get scheduled. Add some additional
      error handling code to exit the wait_for_completion if there is
      one in progress.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af894d23
    • T
      ibmvnic: Zero used TX descriptor counter on reset · 41f71467
      Thomas Falcon 提交于
      The counter that tracks used TX descriptors pending completion
      needs to be zeroed as part of a device reset. This change fixes
      a bug causing transmit queues to be stopped unnecessarily and in
      some cases a transmit queue stall and timeout reset. If the counter
      is not reset, the remaining descriptors will not be "removed",
      effectively reducing queue capacity. If the queue is over half full,
      it will cause the queue to stall if stopped.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41f71467
    • T
      ibmvnic: Fix DMA mapping mistakes · 37e40fa8
      Thomas Falcon 提交于
      Fix some mistakes caught by the DMA debugger. The first change
      fixes a unnecessary unmap that should have been removed in an
      earlier update. The next hunk fixes another bad unmap by zeroing
      the bit checked to determine that an unmap is needed. The final
      change fixes some buffers that are unmapped with the wrong
      direction specified.
      Signed-off-by: NThomas Falcon <tlfalcon@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37e40fa8
    • C
      tipc: use the right skb in tipc_sk_fill_sock_diag() · e41f0548
      Cong Wang 提交于
      Commit 4b2e6877 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
      tried to fix the crash but failed, the crash is still 100% reproducible
      with it.
      
      In tipc_sk_fill_sock_diag(), skb is the diag dump we are filling, it is not
      correct to retrieve its NETLINK_CB(), instead, like other protocol diag,
      we should use NETLINK_CB(cb->skb).sk here.
      
      Reported-by: <syzbot+326e587eff1074657718@syzkaller.appspotmail.com>
      Fixes: 4b2e6877 ("tipc: Fix namespace violation in tipc_sk_fill_sock_diag")
      Fixes: c30b70de (tipc: implement socket diagnostics for AF_TIPC)
      Cc: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e41f0548
    • E
      sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6 · 81e98370
      Eric Dumazet 提交于
      Check must happen before call to ipv6_addr_v4mapped()
      
      syzbot report was :
      
      BUG: KMSAN: uninit-value in sctp_sockaddr_af net/sctp/socket.c:359 [inline]
      BUG: KMSAN: uninit-value in sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384
      CPU: 0 PID: 3576 Comm: syzkaller968804 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       sctp_sockaddr_af net/sctp/socket.c:359 [inline]
       sctp_do_bind+0x60f/0xdc0 net/sctp/socket.c:384
       sctp_bind+0x149/0x190 net/sctp/socket.c:332
       inet6_bind+0x1fd/0x1820 net/ipv6/af_inet6.c:293
       SYSC_bind+0x3f2/0x4b0 net/socket.c:1474
       SyS_bind+0x54/0x80 net/socket.c:1460
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x43fd49
      RSP: 002b:00007ffe99df3d28 EFLAGS: 00000213 ORIG_RAX: 0000000000000031
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fd49
      RDX: 0000000000000010 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 0000000000401670
      R13: 0000000000401700 R14: 0000000000000000 R15: 0000000000000000
      
      Local variable description: ----address@SYSC_bind
      Variable was created at:
       SYSC_bind+0x6f/0x4b0 net/socket.c:1461
       SyS_bind+0x54/0x80 net/socket.c:1460
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      81e98370
  6. 08 4月, 2018 5 次提交
    • A
      getname_kernel() needs to make sure that ->name != ->iname in long case · 30ce4d19
      Al Viro 提交于
      missed it in "kill struct filename.separate" several years ago.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      30ce4d19
    • A
      net: dsa: Discard frames from unused ports · fc5f3376
      Andrew Lunn 提交于
      The Marvell switches under some conditions will pass a frame to the
      host with the port being the CPU port. Such frames are invalid, and
      should be dropped. Not dropping them can result in a crash when
      incrementing the receive statistics for an invalid port.
      Reported-by: NChris Healy <cphealy@gmail.com>
      Fixes: 91da11f8 ("net: Distributed Switch Architecture protocol support")
      Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc5f3376
    • E
      sctp: do not leak kernel memory to user space · 6780db24
      Eric Dumazet 提交于
      syzbot produced a nice report [1]
      
      Issue here is that a recvmmsg() managed to leak 8 bytes of kernel memory
      to user space, because sin_zero (padding field) was not properly cleared.
      
      [1]
      BUG: KMSAN: uninit-value in copy_to_user include/linux/uaccess.h:184 [inline]
      BUG: KMSAN: uninit-value in move_addr_to_user+0x32e/0x530 net/socket.c:227
      CPU: 1 PID: 3586 Comm: syzkaller481044 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       kmsan_internal_check_memory+0x164/0x1d0 mm/kmsan/kmsan.c:1176
       kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
       copy_to_user include/linux/uaccess.h:184 [inline]
       move_addr_to_user+0x32e/0x530 net/socket.c:227
       ___sys_recvmsg+0x4e2/0x810 net/socket.c:2211
       __sys_recvmmsg+0x54e/0xdb0 net/socket.c:2313
       SYSC_recvmmsg+0x29b/0x3e0 net/socket.c:2394
       SyS_recvmmsg+0x76/0xa0 net/socket.c:2378
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x4401c9
      RSP: 002b:00007ffc56f73098 EFLAGS: 00000217 ORIG_RAX: 000000000000012b
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004401c9
      RDX: 0000000000000001 RSI: 0000000020003ac0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000020003bc0 R09: 0000000000000010
      R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401af0
      R13: 0000000000401b80 R14: 0000000000000000 R15: 0000000000000000
      
      Local variable description: ----addr@___sys_recvmsg
      Variable was created at:
       ___sys_recvmsg+0xd5/0x810 net/socket.c:2172
       __sys_recvmmsg+0x54e/0xdb0 net/socket.c:2313
      
      Bytes 8-15 of 16 are uninitialized
      
      ==================================================================
      Kernel panic - not syncing: panic_on_warn set ...
      
      CPU: 1 PID: 3586 Comm: syzkaller481044 Tainted: G    B            4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       panic+0x39d/0x940 kernel/panic.c:183
       kmsan_report+0x238/0x240 mm/kmsan/kmsan.c:1083
       kmsan_internal_check_memory+0x164/0x1d0 mm/kmsan/kmsan.c:1176
       kmsan_copy_to_user+0x69/0x160 mm/kmsan/kmsan.c:1199
       copy_to_user include/linux/uaccess.h:184 [inline]
       move_addr_to_user+0x32e/0x530 net/socket.c:227
       ___sys_recvmsg+0x4e2/0x810 net/socket.c:2211
       __sys_recvmmsg+0x54e/0xdb0 net/socket.c:2313
       SYSC_recvmmsg+0x29b/0x3e0 net/socket.c:2394
       SyS_recvmmsg+0x76/0xa0 net/socket.c:2378
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc:	Vlad Yasevich <vyasevich@gmail.com>
      Cc:	Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6780db24
    • D
      Merge branch 'net-fix-uninit-values-in-networking-stack' · ccb48e83
      David S. Miller 提交于
      Eric Dumazet says:
      
      ====================
      net: fix uninit-values in networking stack
      
      It seems syzbot got new features enabled, and fired some interesting
      reports. Oh well.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccb48e83
    • E
      soreuseport: initialise timewait reuseport field · 3099a529
      Eric Dumazet 提交于
      syzbot reported an uninit-value in inet_csk_bind_conflict() [1]
      
      It turns out we never propagated sk->sk_reuseport into timewait socket.
      
      [1]
      BUG: KMSAN: uninit-value in inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151
      CPU: 1 PID: 3589 Comm: syzkaller008242 Not tainted 4.16.0+ #82
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       inet_csk_bind_conflict+0x5f9/0x990 net/ipv4/inet_connection_sock.c:151
       inet_csk_get_port+0x1d28/0x1e40 net/ipv4/inet_connection_sock.c:320
       inet6_bind+0x121c/0x1820 net/ipv6/af_inet6.c:399
       SYSC_bind+0x3f2/0x4b0 net/socket.c:1474
       SyS_bind+0x54/0x80 net/socket.c:1460
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x4416e9
      RSP: 002b:00007ffce6d15c88 EFLAGS: 00000217 ORIG_RAX: 0000000000000031
      RAX: ffffffffffffffda RBX: 0100000000000000 RCX: 00000000004416e9
      RDX: 000000000000001c RSI: 0000000020402000 RDI: 0000000000000004
      RBP: 0000000000000000 R08: 00000000e6d15e08 R09: 00000000e6d15e08
      R10: 0000000000000004 R11: 0000000000000217 R12: 0000000000009478
      R13: 00000000006cd448 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       tcp_time_wait+0xf17/0xf50 net/ipv4/tcp_minisocks.c:283
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Uninit was stored to memory at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_save_stack mm/kmsan/kmsan.c:293 [inline]
       kmsan_internal_chain_origin+0x12b/0x210 mm/kmsan/kmsan.c:684
       __msan_chain_origin+0x69/0xc0 mm/kmsan/kmsan_instr.c:521
       inet_twsk_alloc+0xaef/0xc00 net/ipv4/inet_timewait_sock.c:182
       tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      Uninit was created at:
       kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
       kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
       kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
       kmem_cache_alloc+0xaab/0xb90 mm/slub.c:2756
       inet_twsk_alloc+0x13b/0xc00 net/ipv4/inet_timewait_sock.c:163
       tcp_time_wait+0xd9/0xf50 net/ipv4/tcp_minisocks.c:258
       tcp_rcv_state_process+0xebe/0x6490 net/ipv4/tcp_input.c:6003
       tcp_v6_do_rcv+0x11dd/0x1d90 net/ipv6/tcp_ipv6.c:1331
       sk_backlog_rcv include/net/sock.h:908 [inline]
       __release_sock+0x2d6/0x680 net/core/sock.c:2271
       release_sock+0x97/0x2a0 net/core/sock.c:2786
       tcp_close+0x277/0x18f0 net/ipv4/tcp.c:2269
       inet_release+0x240/0x2a0 net/ipv4/af_inet.c:427
       inet6_release+0xaf/0x100 net/ipv6/af_inet6.c:435
       sock_release net/socket.c:595 [inline]
       sock_close+0xe0/0x300 net/socket.c:1149
       __fput+0x49e/0xa10 fs/file_table.c:209
       ____fput+0x37/0x40 fs/file_table.c:243
       task_work_run+0x243/0x2c0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x10e1/0x38d0 kernel/exit.c:867
       do_group_exit+0x1a0/0x360 kernel/exit.c:970
       SYSC_exit_group+0x21/0x30 kernel/exit.c:981
       SyS_exit_group+0x25/0x30 kernel/exit.c:979
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      Fixes: da5e3630 ("soreuseport: TCP/IPv4 implementation")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3099a529