1. 23 8月, 2017 1 次提交
    • E
      xfs: write unmount record for ro mounts · 757a69ef
      Eric Sandeen 提交于
      There are dueling comments in the xfs code about intent
      for log writes when unmounting a readonly filesystem.
      
      In xfs_mountfs, we see the intent:
      
      /*
       * Now the log is fully replayed, we can transition to full read-only
       * mode for read-only mounts. This will sync all the metadata and clean
       * the log so that the recovery we just performed does not have to be
       * replayed again on the next mount.
       */
      
      and it calls xfs_quiesce_attr(), but by the time we get to
      xfs_log_unmount_write(), it returns early for a RDONLY mount:
      
       * Don't write out unmount record on read-only mounts.
      
      Because of this, sequential ro mounts of a filesystem with
      a dirty log will replay the log each time, which seems odd.
      
      Fix this by writing an unmount record even for RO mounts, as long
      as norecovery wasn't specified (don't write a clean log record
      if a dirty log may still be there!) and the log device is
      writable.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      757a69ef
  2. 22 8月, 2017 6 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 6470812e
      Linus Torvalds 提交于
      Pull sparc fixes from David Miller:
       "Just a couple small fixes, two of which have to do with gcc-7:
      
         1) Don't clobber kernel fixed registers in __multi4 libgcc helper.
      
         2) Fix a new uninitialized variable warning on sparc32 with gcc-7,
            from Thomas Petazzoni.
      
         3) Adjust pmd_t initializer on sparc32 to make gcc happy.
      
         4) If ATU isn't available, don't bark in the logs. From Tushar Dave"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus()
        sparc64: remove unnecessary log message
        sparc64: Don't clibber fixed registers in __multi4.
        mm: add pmd_t initializer __pmd() to work around a GCC bug.
      6470812e
    • T
      sparc: kernel/pcic: silence gcc 7.x warning in pcibios_fixup_bus() · 2dc77533
      Thomas Petazzoni 提交于
      When building the kernel for Sparc using gcc 7.x, the build fails
      with:
      
      arch/sparc/kernel/pcic.c: In function ‘pcibios_fixup_bus’:
      arch/sparc/kernel/pcic.c:647:8: error: ‘cmd’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
          cmd |= PCI_COMMAND_IO;
              ^~
      
      The simplified code looks like this:
      
      unsigned int cmd;
      [...]
      pcic_read_config(dev->bus, dev->devfn, PCI_COMMAND, 2, &cmd);
      [...]
      cmd |= PCI_COMMAND_IO;
      
      I.e, the code assumes that pcic_read_config() will always initialize
      cmd. But it's not the case. Looking at pcic_read_config(), if
      bus->number is != 0 or if the size is not one of 1, 2 or 4, *val will
      not be initialized.
      
      As a simple fix, we initialize cmd to zero at the beginning of
      pcibios_fixup_bus.
      Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2dc77533
    • L
      Merge tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc · 05ab303b
      Linus Torvalds 提交于
      Pull ARC fixes from Vineet Gupta:
      
       - PAE40 related updates
      
       - SLC errata for region ops
      
       - intc line masking by default
      
      * tag 'arc-4.13-rc7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
        arc: Mask individual IRQ lines during core INTC init
        ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC
        ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
        ARC: dma: implement dma_unmap_page and sg variant
        ARCv2: SLC: Make sure busy bit is set properly for region ops
        ARC: [plat-sim] Include this platform unconditionally
        ARC: [plat-axs10x]: prepare dts files for enabling PAE40 on axs103
        ARC: defconfig: Cleanup from old Kconfig options
      05ab303b
    • L
      Merge tag 'rtc-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 0b3baec8
      Linus Torvalds 提交于
      Pull RTC fix from Alexandre Belloni:
       "Fix regmap configuration for ds1307"
      
      * tag 'rtc-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
        rtc: ds1307: fix regmap config
      0b3baec8
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e3181f2c
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Fix IGMP handling wrt VRF, from David Ahern.
      
       2) Fix timer access to freed object in dccp, from Eric Dumazet.
      
       3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
          triggerable by userspace. Also from Eric Dumazet.
      
       4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
          King.
      
       5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.
      
       6) Fix use after free in TIPC, from Eric Dumazet.
      
       7) When replacing a route in ipv6 we need to reset the round robin
          pointer, from Wei Wang.
      
       8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
          relaxed ordering changes, from Thierry Redding. I made sure to get
          an explicit ACK from Bjorn this time around :-)
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
        ipv6: repair fib6 tree in failure case
        net_sched: fix order of queue length updates in qdisc_replace()
        tools lib bpf: improve warning
        switchdev: documentation: minor typo fixes
        bpf, doc: also add s390x as arch to sysctl description
        net: sched: fix NULL pointer dereference when action calls some targets
        rxrpc: Fix oops when discarding a preallocated service call
        irda: do not leak initialized list.dev to userspace
        net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
        PCI: Allow PCI express root ports to find themselves
        tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
        net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
        ipv6: reset fn->rr_ptr when replacing route
        sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
        tipc: fix use-after-free
        tun: handle register_netdevice() failures properly
        datagram: When peeking datagrams with offset < 0 don't skip empty skbs
        bpf, doc: improve sysctl knob description
        netxen: fix incorrect loop counter decrement
        nfp: fix infinite loop on umapping cleanup
        ...
      e3181f2c
    • O
      pids: make task_tgid_nr_ns() safe · dd1c1f2f
      Oleg Nesterov 提交于
      This was reported many times, and this was even mentioned in commit
      52ee2dfd ("pids: refactor vnr/nr_ns helpers to make them safe") but
      somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
      not safe because task->group_leader points to nowhere after the exiting
      task passes exit_notify(), rcu_read_lock() can not help.
      
      We really need to change __unhash_process() to nullify group_leader,
      parent, and real_parent, but this needs some cleanups.  Until then we
      can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
      fix the problem.
      Reported-by: NTroy Kensinger <tkensinger@google.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd1c1f2f
  3. 21 8月, 2017 12 次提交
    • H
      rtc: ds1307: fix regmap config · 03619844
      Heiner Kallweit 提交于
      Current max_register setting breaks reading nvram on certain chips and
      also reading the standard registers on RX8130 where register map starts
      at 0x10.
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Fixes: 11e5890b "rtc: ds1307: convert driver to regmap"
      Signed-off-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
      03619844
    • W
      ipv6: repair fib6 tree in failure case · 348a4002
      Wei Wang 提交于
      In fib6_add(), it is possible that fib6_add_1() picks an intermediate
      node and sets the node's fn->leaf to NULL in order to add this new
      route. However, if fib6_add_rt2node() fails to add the new
      route for some reason, fn->leaf will be left as NULL and could
      potentially cause crash when fn->leaf is accessed in fib6_locate().
      This patch makes sure fib6_repair_tree() is called to properly repair
      fn->leaf in the above failure case.
      
      Here is the syzkaller reported general protection fault in fib6_locate:
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
      RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
      RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
      RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
      RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
      FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
       ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
       ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
      Call Trace:
       [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
       [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
       [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
       [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
       [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
       [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
       [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
       [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
       [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
       [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
       [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
       [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
       [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
       [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
       [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
       [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
       [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Note: there is no "Fixes" tag as this seems to be a bug introduced
      very early.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      348a4002
    • K
      net_sched: fix order of queue length updates in qdisc_replace() · 68a66d14
      Konstantin Khlebnikov 提交于
      This important to call qdisc_tree_reduce_backlog() after changing queue
      length. Parent qdisc should deactivate class in ->qlen_notify() called from
      qdisc_tree_reduce_backlog() but this happens only if qdisc->q.qlen in zero.
      
      Missed class deactivations leads to crashes/warnings at picking packets
      from empty qdisc and corrupting state at reactivating this class in future.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Fixes: 86a7996c ("net_sched: introduce qdisc_replace() helper")
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68a66d14
    • E
      49bf4b36
    • C
      switchdev: documentation: minor typo fixes · 5a784498
      Chris Packham 提交于
      Two typos in switchdev.txt
      Signed-off-by: NChris Packham <chris.packham@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a784498
    • D
      bpf, doc: also add s390x as arch to sysctl description · d4dd2d75
      Daniel Borkmann 提交于
      Looks like this was accidentally missed, so still add s390x
      as supported eBPF JIT arch to bpf_jit_enable.
      
      Fixes: 014cd0a3 ("bpf: Update sysctl documentation to list all supported architectures")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4dd2d75
    • L
      Linux 4.13-rc6 · 14ccee78
      Linus Torvalds 提交于
      14ccee78
    • L
      Sanitize 'move_pages()' permission checks · 197e7e52
      Linus Torvalds 提交于
      The 'move_paghes()' system call was introduced long long ago with the
      same permission checks as for sending a signal (except using
      CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).
      
      That turns out to not be a great choice - while the system call really
      only moves physical page allocations around (and you need other
      capabilities to do a lot of it), you can check the return value to map
      out some the virtual address choices and defeat ASLR of a binary that
      still shares your uid.
      
      So change the access checks to the more common 'ptrace_may_access()'
      model instead.
      
      This tightens the access checks for the uid, and also effectively
      changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
      anybody really _uses_ this legacy system call any more (we hav ebetter
      NUMA placement models these days), so I expect nobody to notice.
      
      Famous last words.
      Reported-by: NOtto Ebeling <otto.ebeling@iki.fi>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      197e7e52
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f680d7e
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
       "Another pile of small fixes and updates for x86:
      
         - Plug a hole in the SMAP implementation which misses to clear AC on
           NMI entry
      
         - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
           parameter works correctly again
      
         - Use the proper accessor in the startup64 code for next_early_pgt to
           prevent accessing of invalid addresses and faulting in the early
           boot code.
      
         - Prevent CPU hotplug lock recursion in the MTRR code
      
         - Unbreak CPU0 hotplugging
      
         - Rename overly long CPUID bits which got introduced in this cycle
      
         - Two commits which mark data 'const' and restrict the scope of data
           and functions to file scope by making them 'static'"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86: Constify attribute_group structures
        x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
        x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
        x86: Fix norandmaps/ADDR_NO_RANDOMIZE
        x86/mtrr: Prevent CPU hotplug lock recursion
        x86: Mark various structures and functions as 'static'
        x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
        x86/smpboot: Unbreak CPU0 hotplug
        x86/asm/64: Clear AC on NMI entries
      7f680d7e
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2615a38f
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "A few small fixes for timer drivers:
      
         - Prevent infinite recursion in the arm architected timer driver with
           ftrace
      
         - Propagate error codes to the caller in case of failure in EM STI
           driver
      
         - Adjust a bogus loop iteration in the arm architected timer driver
      
         - Add a missing Kconfig dependency to the pistachio clocksource to
           prevent build failures
      
         - Correctly check for IS_ERR() instead of NULL in the shared timer-of
           code"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled
        clocksource/drivers/Kconfig: Fix CLKSRC_PISTACHIO dependencies
        clocksource/drivers/timer-of: Checking for IS_ERR() instead of NULL
        clocksource/drivers/em_sti: Fix error return codes in em_sti_probe()
        clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization
      2615a38f
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e46db8d2
      Linus Torvalds 提交于
      Pull perf fixes from Thomas Gleixner:
       "Two fixes for the perf subsystem:
      
         - Fix an inconsistency of RDPMC mm struct tagging across exec() which
           causes RDPMC to fault.
      
         - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
           causes incorrect timestamps and total time calculations"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/core: Fix time on IOC_ENABLE
        perf/x86: Fix RDPMC vs. mm_struct tracking
      e46db8d2
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9dae41a2
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
       "A pile of smallish changes all over the place:
      
         - Add a missing ISB in the GIC V1 driver
      
         - Remove an ACPI version check in the GIC V3 ITS driver
      
         - Add the missing irq_pm_shutdown function for BRCMSTB-L2 to avoid
           spurious wakeups
      
         - Remove the artifical limitation of ITS instances to the number of
           NUMA nodes which prevents utilizing the ITS hardware correctly
      
         - Prevent a infinite parsing loop in the GIC-V3 ITS/MSI code
      
         - Honour the force affinity argument in the GIC-V3 driver which is
           required to make perf work correctly
      
         - Correctly report allocation failures in GIC-V2/V3 to avoid using
           half allocated and initialized interrupts.
      
         - Fixup checks against nr_cpu_ids in the generic IPI code"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/ipi: Fixup checks against nr_cpu_ids
        genirq: Restore trigger settings in irq_modify_status()
        MAINTAINERS: Remove Jason Cooper's irqchip git tree
        irqchip/gic-v3-its-platform-msi: Fix msi-parent parsing loop
        irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES
        irqchip: brcmstb-l2: Define an irq_pm_shutdown function
        irqchip/gic: Ensure we have an ISB between ack and ->handle_irq
        irqchip/gic-v3-its: Remove ACPICA version check for ACPI NUMA
        irqchip/gic-v3: Honor forced affinity setting
        irqchip/gic-v3: Report failures in gic_irq_domain_alloc
        irqchip/gic-v2: Report failures in gic_irq_domain_alloc
        irqchip/atmel-aic: Remove root argument from ->fixup() prototype
        irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
        irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
      9dae41a2
  4. 20 8月, 2017 2 次提交
  5. 19 8月, 2017 19 次提交
    • X
      net: sched: fix NULL pointer dereference when action calls some targets · 4f8a881a
      Xin Long 提交于
      As we know in some target's checkentry it may dereference par.entryinfo
      to check entry stuff inside. But when sched action calls xt_check_target,
      par.entryinfo is set with NULL. It would cause kernel panic when calling
      some targets.
      
      It can be reproduce with:
        # tc qd add dev eth1 ingress handle ffff:
        # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
          -j ECN --ecn-tcp-remove
      
      It could also crash kernel when using target CLUSTERIP or TPROXY.
      
      By now there's no proper value for par.entryinfo in ipt_init_target,
      but it can not be set with NULL. This patch is to void all these
      panics by setting it with an ipt_entry obj with all members = 0.
      
      Note that this issue has been there since the very beginning.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f8a881a
    • D
      rxrpc: Fix oops when discarding a preallocated service call · 9a19bad7
      David Howells 提交于
      rxrpc_service_prealloc_one() doesn't set the socket pointer on any new call
      it preallocates, but does add it to the rxrpc net namespace call list.
      This, however, causes rxrpc_put_call() to oops when the call is discarded
      when the socket is closed.  rxrpc_put_call() needs the socket to be able to
      reach the namespace so that it can use a lock held therein.
      
      Fix this by setting a call's socket pointer immediately before discarding
      it.
      
      This can be triggered by unloading the kafs module, resulting in an oops
      like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
      IP: rxrpc_put_call+0x1e2/0x32d
      PGD 0
      P4D 0
      Oops: 0000 [#1] SMP
      Modules linked in: kafs(E-)
      CPU: 3 PID: 3037 Comm: rmmod Tainted: G            E   4.12.0-fscache+ #213
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8803fc92e2c0 task.stack: ffff8803fef74000
      RIP: 0010:rxrpc_put_call+0x1e2/0x32d
      RSP: 0018:ffff8803fef77e08 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff8803fab99ac0 RCX: 000000000000000f
      RDX: ffffffff81c50a40 RSI: 000000000000000c RDI: ffff8803fc92ea88
      RBP: ffff8803fef77e30 R08: ffff8803fc87b941 R09: ffffffff82946d20
      R10: ffff8803fef77d10 R11: 00000000000076fc R12: 0000000000000005
      R13: ffff8803fab99c20 R14: 0000000000000001 R15: ffffffff816c6aee
      FS:  00007f915a059700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000030 CR3: 00000003fef39000 CR4: 00000000001406e0
      Call Trace:
       rxrpc_discard_prealloc+0x325/0x341
       rxrpc_listen+0xf9/0x146
       kernel_listen+0xb/0xd
       afs_close_socket+0x3e/0x173 [kafs]
       afs_exit+0x1f/0x57 [kafs]
       SyS_delete_module+0x10f/0x19a
       do_syscall_64+0x8a/0x149
       entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 2baec2c3 ("rxrpc: Support network namespacing")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a19bad7
    • C
      irda: do not leak initialized list.dev to userspace · b024d949
      Colin Ian King 提交于
      list.dev has not been initialized and so the copy_to_user is copying
      data from the stack back to user space which is a potential
      information leak. Fix this ensuring all of list is initialized to
      zero.
      
      Detected by CoverityScan, CID#1357894 ("Uninitialized scalar variable")
      Signed-off-by: NColin Ian King <colin.king@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b024d949
    • H
      net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled · ca3d89a3
      Huy Nguyen 提交于
      enable_4k_uar module parameter was added in patch cited below to
      address the backward compatibility issue in SRIOV when the VM has
      system's PAGE_SIZE uar implementation and the Hypervisor has 4k uar
      implementation.
      
      The above compatibility issue does not exist in the non SRIOV case.
      In this patch, we always enable 4k uar implementation if SRIOV
      is not enabled on mlx4's supported cards.
      
      Fixes: 76e39ccf ("net/mlx4_core: Fix backward compatibility on VFs")
      Signed-off-by: NHuy Nguyen <huyn@mellanox.com>
      Reviewed-by: NDaniel Jurgens <danielj@mellanox.com>
      Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca3d89a3
    • T
      PCI: Allow PCI express root ports to find themselves · b6f6d56c
      Thierry Reding 提交于
      If the pci_find_pcie_root_port() function is called on a root port
      itself, return the root port rather than NULL.
      
      This effectively reverts commit 0e405232 ("PCI: fix oops when
      try to find Root Port for a PCI device") which added an extra check
      that would now be redundant.
      
      Fixes: a99b646a ("PCI: Disable PCIe Relaxed Ordering if unsupported")
      Fixes: c56d4450 ("PCI: Turn off Request Attributes to avoid Chelsio T5 Completion erratum")
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Tested-by: NShawn Lin <shawn.lin@rock-chips.com>
      Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6f6d56c
    • N
      tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP · cdbeb633
      Neal Cardwell 提交于
      In some situations tcp_send_loss_probe() can realize that it's unable
      to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
      to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
      realizes that the RTO was eligible to fire immediately or at some
      point in the past (delta_us <= 0). Previously in such cases
      tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
      icsk_rto, which caused needless delays of hundreds of milliseconds
      (and non-linear behavior that made reproducible testing
      difficult). This commit changes the logic to schedule "overdue" RTOs
      ASAP, rather than at now + icsk_rto.
      
      Fixes: 6ba8a3b1 ("tcp: Tail loss probe (TLP)")
      Suggested-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cdbeb633
    • L
      Merge branch 'akpm' (patches from Andrew) · 58d4e450
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "14 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
        mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
        mm/mempolicy: fix use after free when calling get_mempolicy
        mm/cma_debug.c: fix stack corruption due to sprintf usage
        signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
        mm, oom: fix potential data corruption when oom_reaper races with writer
        mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
        slub: fix per memcg cache leak on css offline
        mm: discard memblock data later
        test_kmod: fix description for -s -and -c parameters
        kmod: fix wait on recursive loop
        wait: add wait_event_killable_timeout()
        kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
        mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
      58d4e450
    • R
      net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set · bc3aae2b
      Roopa Prabhu 提交于
      Syzkaller hit 'general protection fault in fib_dump_info' bug on
      commit 4.13-rc5..
      
      Guilty file: net/ipv4/fib_semantics.c
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 2808 Comm: syz-executor0 Not tainted 4.13.0-rc5 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      Ubuntu-1.8.2-1ubuntu1 04/01/2014
      task: ffff880078562700 task.stack: ffff880078110000
      RIP: 0010:fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314
      RSP: 0018:ffff880078117010 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: 00000000000000fe RCX: 0000000000000002
      RDX: 0000000000000006 RSI: ffff880078117084 RDI: 0000000000000030
      RBP: ffff880078117268 R08: 000000000000000c R09: ffff8800780d80c8
      R10: 0000000058d629b4 R11: 0000000067fce681 R12: 0000000000000000
      R13: ffff8800784bd540 R14: ffff8800780d80b5 R15: ffff8800780d80a4
      FS:  00000000022fa940(0000) GS:ffff88007fc00000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004387d0 CR3: 0000000079135000 CR4: 00000000000006f0
      Call Trace:
        inet_rtm_getroute+0xc89/0x1f50 net/ipv4/route.c:2766
        rtnetlink_rcv_msg+0x288/0x680 net/core/rtnetlink.c:4217
        netlink_rcv_skb+0x340/0x470 net/netlink/af_netlink.c:2397
        rtnetlink_rcv+0x28/0x30 net/core/rtnetlink.c:4223
        netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
        netlink_unicast+0x4c4/0x6e0 net/netlink/af_netlink.c:1291
        netlink_sendmsg+0x8c4/0xca0 net/netlink/af_netlink.c:1854
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:643
        ___sys_sendmsg+0x779/0x8d0 net/socket.c:2035
        __sys_sendmsg+0xd1/0x170 net/socket.c:2069
        SYSC_sendmsg net/socket.c:2080 [inline]
        SyS_sendmsg+0x2d/0x50 net/socket.c:2076
        entry_SYSCALL_64_fastpath+0x1a/0xa5
        RIP: 0033:0x4512e9
        RSP: 002b:00007ffc75584cc8 EFLAGS: 00000216 ORIG_RAX:
        000000000000002e
        RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00000000004512e9
        RDX: 0000000000000000 RSI: 0000000020f2cfc8 RDI: 0000000000000003
        RBP: 000000000000000e R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000216 R12: fffffffffffffffe
        R13: 0000000000718000 R14: 0000000020c44ff0 R15: 0000000000000000
        Code: 00 0f b6 8d ec fd ff ff 48 8b 85 f0 fd ff ff 88 48 17 48 8b 45
        28 48 8d 78 30 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03
        <0f>
        b6 04 02 84 c0 74 08 3c 03 0f 8e cb 0c 00 00 48 8b 45 28 44
        RIP: fib_dump_info+0x388/0x1170 net/ipv4/fib_semantics.c:1314 RSP:
        ffff880078117010
      ---[ end trace 254a7af28348f88b ]---
      
      This patch adds a res->fi NULL check.
      
      example run:
      $ip route get 0.0.0.0 iif virt1-0
      broadcast 0.0.0.0 dev lo
          cache <local,brd> iif virt1-0
      
      $ip route get 0.0.0.0 iif virt1-0 fibmatch
      RTNETLINK answers: No route to host
      Reported-by: Nidaifish <idaifish@gmail.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Fixes: b6179813 ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc3aae2b
    • W
      ipv6: reset fn->rr_ptr when replacing route · 383143f3
      Wei Wang 提交于
      syzcaller reported the following use-after-free issue in rt6_select():
      BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
      BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
      Read of size 4 by task syz-executor1/439628
      CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
       ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
       ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
      Call Trace:
       [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
      sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
       [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
       [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
       [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
       [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
       [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
       [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
       [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
       [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
       [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
       [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
       [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
       [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
       [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
       [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
       [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
       [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
       [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
       [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
       [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
      Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
      
      The root cause of it is that in fib6_add_rt2node(), when it replaces an
      existing route with the new one, it does not update fn->rr_ptr.
      This commit resets fn->rr_ptr to NULL when it points to a route which is
      replaced in fib6_add_rt2node().
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      383143f3
    • A
      sctp: fully initialize the IPv6 address in sctp_v6_to_addr() · 15339e44
      Alexander Potapenko 提交于
      KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
      sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
      Make sure all fields of an IPv6 address are initialized, which
      guarantees that the IPv4 fields are also initialized.
      
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
       BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
       net/sctp/ipv6.c:517
       CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
       01/01/2011
       Call Trace:
        dump_stack+0x172/0x1c0 lib/dump_stack.c:42
        is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
        kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
        native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
        arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
        arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
        __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
        sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
        sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
        sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
        sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
        inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
        sock_sendmsg_nosec net/socket.c:633 [inline]
        sock_sendmsg net/socket.c:643 [inline]
        SYSC_sendto+0x608/0x710 net/socket.c:1696
        SyS_sendto+0x8a/0xb0 net/socket.c:1664
        entry_SYSCALL_64_fastpath+0x13/0x94
       RIP: 0033:0x44b479
       RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
       RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
       RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
       RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
       R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
       R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
       origin description: ----dst_saddr@sctp_v6_get_dst
       local variable created at:
        sk_fullsock include/net/sock.h:2321 [inline]
        inet6_sk include/linux/ipv6.h:309 [inline]
        sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
        sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
      ==================================================================
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Reviewed-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15339e44
    • E
      tipc: fix use-after-free · 5bfd37b4
      Eric Dumazet 提交于
      syszkaller reported use-after-free in tipc [1]
      
      When msg->rep skb is freed, set the pointer to NULL,
      so that caller does not free it again.
      
      [1]
      
      ==================================================================
      BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
      Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115
      
      CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x24e/0x340 mm/kasan/report.c:409
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
       skb_push+0xd4/0xe0 net/core/skbuff.c:1466
       tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
      RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
      R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
       __alloc_skb+0xf1/0x740 net/core/skbuff.c:219
       alloc_skb include/linux/skbuff.h:903 [inline]
       tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
       tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3763
       kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
       tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
       tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
       tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
       genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
       genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
       netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
       netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
       netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
       netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       sock_write_iter+0x31a/0x5d0 net/socket.c:898
       call_write_iter include/linux/fs.h:1743 [inline]
       new_sync_write fs/read_write.c:457 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:470
       vfs_write+0x189/0x510 fs/read_write.c:518
       SYSC_write fs/read_write.c:565 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:557
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801c6e71dc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 208 bytes inside of
       224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
      The buggy address belongs to the page:
      page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
      raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
       ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      >ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                               ^
       ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      ==================================================================
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov  <dvyukov@google.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Cc: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bfd37b4
    • E
      tun: handle register_netdevice() failures properly · ff244c6b
      Eric Dumazet 提交于
      syzkaller reported a double free [1], caused by the fact
      that tun driver was not updated properly when priv_destructor
      was added.
      
      When/if register_netdevice() fails, priv_destructor() must have been
      called already.
      
      [1]
      BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
      
      CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       print_address_description+0x7f/0x260 mm/kasan/report.c:252
       kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333
       kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_set_iff drivers/net/tun.c:1884 [inline]
       __tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x443ff9
      RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9
      RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0
      R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551
       kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627
       kmalloc include/linux/slab.h:493 [inline]
       kzalloc include/linux/slab.h:666 [inline]
       selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012
       security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506
       tun_set_iff drivers/net/tun.c:1839 [inline]
       __tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Freed by task 2919:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3503 [inline]
       kfree+0xd3/0x260 mm/slab.c:3820
       selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
       security_tun_dev_free_security+0x48/0x80 security/security.c:1512
       tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563
       register_netdevice+0x8d0/0xee0 net/core/dev.c:7605
       tun_set_iff drivers/net/tun.c:1859 [inline]
       __tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064
       tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
       vfs_ioctl fs/ioctl.c:45 [inline]
       do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
       SYSC_ioctl fs/ioctl.c:700 [inline]
       SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      The buggy address belongs to the object at ffff8801d2843b40
       which belongs to the cache kmalloc-32 of size 32
      The buggy address is located 0 bytes inside of
       32-byte region [ffff8801d2843b40, ffff8801d2843b60)
      The buggy address belongs to the page:
      page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1
      flags: 0x200000000000100(slab)
      raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f
      raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
      >ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
                                                 ^
       ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
       ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
      
      ==================================================================
      
      Fixes: cf124db5 ("net: Fix inconsistent teardown and release of private netdev state.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff244c6b
    • K
      mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes · c715b72c
      Kees Cook 提交于
      Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
      broke AddressSanitizer.  This is a partial revert of:
      
        eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
        02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      
      The AddressSanitizer tool has hard-coded expectations about where
      executable mappings are loaded.
      
      The motivation for changing the PIE base in the above commits was to
      avoid the Stack-Clash CVEs that allowed executable mappings to get too
      close to heap and stack.  This was mainly a problem on 32-bit, but the
      64-bit bases were moved too, in an effort to proactively protect those
      systems (proofs of concept do exist that show 64-bit collisions, but
      other recent changes to fix stack accounting and setuid behaviors will
      minimize the impact).
      
      The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
      base), so only the 64-bit PIE base needs to be reverted to let x86 and
      arm64 ASan binaries run again.  Future changes to the 64-bit PIE base on
      these architectures can be made optional once a more dynamic method for
      dealing with AddressSanitizer is found.  (e.g.  always loading PIE into
      the mmap region for marked binaries.)
      
      Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
      Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
      Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKostya Serebryany <kcc@google.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c715b72c
    • L
      mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM · 704b862f
      Laura Abbott 提交于
      Commit 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
      use of __GFP_HIGHMEM for allocations.  vmalloc_32 may use
      GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
      trigger a BUG in gfp_zone.
      
      Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.
      
      Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249
      Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com
      Fixes: 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
      Signed-off-by: NLaura Abbott <labbott@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      704b862f
    • Z
      mm/mempolicy: fix use after free when calling get_mempolicy · 73223e4e
      zhong jiang 提交于
      I hit a use after free issue when executing trinity and repoduced it
      with KASAN enabled.  The related call trace is as follows.
      
        BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
        Read of size 2 by task syz-executor1/798
      
        INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
           __slab_alloc+0x768/0x970
           kmem_cache_alloc+0x2e7/0x450
           mpol_new.part.2+0x74/0x160
           mpol_new+0x66/0x80
           SyS_mbind+0x267/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
           __slab_free+0x495/0x8e0
           kmem_cache_free+0x2f3/0x4c0
           __mpol_put+0x2b/0x40
           SyS_mbind+0x383/0x9f0
           system_call_fastpath+0x16/0x1b
        INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
        INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
      
        Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
        Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
        Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
        Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
        Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
        Memory state around the buggy address:
        ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        >ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
      
      !shared memory policy is not protected against parallel removal by other
      thread which is normally protected by the mmap_sem.  do_get_mempolicy,
      however, drops the lock midway while we can still access it later.
      
      Early premature up_read is a historical artifact from times when
      put_user was called in this path see https://lwn.net/Articles/124754/
      but that is gone since 8bccd85f ("[PATCH] Implement sys_* do_*
      layering in the memory policy layer.").  but when we have the the
      current mempolicy ref count model.  The issue was introduced
      accordingly.
      
      Fix the issue by removing the premature release.
      
      Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>	[2.6+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      73223e4e
    • P
      mm/cma_debug.c: fix stack corruption due to sprintf usage · da094e42
      Prakash Gupta 提交于
      name[] in cma_debugfs_add_one() can only accommodate 16 chars including
      NULL to store sprintf output.  It's common for cma device name to be
      larger than 15 chars.  This can cause stack corrpution.  If the gcc
      stack protector is turned on, this can cause a panic due to stack
      corruption.
      
      Below is one example trace:
      
        Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
        ffffff8e69a75730
        Call trace:
           dump_backtrace+0x0/0x2c4
           show_stack+0x20/0x28
           dump_stack+0xb8/0xf4
           panic+0x154/0x2b0
           print_tainted+0x0/0xc0
           cma_debugfs_init+0x274/0x290
           do_one_initcall+0x5c/0x168
           kernel_init_freeable+0x1c8/0x280
      
      Fix the short sprintf buffer in cma_debugfs_add_one() by using
      scnprintf() instead of sprintf().
      
      Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org
      Fixes: f318dd08 ("cma: Store a name in the cma structure")
      Signed-off-by: NPrakash Gupta <guptap@codeaurora.org>
      Acked-by: NLaura Abbott <labbott@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      da094e42
    • J
      signal: don't remove SIGNAL_UNKILLABLE for traced tasks. · eb61b591
      Jamie Iles 提交于
      When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
      faults, but this is undesirable when tracing.  For example, debugging an
      init process (whether global or namespace), hitting a breakpoint and
      SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
      Everything continues fine, but then once debugging has finished, the
      init process is left killable which is unlikely what the user expects,
      resulting in either an accidentally killed init or an init that stops
      reaping zombies.
      
      Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.comSigned-off-by: NJamie Iles <jamie.iles@oracle.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb61b591
    • M
      mm, oom: fix potential data corruption when oom_reaper races with writer · 6b31d595
      Michal Hocko 提交于
      Wenwei Tao has noticed that our current assumption that the oom victim
      is dying and never doing any visible changes after it dies, and so the
      oom_reaper can tear it down, is not entirely true.
      
      __task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
      but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
      So there is a race window when some threads won't have
      fatal_signal_pending while the oom_reaper could start unmapping the
      address space.  Moreover some paths might not check for fatal signals
      before each PF/g-u-p/copy_from_user.
      
      We already have a protection for oom_reaper vs.  PF races by checking
      MMF_UNSTABLE.  This has been, however, checked only for kernel threads
      (use_mm users) which can outlive the oom victim.  A simple fix would be
      to extend the current check in handle_mm_fault for all tasks but that
      wouldn't be sufficient because the current check assumes that a kernel
      thread would bail out after EFAULT from get_user*/copy_from_user and
      never re-read the same address which would succeed because the PF path
      has established page tables already.  This seems to be the case for the
      only existing use_mm user currently (virtio driver) but it is rather
      fragile in general.
      
      This is even more fragile in general for more complex paths such as
      generic_perform_write which can re-read the same address more times
      (e.g.  iov_iter_copy_from_user_atomic to fail and then
      iov_iter_fault_in_readable on retry).
      
      Therefore we have to implement MMF_UNSTABLE protection in a robust way
      and never make a potentially corrupted content visible.  That requires
      to hook deeper into the PF path and check for the flag _every time_
      before a pte for anonymous memory is established (that means all
      !VM_SHARED mappings).
      
      The corruption can be triggered artificially
      (http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
      but there doesn't seem to be any real life bug report.  The race window
      should be quite tight to trigger most of the time.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
      Fixes: aac45363 ("mm, oom: introduce oom reaper")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NWenwei Tao <wenwei.tww@alibaba-inc.com>
      Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b31d595
    • M
      mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS · 5b53a6ea
      Michal Hocko 提交于
      Tetsuo Handa has noticed that MMF_UNSTABLE SIGBUS path in
      handle_mm_fault causes a lockdep splat
      
        Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
        Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
        a.out (1169) used greatest stack depth: 11664 bytes left
        DEBUG_LOCKS_WARN_ON(depth <= 0)
        ------------[ cut here ]------------
        WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
        CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
        Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
        RIP: 0010:lock_release+0x172/0x1e0
        Call Trace:
           up_read+0x1a/0x40
           __do_page_fault+0x28e/0x4c0
           do_page_fault+0x30/0x80
           page_fault+0x28/0x30
      
      The reason is that the page fault path might have dropped the mmap_sem
      and returned with VM_FAULT_RETRY.  MMF_UNSTABLE check however rewrites
      the error path to VM_FAULT_SIGBUS and we always expect mmap_sem taken in
      that path.  Fix this by taking mmap_sem when VM_FAULT_RETRY is held in
      the MMF_UNSTABLE path.
      
      We cannot simply add VM_FAULT_SIGBUS to the existing error code because
      all arch specific page fault handlers and g-u-p would have to learn a
      new error code combination.
      
      Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
      Fixes: 3f70dc38 ("mm: make sure that kthreads will not refault oom reaped memory")
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Argangeli <andrea@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b53a6ea