1. 23 11月, 2016 3 次提交
    • E
      ptrace: Don't allow accessing an undumpable mm · 84d77d3f
      Eric W. Biederman 提交于
      It is the reasonable expectation that if an executable file is not
      readable there will be no way for a user without special privileges to
      read the file.  This is enforced in ptrace_attach but if ptrace
      is already attached before exec there is no enforcement for read-only
      executables.
      
      As the only way to read such an mm is through access_process_vm
      spin a variant called ptrace_access_vm that will fail if the
      target process is not being ptraced by the current process, or
      the current process did not have sufficient privileges when ptracing
      began to read the target processes mm.
      
      In the ptrace implementations replace access_process_vm by
      ptrace_access_vm.  There remain several ptrace sites that still use
      access_process_vm as they are reading the target executables
      instructions (for kernel consumption) or register stacks.  As such it
      does not appear necessary to add a permission check to those calls.
      
      This bug has always existed in Linux.
      
      Fixes: v1.0
      Cc: stable@vger.kernel.org
      Reported-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      84d77d3f
    • E
      ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 64b875f7
      Eric W. Biederman 提交于
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granulariy.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentaly
      this change allows what is allowable at all times, by using the correct
      information in it's decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      64b875f7
    • E
      mm: Add a user_ns owner to mm_struct and fix ptrace permission checks · bfedb589
      Eric W. Biederman 提交于
      During exec dumpable is cleared if the file that is being executed is
      not readable by the user executing the file.  A bug in
      ptrace_may_access allows reading the file if the executable happens to
      enter into a subordinate user namespace (aka clone(CLONE_NEWUSER),
      unshare(CLONE_NEWUSER), or setns(fd, CLONE_NEWUSER).
      
      This problem is fixed with only necessary userspace breakage by adding
      a user namespace owner to mm_struct, captured at the time of exec, so
      it is clear in which user namespace CAP_SYS_PTRACE must be present in
      to be able to safely give read permission to the executable.
      
      The function ptrace_may_access is modified to verify that the ptracer
      has CAP_SYS_ADMIN in task->mm->user_ns instead of task->cred->user_ns.
      This ensures that if the task changes it's cred into a subordinate
      user namespace it does not become ptraceable.
      
      The function ptrace_attach is modified to only set PT_PTRACE_CAP when
      CAP_SYS_PTRACE is held over task->mm->user_ns.  The intent of
      PT_PTRACE_CAP is to be a flag to note that whatever permission changes
      the task might go through the tracer has sufficient permissions for
      it not to be an issue.  task->cred->user_ns is always the same
      as or descendent of mm->user_ns.  Which guarantees that having
      CAP_SYS_PTRACE over mm->user_ns is the worst case for the tasks
      credentials.
      
      To prevent regressions mm->dumpable and mm->user_ns are not considered
      when a task has no mm.  As simply failing ptrace_may_attach causes
      regressions in privileged applications attempting to read things
      such as /proc/<pid>/stat
      
      Cc: stable@vger.kernel.org
      Acked-by: NKees Cook <keescook@chromium.org>
      Tested-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Fixes: 8409cca7 ("userns: allow ptrace from non-init user namespaces")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      bfedb589
  2. 20 11月, 2016 1 次提交
  3. 18 11月, 2016 1 次提交
    • A
      mremap: fix race between mremap() and page cleanning · 5d190420
      Aaron Lu 提交于
      Prior to 3.15, there was a race between zap_pte_range() and
      page_mkclean() where writes to a page could be lost.  Dave Hansen
      discovered by inspection that there is a similar race between
      move_ptes() and page_mkclean().
      
      We've been able to reproduce the issue by enlarging the race window with
      a msleep(), but have not been able to hit it without modifying the code.
      So, we think it's a real issue, but is difficult or impossible to hit in
      practice.
      
      The zap_pte_range() issue is fixed by commit 1cf35d47("mm: split
      'tlb_flush_mmu()' into tlb flushing and memory freeing parts").  And
      this patch is to fix the race between page_mkclean() and mremap().
      
      Here is one possible way to hit the race: suppose a process mmapped a
      file with READ | WRITE and SHARED, it has two threads and they are bound
      to 2 different CPUs, e.g.  CPU1 and CPU2.  mmap returned X, then thread
      1 did a write to addr X so that CPU1 now has a writable TLB for addr X
      on it.  Thread 2 starts mremaping from addr X to Y while thread 1
      cleaned the page and then did another write to the old addr X again.
      The 2nd write from thread 1 could succeed but the value will get lost.
      
              thread 1                           thread 2
           (bound to CPU1)                    (bound to CPU2)
      
        1: write 1 to addr X to get a
           writeable TLB on this CPU
      
                                              2: mremap starts
      
                                              3: move_ptes emptied PTE for addr X
                                                 and setup new PTE for addr Y and
                                                 then dropped PTL for X and Y
      
        4: page laundering for N by doing
           fadvise FADV_DONTNEED. When done,
           pageframe N is deemed clean.
      
        5: *write 2 to addr X
      
                                              6: tlb flush for addr X
      
        7: munmap (Y, pagesize) to make the
           page unmapped
      
        8: fadvise with FADV_DONTNEED again
           to kick the page off the pagecache
      
        9: pread the page from file to verify
           the value. If 1 is there, it means
           we have lost the written 2.
      
        *the write may or may not cause segmentation fault, it depends on
        if the TLB is still on the CPU.
      
      Please note that this is only one specific way of how the race could
      occur, it didn't mean that the race could only occur in exact the above
      config, e.g. more than 2 threads could be involved and fadvise() could
      be done in another thread, etc.
      
      For anonymous pages, they could race between mremap() and page reclaim:
      THP: a huge PMD is moved by mremap to a new huge PMD, then the new huge
      PMD gets unmapped/splitted/pagedout before the flush tlb happened for
      the old huge PMD in move_page_tables() and we could still write data to
      it.  The normal anonymous page has similar situation.
      
      To fix this, check for any dirty PTE in move_ptes()/move_huge_pmd() and
      if any, did the flush before dropping the PTL.  If we did the flush for
      every move_ptes()/move_huge_pmd() call then we do not need to do the
      flush in move_pages_tables() for the whole range.  But if we didn't, we
      still need to do the whole range flush.
      
      Alternatively, we can track which part of the range is flushed in
      move_ptes()/move_huge_pmd() and which didn't to avoid flushing the whole
      range in move_page_tables().  But that would require multiple tlb
      flushes for the different sub-ranges and should be less efficient than
      the single whole range flush.
      
      KBuild test on my Sandybridge desktop doesn't show any noticeable change.
      v4.9-rc4:
        real    5m14.048s
        user    32m19.800s
        sys     4m50.320s
      
      With this commit:
        real    5m13.888s
        user    32m19.330s
        sys     4m51.200s
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NAaron Lu <aaron.lu@intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5d190420
  4. 17 11月, 2016 1 次提交
  5. 15 11月, 2016 1 次提交
  6. 14 11月, 2016 3 次提交
    • S
      sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports · ea08e392
      Scott Mayhew 提交于
      This fixes the following panic that can occur with NFSoRDMA.
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
      scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
      scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
      mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
      ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
      irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
      lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
      ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
      auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
      crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
      tg3 crct10dif_pclmul drm crct10dif_common
      ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
      dm_log dm_mod
      CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
      Workqueue: events check_lifetime
      task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
      RIP: 0010:[<ffffffff8168d847>]  [<ffffffff8168d847>]
      _raw_spin_lock_bh+0x17/0x50
      RSP: 0018:ffff88031f587ba8  EFLAGS: 00010206
      RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
      RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
      R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
      R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff880322a40000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
      20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
      ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
      0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
      Call Trace:
      [<ffffffff81557830>] lock_sock_nested+0x20/0x50
      [<ffffffff8155ae08>] sock_setsockopt+0x78/0x940
      [<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50
      [<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50
      [<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
      [<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
      [<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70
      [<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70
      [<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20
      [<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0
      [<ffffffff815e8cef>] check_lifetime+0x25f/0x270
      [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
      [<ffffffff810a8d76>] worker_thread+0x126/0x410
      [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
      [<ffffffff810b052f>] kthread+0xcf/0xe0
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81696418>] ret_from_fork+0x58/0x90
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
      44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f
      c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
      RIP  [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50
      RSP <ffff88031f587ba8>
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ea08e392
    • M
      Revert "include/uapi/linux/atm_zatm.h: include linux/time.h" · 7b5b74ef
      Mike Frysinger 提交于
      This reverts commit cf00713a ("include/uapi/linux/atm_zatm.h: include
      linux/time.h").
      
      This attempted to fix userspace breakage that no longer existed when
      the patch was merged.  Almost one year earlier, commit 70ba07b6
      ("atm: remove 'struct zatm_t_hist'") deleted the struct in question.
      
      After this patch was merged, we now have to deal with people being
      unable to include this header in conjunction with standard C library
      headers like stdlib.h (which linux-atm does).  Example breakage:
      x86_64-pc-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -I./../q2931 -I./../saal \
      	-I.  -DCPPFLAGS_TEST  -I../../src/include -O2 -march=native -pipe -g \
      	-frecord-gcc-switches -freport-bug -Wimplicit-function-declaration \
      	-Wnonnull -Wstrict-aliasing -Wparentheses -Warray-bounds \
      	-Wfree-nonheap-object -Wreturn-local-addr -fno-strict-aliasing -Wall \
      	-Wshadow -Wpointer-arith -Wwrite-strings -Wstrict-prototypes -c zntune.c
      In file included from /usr/include/linux/atm_zatm.h:17:0,
                       from zntune.c:17:
      /usr/include/linux/time.h:9:8: error: redefinition of ‘struct timespec’
       struct timespec {
              ^
      In file included from /usr/include/sys/select.h:43:0,
                       from /usr/include/sys/types.h:219,
                       from /usr/include/stdlib.h:314,
                       from zntune.c:9:
      /usr/include/time.h:120:8: note: originally defined here
       struct timespec
              ^
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Acked-by: NMikko Rapeli <mikko.rapeli@iki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b5b74ef
    • E
      tcp: take care of truncations done by sk_filter() · ac6e7800
      Eric Dumazet 提交于
      With syzkaller help, Marco Grassi found a bug in TCP stack,
      crashing in tcp_collapse()
      
      Root cause is that sk_filter() can truncate the incoming skb,
      but TCP stack was not really expecting this to happen.
      It probably was expecting a simple DROP or ACCEPT behavior.
      
      We first need to make sure no part of TCP header could be removed.
      Then we need to adjust TCP_SKB_CB(skb)->end_seq
      
      Many thanks to syzkaller team and Marco for giving us a reproducer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMarco Grassi <marco.gra@gmail.com>
      Reported-by: NVladis Dronov <vdronov@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac6e7800
  7. 13 11月, 2016 2 次提交
    • B
      net: bpqether.h: remove if_ether.h guard · 10b21768
      Baruch Siach 提交于
      __LINUX_IF_ETHER_H is not defined anywhere, and if_ether.h can keep itself from
      double inclusion, though it uses a single underscore prefix.
      Signed-off-by: NBaruch Siach <baruch@tkos.co.il>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10b21768
    • M
      bpf: Fix bpf_redirect to an ipip/ip6tnl dev · 4e3264d2
      Martin KaFai Lau 提交于
      If the bpf program calls bpf_redirect(dev, 0) and dev is
      an ipip/ip6tnl, it currently includes the mac header.
      e.g. If dev is ipip, the end result is IP-EthHdr-IP instead
      of IP-IP.
      
      The fix is to pull the mac header.  At ingress, skb_postpull_rcsum()
      is not needed because the ethhdr should have been pulled once already
      and then got pushed back just before calling the bpf_prog.
      At egress, this patch calls skb_postpull_rcsum().
      
      If bpf_redirect(dev, BPF_F_INGRESS) is called,
      it also fails now because it calls dev_forward_skb() which
      eventually calls eth_type_trans(skb, dev).  The eth_type_trans()
      will set skb->type = PACKET_OTHERHOST because the mac address
      does not match the redirecting dev->dev_addr.  The PACKET_OTHERHOST
      will eventually cause the ip_rcv() errors out.  To fix this,
      ____dev_forward_skb() is added.
      
      Joint work with Daniel Borkmann.
      
      Fixes: cfc7381b ("ip_tunnel: add collect_md mode to IPIP tunnel")
      Fixes: 8d79266b ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4e3264d2
  8. 12 11月, 2016 3 次提交
    • J
      mm: kmemleak: scan .data.ro_after_init · d7c19b06
      Jakub Kicinski 提交于
      Limit the number of kmemleak false positives by including
      .data.ro_after_init in memory scanning.  To achieve this we need to add
      symbols for start and end of the section to the linker scripts.
      
      The problem was been uncovered by commit 56989f6d ("genetlink: mark
      families as __ro_after_init").
      
      Link: http://lkml.kernel.org/r/1478274173-15218-1-git-send-email-jakub.kicinski@netronome.comReviewed-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7c19b06
    • H
      Revert "console: don't prefer first registered if DT specifies stdout-path" · c6c7d83b
      Hans de Goede 提交于
      This reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path").
      
      The reverted commit changes existing behavior on which many ARM boards
      rely.  Many ARM small-board-computers, like e.g.  the Raspberry Pi have
      both a video output and a serial console.  Depending on whether the user
      is using the device as a more regular computer; or as a headless device
      we need to have the console on either one or the other.
      
      Many users rely on the kernel behavior of the console being present on
      both outputs, before the reverted commit the console setup with no
      console= kernel arguments on an ARM board which sets stdout-path in dt
      would look like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
        tty0                 -WU (E  p  )    4:1
      
      Where as after the reverted commit, it looks like this:
      
        [root@localhost ~]# cat /proc/consoles
        ttyS0                -W- (EC p a)    4:64
      
      This commit reverts commit 05fd007e ("console: don't prefer first
      registered if DT specifies stdout-path") restoring the original
      behavior.
      
      Fixes: 05fd007e ("console: don't prefer first registered if DT specifies stdout-path")
      Link: http://lkml.kernel.org/r/20161104121135.4780-2-hdegoede@redhat.comSigned-off-by: NHans de Goede <hdegoede@redhat.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Frank Rowand <frowand.list@gmail.com>
      Cc: Thorsten Leemhuis <regressions@leemhuis.info>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c6c7d83b
    • V
      mm, frontswap: make sure allocated frontswap map is assigned · 5e322bee
      Vlastimil Babka 提交于
      Christian Borntraeger reports:
      
      With commit 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to
      static key") kmemleak complains about a memory leak in swapon
      
          unreferenced object 0x3e09ba56000 (size 32112640):
            comm "swapon", pid 7852, jiffies 4294968787 (age 1490.770s)
            hex dump (first 32 bytes):
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            backtrace:
               __vmalloc_node_range+0x194/0x2d8
               vzalloc+0x58/0x68
               SyS_swapon+0xd60/0x12f8
               system_call+0xd6/0x270
      
      Turns out kmemleak is right.  We now allocate the frontswap map
      depending on the kernel config (and no longer on the enablement)
      
        swapfile.c:
        [...]
            if (IS_ENABLED(CONFIG_FRONTSWAP))
                      frontswap_map = vzalloc(BITS_TO_LONGS(maxpages) * sizeof(long));
      
      but later on this is passed along
        --> enable_swap_info(p, prio, swap_map, cluster_info, frontswap_map);
      
      and ignored if frontswap is disabled
        --> frontswap_init(p->type, frontswap_map);
      
        static inline void frontswap_init(unsigned type, unsigned long *map)
        {
              if (frontswap_enabled())
                      __frontswap_init(type, map);
        }
      
      Thing is, that frontswap map is never freed.
      
      The leakage is relatively not that bad, because swapon is an infrequent
      and privileged operation.  However, if the first frontswap backend is
      registered after a swap type has been already enabled, it will WARN_ON
      in frontswap_register_ops() and frontswap will not be available for the
      swap type.
      
      Fix this by making sure the map is assigned by frontswap_init() as long
      as CONFIG_FRONTSWAP is enabled.
      
      Fixes: 8ea1d2a1 ("mm, frontswap: convert frontswap_enabled to static key")
      Link: http://lkml.kernel.org/r/20161026134220.2566-1-vbabka@suse.czSigned-off-by: NVlastimil Babka <vbabka@suse.cz>
      Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5e322bee
  9. 11 11月, 2016 1 次提交
  10. 10 11月, 2016 1 次提交
  11. 08 11月, 2016 1 次提交
  12. 05 11月, 2016 1 次提交
  13. 04 11月, 2016 3 次提交
    • E
      dccp: do not release listeners too soon · c3f24cfb
      Eric Dumazet 提交于
      Andrey Konovalov reported following error while fuzzing with syzkaller :
      
      IPv4: Attempt to release alive inet socket ffff880068e98940
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 3905 Comm: a.out Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88006b9e0000 task.stack: ffff880068770000
      RIP: 0010:[<ffffffff819ead5f>]  [<ffffffff819ead5f>]
      selinux_socket_sock_rcv_skb+0xff/0x6a0 security/selinux/hooks.c:4639
      RSP: 0018:ffff8800687771c8  EFLAGS: 00010202
      RAX: ffff88006b9e0000 RBX: 1ffff1000d0eee3f RCX: 1ffff1000d1d312a
      RDX: 1ffff1000d1d31a6 RSI: dffffc0000000000 RDI: 0000000000000010
      RBP: ffff880068777360 R08: 0000000000000000 R09: 0000000000000002
      R10: dffffc0000000000 R11: 0000000000000006 R12: ffff880068e98940
      R13: 0000000000000002 R14: ffff880068777338 R15: 0000000000000000
      FS:  00007f00ff760700(0000) GS:ffff88006cd00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020008000 CR3: 000000006a308000 CR4: 00000000000006e0
      Stack:
       ffff8800687771e0 ffffffff812508a5 ffff8800686f3168 0000000000000007
       ffff88006ac8cdfc ffff8800665ea500 0000000041b58ab3 ffffffff847b5480
       ffffffff819eac60 ffff88006b9e0860 ffff88006b9e0868 ffff88006b9e07f0
      Call Trace:
       [<ffffffff819c8dd5>] security_sock_rcv_skb+0x75/0xb0 security/security.c:1317
       [<ffffffff82c2a9e7>] sk_filter_trim_cap+0x67/0x10e0 net/core/filter.c:81
       [<ffffffff82b81e60>] __sk_receive_skb+0x30/0xa00 net/core/sock.c:460
       [<ffffffff838bbf12>] dccp_v4_rcv+0xdb2/0x1910 net/dccp/ipv4.c:873
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306abd2>] ip_local_deliver+0x1c2/0x4b0 net/ipv4/ip_input.c:257
       [<     inline     >] dst_input ./include/net/dst.h:507
       [<ffffffff83068500>] ip_rcv_finish+0x750/0x1c40 net/ipv4/ip_input.c:396
       [<     inline     >] NF_HOOK_THRESH ./include/linux/netfilter.h:232
       [<     inline     >] NF_HOOK ./include/linux/netfilter.h:255
       [<ffffffff8306b82f>] ip_rcv+0x96f/0x12f0 net/ipv4/ip_input.c:487
       [<ffffffff82bd9fb7>] __netif_receive_skb_core+0x1897/0x2a50 net/core/dev.c:4213
       [<ffffffff82bdb19a>] __netif_receive_skb+0x2a/0x170 net/core/dev.c:4251
       [<ffffffff82bdb493>] netif_receive_skb_internal+0x1b3/0x390 net/core/dev.c:4279
       [<ffffffff82bdb6b8>] netif_receive_skb+0x48/0x250 net/core/dev.c:4303
       [<ffffffff8241fc75>] tun_get_user+0xbd5/0x28a0 drivers/net/tun.c:1308
       [<ffffffff82421b5a>] tun_chr_write_iter+0xda/0x190 drivers/net/tun.c:1332
       [<     inline     >] new_sync_write fs/read_write.c:499
       [<ffffffff8151bd44>] __vfs_write+0x334/0x570 fs/read_write.c:512
       [<ffffffff8151f85b>] vfs_write+0x17b/0x500 fs/read_write.c:560
       [<     inline     >] SYSC_write fs/read_write.c:607
       [<ffffffff81523184>] SyS_write+0xd4/0x1a0 fs/read_write.c:599
       [<ffffffff83fc02c1>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It turns out DCCP calls __sk_receive_skb(), and this broke when
      lookups no longer took a reference on listeners.
      
      Fix this issue by adding a @refcounted parameter to __sk_receive_skb(),
      so that sock_put() is used only when needed.
      
      Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c3f24cfb
    • L
      ipv4: allow local fragmentation in ip_finish_output_gso() · 9ee6c5dc
      Lance Richardson 提交于
      Some configurations (e.g. geneve interface with default
      MTU of 1500 over an ethernet interface with 1500 MTU) result
      in the transmission of packets that exceed the configured MTU.
      While this should be considered to be a "bad" configuration,
      it is still allowed and should not result in the sending
      of packets that exceed the configured MTU.
      
      Fix by dropping the assumption in ip_finish_output_gso() that
      locally originated gso packets will never need fragmentation.
      Basic testing using iperf (observing CPU usage and bandwidth)
      have shown no measurable performance impact for traffic not
      requiring fragmentation.
      
      Fixes: c7ba65d7 ("net: ip: push gso skb forwarding handling down the stack")
      Reported-by: NJan Tluka <jtluka@redhat.com>
      Signed-off-by: NLance Richardson <lrichard@redhat.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ee6c5dc
    • D
      net: tcp: check skb is non-NULL for exact match on lookups · da96786e
      David Ahern 提交于
      Andrey reported the following error report while running the syzkaller
      fuzzer:
      
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 648 Comm: syz-executor Not tainted 4.9.0-rc3+ #333
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800398c4480 task.stack: ffff88003b468000
      RIP: 0010:[<ffffffff83091106>]  [<     inline     >]
      inet_exact_dif_match include/net/tcp.h:808
      RIP: 0010:[<ffffffff83091106>]  [<ffffffff83091106>]
      __inet_lookup_listener+0xb6/0x500 net/ipv4/inet_hashtables.c:219
      RSP: 0018:ffff88003b46f270  EFLAGS: 00010202
      RAX: 0000000000000004 RBX: 0000000000004242 RCX: 0000000000000001
      RDX: 0000000000000000 RSI: ffffc90000e3c000 RDI: 0000000000000054
      RBP: ffff88003b46f2d8 R08: 0000000000004000 R09: ffffffff830910e7
      R10: 0000000000000000 R11: 000000000000000a R12: ffffffff867fa0c0
      R13: 0000000000004242 R14: 0000000000000003 R15: dffffc0000000000
      FS:  00007fb135881700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020cc3000 CR3: 000000006d56a000 CR4: 00000000000006f0
      Stack:
       0000000000000000 000000000601a8c0 0000000000000000 ffffffff00004242
       424200003b9083c2 ffff88003def4041 ffffffff84e7e040 0000000000000246
       ffff88003a0911c0 0000000000000000 ffff88003a091298 ffff88003b9083ae
      Call Trace:
       [<ffffffff831100f4>] tcp_v4_send_reset+0x584/0x1700 net/ipv4/tcp_ipv4.c:643
       [<ffffffff83115b1b>] tcp_v4_rcv+0x198b/0x2e50 net/ipv4/tcp_ipv4.c:1718
       [<ffffffff83069d22>] ip_local_deliver_finish+0x332/0xad0
      net/ipv4/ip_input.c:216
      ...
      
      MD5 has a code path that calls __inet_lookup_listener with a null skb,
      so inet{6}_exact_dif_match needs to check skb against null before pulling
      the flag.
      
      Fixes: a04a480d ("net: Require exact match for TCP socket lookups if
             dif is l3mdev")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Tested-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      da96786e
  14. 03 11月, 2016 1 次提交
    • E
      ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() · 23f4ffed
      Eli Cooper 提交于
      skb->cb may contain data from previous layers. In the observed scenario,
      the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
      that small packets sent through the tunnel are mistakenly fragmented.
      
      This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
      which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
      these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NEli Cooper <elicooper@gmx.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23f4ffed
  15. 01 11月, 2016 1 次提交
    • X
      sctp: hold transport instead of assoc when lookup assoc in rx path · dae399d7
      Xin Long 提交于
      Prior to this patch, in rx path, before calling lock_sock, it needed to
      hold assoc when got it by __sctp_lookup_association, in case other place
      would free/put assoc.
      
      But in __sctp_lookup_association, it lookup and hold transport, then got
      assoc by transport->assoc, then hold assoc and put transport. It means
      it didn't hold transport, yet it was returned and later on directly
      assigned to chunk->transport.
      
      Without the protection of sock lock, the transport may be freed/put by
      other places, which would cause a use-after-free issue.
      
      This patch is to fix this issue by holding transport instead of assoc.
      As holding transport can make sure to access assoc is also safe, and
      actually it looks up assoc by searching transport rhashtable, to hold
      transport here makes more sense.
      
      Note that the function will be renamed later on on another patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dae399d7
  16. 31 10月, 2016 2 次提交
  17. 30 10月, 2016 6 次提交
  18. 29 10月, 2016 1 次提交
  19. 28 10月, 2016 7 次提交
    • J
      perf/powerpc: Don't call perf_event_disable() from atomic context · 5aab90ce
      Jiri Olsa 提交于
      The trinity syscall fuzzer triggered following WARN() on powerpc:
      
        WARNING: CPU: 9 PID: 2998 at arch/powerpc/kernel/hw_breakpoint.c:278
        ...
        NIP [c00000000093aedc] .hw_breakpoint_handler+0x28c/0x2b0
        LR [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0
        Call Trace:
        [c0000002f7933580] [c00000000093aed8] .hw_breakpoint_handler+0x288/0x2b0 (unreliable)
        [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
        [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
        [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
        [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
        [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48
      
      Followed by a lockdep warning:
      
        ===============================
        [ INFO: suspicious RCU usage. ]
        4.8.0-rc5+ #7 Tainted: G        W
        -------------------------------
        ./include/linux/rcupdate.h:556 Illegal context switch in RCU read-side critical section!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 1, debug_locks = 0
        2 locks held by ls/2998:
         #0:  (rcu_read_lock){......}, at: [<c0000000000f6a00>] .__atomic_notifier_call_chain+0x0/0x1c0
         #1:  (rcu_read_lock){......}, at: [<c00000000093ac50>] .hw_breakpoint_handler+0x0/0x2b0
      
        stack backtrace:
        CPU: 9 PID: 2998 Comm: ls Tainted: G        W       4.8.0-rc5+ #7
        Call Trace:
        [c0000002f7933150] [c00000000094b1f8] .dump_stack+0xe0/0x14c (unreliable)
        [c0000002f79331e0] [c00000000013c468] .lockdep_rcu_suspicious+0x138/0x180
        [c0000002f7933270] [c0000000001005d8] .___might_sleep+0x278/0x2e0
        [c0000002f7933300] [c000000000935584] .mutex_lock_nested+0x64/0x5a0
        [c0000002f7933410] [c00000000023084c] .perf_event_ctx_lock_nested+0x16c/0x380
        [c0000002f7933500] [c000000000230a80] .perf_event_disable+0x20/0x60
        [c0000002f7933580] [c00000000093aeec] .hw_breakpoint_handler+0x29c/0x2b0
        [c0000002f7933630] [c0000000000f671c] .notifier_call_chain+0x7c/0xf0
        [c0000002f79336d0] [c0000000000f6abc] .__atomic_notifier_call_chain+0xbc/0x1c0
        [c0000002f7933780] [c0000000000f6c40] .notify_die+0x70/0xd0
        [c0000002f7933820] [c00000000001a74c] .do_break+0x4c/0x100
        [c0000002f7933920] [c0000000000089fc] handle_dabr_fault+0x14/0x48
      
      While it looks like the first WARN() is probably valid, the other one is
      triggered by disabling event via perf_event_disable() from atomic context.
      
      The event is disabled here in case we were not able to emulate
      the instruction that hit the breakpoint. By disabling the event
      we unschedule the event and make sure it's not scheduled back.
      
      But we can't call perf_event_disable() from atomic context, instead
      we need to use the event's pending_disable irq_work method to disable it.
      Reported-by: NJan Stancek <jstancek@redhat.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161026094824.GA21397@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5aab90ce
    • B
      mtd: nand: Fix data interface configuration logic · 73f907fd
      Boris Brezillon 提交于
      When changing from one data interface setting to another, one has to
      ensure a specific sequence which is described in the ONFI spec.
      
      One of these constraints is that the CE line has go high after a reset
      before a command can be sent with the new data interface setting, which
      is not guaranteed by the current implementation.
      
      Rework the nand_reset() function and all the call sites to make sure the
      CE line is asserted and released when required.
      
      Also make sure to actually apply the new data interface setting on the
      first die.
      Signed-off-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: d8e725dd ("mtd: nand: automate NAND timings selection")
      Reviewed-by: NSascha Hauer <s.hauer@pengutronix.de>
      Tested-by: NMarc Gonzalez <marc_gonzalez@sigmadesigns.com>
      73f907fd
    • M
      kconfig.h: remove config_enabled() macro · c0a0aba8
      Masahiro Yamada 提交于
      The use of config_enabled() is ambiguous.  For config options,
      IS_ENABLED(), IS_REACHABLE(), etc.  will make intention clearer.
      Sometimes config_enabled() has been used for non-config options because
      it is useful to check whether the given symbol is defined or not.
      
      I have been tackling on deprecating config_enabled(), and now is the
      time to finish this work.
      
      Some new users have appeared for v4.9-rc1, but it is trivial to replace
      them:
      
       - arch/x86/mm/kaslr.c
        replace config_enabled() with IS_ENABLED() because
        CONFIG_X86_ESPFIX64 and CONFIG_EFI are boolean.
      
       - include/asm-generic/export.h
        replace config_enabled() with __is_defined().
      
      Then, config_enabled() can be removed now.
      
      Going forward, please use IS_ENABLED(), IS_REACHABLE(), etc. for config
      options, and __is_defined() for non-config symbols.
      
      Link: http://lkml.kernel.org/r/1476616078-32252-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NNicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0a0aba8
    • D
      net: ipv6: Do not consider link state for nexthop validation · d5d32e4b
      David Ahern 提交于
      Similar to IPv4, do not consider link state when validating next hops.
      
      Currently, if the link is down default routes can fail to insert:
       $ ip -6 ro add vrf blue default via 2100:2::64 dev eth2
       RTNETLINK answers: No route to host
      
      With this patch the command succeeds.
      
      Fixes: 8c14586f ("net: ipv6: Use passed in table for nexthop lookups")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5d32e4b
    • D
      net: ipv6: Fix processing of RAs in presence of VRF · 830218c1
      David Ahern 提交于
      rt6_add_route_info and rt6_add_dflt_router were updated to pull the FIB
      table from the device index, but the corresponding rt6_get_route_info
      and rt6_get_dflt_router functions were not leading to the failure to
      process RA's:
      
          ICMPv6: RA: ndisc_router_discovery failed to add default route
      
      Fix the 'get' functions by using the table id associated with the
      device when applicable.
      
      Also, now that default routes can be added to tables other than the
      default table, rt6_purge_dflt_routers needs to be updated as well to
      look at all tables. To handle that efficiently, add a flag to the table
      denoting if it is has a default route via RA.
      
      Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      830218c1
    • F
      netfilter: conntrack: avoid excess memory allocation · cdb436d1
      Florian Westphal 提交于
      This is now a fixed-size extension, so we don't need to pass a variable
      alloc size.  This (harmless) error results in allocating 32 instead of
      the needed 16 bytes for this extension as the size gets passed twice.
      
      Fixes: 23014011 ("netfilter: conntrack: support a fixed size of 128 distinct labels")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      cdb436d1
    • J
      netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check · f1d505bb
      John W. Linville 提交于
      Commit 36b701fa ("netfilter: nf_tables: validate maximum value of
      u32 netlink attributes") introduced nft_parse_u32_check with a return
      value of "unsigned int", yet on error it returns "-ERANGE".
      
      This patch corrects the mismatch by changing the return value to "int",
      which happens to match the actual users of nft_parse_u32_check already.
      
      Found by Coverity, CID 1373930.
      
      Note that commit 21a9e0f1 ("netfilter: nft_exthdr: fix error
      handling in nft_exthdr_init()) attempted to address the issue, but
      did not address the return type of nft_parse_u32_check.
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      Cc: Laura Garcia Liebana <nevola@gmail.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: 36b701fa ("netfilter: nf_tables: validate maximum value...")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f1d505bb
新手
引导
客服 返回
顶部