1. 10 Aug, 2017 3 commits
  2. 07 Aug, 2017 1 commit
  3. 05 Aug, 2017 1 commit
  4. 04 Aug, 2017 1 commit
  5. 03 Aug, 2017 6 commits
    • mm: allow page_cache_get_speculative in interrupt context · 1ee1c3f5
      Authored by Kan Liang
      The kernel panics when the IRQ-safe __get_user_pages_fast is called from
      an NMI handler.
      
      The bug was introduced by commit 2947ba05 ("x86/mm/gup: Switch GUP
      to the generic get_user_page_fast() implementation").
      
      The original x86 __get_user_page_fast used plain get_page() or
      page_ref_add().  However, the generic __get_user_page_fast uses
      page_cache_get_speculative(), which has VM_BUG_ON(in_interrupt()).
      
      There is no reason to prevent page_cache_get_speculative from being used
      in interrupt context.  According to the author, the BUG_ON is there only
      because the code had not been verified for correctness against interrupt
      races.  I ran some tests in interrupt context and found no issues.

      Remove the VM_BUG_ON(in_interrupt()) from page_cache_get_speculative().
      
      Link: http://lkml.kernel.org/r/1501609146-59730-1-git-send-email-kan.liang@intel.com
      Fixes: 2947ba05 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation")
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Ying Huang <ying.huang@intel.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpuset: fix a deadlock due to incomplete patching of cpusets_enabled() · 89affbf5
      Authored by Dima Zavin
      In codepaths that use the begin/retry interface for reading
      mems_allowed_seq with irqs disabled, there exists a race condition that
      stalls the patch process after only modifying a subset of the
      static_branch call sites.
      
      This problem manifested itself as a deadlock in the slub allocator,
      inside get_any_partial.  The loop reads mems_allowed_seq value (via
      read_mems_allowed_begin), performs the defrag operation, and then
      verifies the consistency of mems_allowed via read_mems_allowed_retry
      and the cookie returned by read_mems_allowed_begin.
      
      The issue here is that both begin and retry first check if cpusets are
      enabled via the cpusets_enabled() static branch.  This branch can be
      rewritten dynamically (via cpuset_inc) if a new cpuset is created.  The
      x86 jump label code fully synchronizes across all CPUs for every entry
      it rewrites.  If it rewrites only one of the callsites (specifically the
      one in read_mems_allowed_retry) and then waits for the
      smp_call_function(do_sync_core) to complete while a CPU is inside the
      begin/retry section with IRQs off and the mems_allowed value is changed,
      we can hang.
      
      This is because begin() will always return 0 (since it wasn't patched
      yet) while retry() will test the 0 against the actual value of the seq
      counter.
      
      The fix is to use two different static keys: one for begin
      (pre_enable_key) and one for retry (enable_key).  In cpuset_inc(), we
      first bump the pre_enable key to ensure that cpuset_mems_allowed_begin()
      always returns a valid seqcount if we are enabling cpusets.  Similarly,
      when disabling cpusets via cpuset_dec(), we first ensure that callers of
      cpuset_mems_allowed_retry() will start ignoring the seqcount value
      before we let cpuset_mems_allowed_begin() return 0.
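
The ordering argument above can be checked with a small single-threaded model. This is only a sketch: the struct and function names are hypothetical, plain booleans stand in for the static keys, and an integer stands in for the mems_allowed_seq counter. It does not reproduce the kernel code or the cross-CPU patching.

```c
#include <stdbool.h>

/* Hypothetical model: booleans stand in for the two static keys. */
struct cpuset_model {
    bool pre_enable_key;  /* guards the begin side */
    bool enable_key;      /* guards the retry side */
    unsigned int seq;     /* stand-in for the mems_allowed_seq value */
};

/* begin: return the current sequence value, or 0 while not (pre-)enabled. */
static unsigned int model_begin(const struct cpuset_model *m)
{
    return m->pre_enable_key ? m->seq : 0;
}

/* retry: ask the caller to loop again only if enabled and the cookie is stale. */
static bool model_retry(const struct cpuset_model *m, unsigned int cookie)
{
    return m->enable_key && m->seq != cookie;
}
```

In this model, the state { pre_enable_key = false, enable_key = true } with a non-zero seq corresponds to the half-patched situation from the bug report: begin yields 0, retry compares 0 against the real counter, and the caller would loop indefinitely. Enabling pre_enable_key first and enable_key second never passes through that state.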
      
      The relevant stack traces of the two stuck threads:
      
        CPU: 1 PID: 1415 Comm: mkdir Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8817f9c28000 task.stack: ffffc9000ffa4000
        RIP: smp_call_function_many+0x1f9/0x260
        Call Trace:
          smp_call_function+0x3b/0x70
          on_each_cpu+0x2f/0x90
          text_poke_bp+0x87/0xd0
          arch_jump_label_transform+0x93/0x100
          __jump_label_update+0x77/0x90
          jump_label_update+0xaa/0xc0
          static_key_slow_inc+0x9e/0xb0
          cpuset_css_online+0x70/0x2e0
          online_css+0x2c/0xa0
          cgroup_apply_control_enable+0x27f/0x3d0
          cgroup_mkdir+0x2b7/0x420
          kernfs_iop_mkdir+0x5a/0x80
          vfs_mkdir+0xf6/0x1a0
          SyS_mkdir+0xb7/0xe0
          entry_SYSCALL_64_fastpath+0x18/0xad
      
        ...
      
        CPU: 2 PID: 1 Comm: init Tainted: G L  4.9.36-00104-g540c51286237 #4
        Hardware name: Default string Default string/Hardware, BIOS 4.29.1-20170526215256 05/26/2017
        task: ffff8818087c0000 task.stack: ffffc90000030000
        RIP: int3+0x39/0x70
        Call Trace:
          <#DB> ? ___slab_alloc+0x28b/0x5a0
          <EOE> ? copy_process.part.40+0xf7/0x1de0
          __slab_alloc.isra.80+0x54/0x90
          copy_process.part.40+0xf7/0x1de0
          copy_process.part.40+0xf7/0x1de0
          kmem_cache_alloc_node+0x8a/0x280
          copy_process.part.40+0xf7/0x1de0
          _do_fork+0xe7/0x6c0
          _raw_spin_unlock_irq+0x2d/0x60
          trace_hardirqs_on_caller+0x136/0x1d0
          entry_SYSCALL_64_fastpath+0x5/0xad
          do_syscall_64+0x27/0x350
          SyS_clone+0x19/0x20
          do_syscall_64+0x60/0x350
          entry_SYSCALL64_slow_path+0x25/0x25
      
      Link: http://lkml.kernel.org/r/20170731040113.14197-1-dmitriyz@waymo.com
      Fixes: 46e700ab ("mm, page_alloc: remove unnecessary taking of a seqlock when cpusets are disabled")
      Signed-off-by: Dima Zavin <dmitriyz@waymo.com>
      Reported-by: Cliff Spradlin <cspradlin@waymo.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christopher Lameter <cl@linux.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kthread: fix documentation build warning · d16977f3
      Authored by Jonathan Corbet
      The kerneldoc comment for kthread_create() had an incorrect argument
      name, leading to a warning in the docs build.
      
      Correct it, and make one more small step toward a warning-free build.
      
      Link: http://lkml.kernel.org/r/20170724135916.7f486c6f@lwn.net
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries · 3ea27719
      Authored by Mel Gorman
      Nadav Amit identified a theoretical race between page reclaim and
      mprotect due to TLB flushes being batched outside of the PTL being held.
      
      He described the race as follows:
      
              CPU0                            CPU1
              ----                            ----
                                              user accesses memory using RW PTE
                                              [PTE now cached in TLB]
              try_to_unmap_one()
              ==> ptep_get_and_clear()
              ==> set_tlb_ubc_flush_pending()
                                              mprotect(addr, PROT_READ)
                                              ==> change_pte_range()
                                              ==> [ PTE non-present - no flush ]
      
                                              user writes using cached RW PTE
              ...
      
              try_to_unmap_flush()
      
      The same type of race exists for reads when protecting for PROT_NONE and
      also exists for operations that can leave an old TLB entry behind such
      as munmap, mremap and madvise.
      
      For some operations like mprotect, it's not necessarily a data integrity
      issue but it is a correctness issue as there is a window where an
      mprotect that limits access still allows access.  For munmap, it's
      potentially a data integrity issue although the race is massive as an
      munmap, mmap and return to userspace must all complete between the
      window when reclaim drops the PTL and flushes the TLB.  However, it's
      theoretically possible, so handle this issue by flushing the mm if
      reclaim is potentially currently batching TLB flushes.
      
      Other instances where a flush is required for a present pte should be ok
      as either the page lock is held preventing parallel reclaim or a page
      reference count is elevated preventing a parallel free leading to
      corruption.  In the case of page_mkclean there isn't an obvious path
      that userspace could take advantage of without using the operations that
      are guarded by this patch.  Other users such as gup, when racing with
      reclaim, look just at PTEs.  Huge page variants should be ok as they
      don't race with reclaim.  mincore only looks at PTEs.  userfault also
      should be ok as if a parallel reclaim takes place, it will either fault
      the page back in or read some of the data before the flush occurs
      triggering a fault.
      
      Note that a variant of this patch was acked by Andy Lutomirski but this
      was for the x86 parts on top of his PCID work which didn't make the 4.13
      merge window as expected.  His ack is dropped from this version and
      there will be a follow-on patch on top of PCID that will include his
      ack.
      
      [akpm@linux-foundation.org: tweak comments]
      [akpm@linux-foundation.org: fix spello]
      Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
      Reported-by: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: <stable@vger.kernel.org>	[v4.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • KVM: avoid using rcu_dereference_protected · 3898da94
      Authored by Paolo Bonzini
      During teardown, accesses to memslots and buses are using
      rcu_dereference_protected with an always-true condition because
      these accesses are done outside the usual mutexes.  This
      is because the last reference is gone and there cannot be any
      concurrent modifications, but rcu_dereference_protected is
      ugly and unobvious.
      
      Instead, check the refcount in kvm_get_bus and __kvm_memslots.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support · c994f778
      Authored by Inbar Karmy
      Currently, when WoL is supported but disabled, ethtool reports:
      "Supports Wake-on: d".
      Fix the indication of WoL support, so that the indication
      remains "g" all the time if the NIC supports WoL.

      Tested:
      As expected, when the NIC supports WoL, ethtool reports:
              Supports Wake-on: g
              Wake-on: d
      When the NIC doesn't support WoL, ethtool reports:
              Supports Wake-on: d
              Wake-on: d
      
      Fixes: 14c07b13 ("mlx4: Wake on LAN support")
      Signed-off-by: Inbar Karmy <inbark@mellanox.com>
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 02 Aug, 2017 3 commits
    • mtd: nand: Declare tBERS, tR and tPROG as u64 to avoid integer overflow · 6d292310
      Authored by Boris Brezillon
      All timings in nand_sdr_timings are expressed in picoseconds but some
      of them may not fit in a u32.
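
The arithmetic behind this is short: a 32-bit unsigned value tops out at 4,294,967,295 ps, which is roughly 4.29 ms, so any timing longer than that wraps around. The sketch below uses hypothetical helper names and an illustrative 10 ms value (not a figure taken from the commit) to show the truncation.

```c
#include <stdint.h>

/* Picoseconds per millisecond. */
#define PS_PER_MS 1000000000ULL

/* Convert ms to ps and truncate to 32 bits, as a u32 field would. */
static uint32_t ms_to_ps_u32(uint32_t ms)
{
    return (uint32_t)(ms * PS_PER_MS);
}

/* Convert ms to ps in 64 bits; large enough for any realistic timing. */
static uint64_t ms_to_ps_u64(uint32_t ms)
{
    return ms * PS_PER_MS;
}
```

With these helpers, 4 ms still fits in 32 bits, while 10 ms (10,000,000,000 ps) is stored as 1,410,065,408 ps after wrapping modulo 2^32.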
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Fixes: 204e7ecd ("mtd: nand: Add a few more timings to nand_sdr_timings")
      Reported-by: Alexander Dahl <ada@thorsis.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Alexander Dahl <ada@thorsis.com>
      Tested-by: Alexander Dahl <ada@thorsis.com>
      Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
    • ptp: introduce ptp auxiliary worker · d9535cb7
      Authored by Grygorii Strashko
      Many PTP drivers are required to perform some asynchronous or periodic
      work, such as periodically handling PHC counter overflow or handling
      delayed timestamps for RX/TX network packets. In most cases, such work is
      implemented using workqueues. Unfortunately, kernel workqueues might
      introduce significant delay in work scheduling under high system load and
      on -RT, which could cause PTP drivers to misbehave (for example, due to
      internal counter overflow), and there is no way to tune their execution
      policy and priority manually.

      Hence, the kthread_worker can be used instead of workqueues, as it creates
      a separate named kthread for each worker, and its execution policy and
      priority can be configured using the chrt tool.

      This problem was reported for two drivers, TI CPSW CPTS and dp83640, so
      instead of modifying each of these drivers it was proposed to add a PTP
      auxiliary worker to the PHC subsystem.

      The patch adds a PTP auxiliary worker to the PHC subsystem using
      kthread_worker and kthread_delayed_work, and introduces two new PHC
      subsystem APIs:

      - long (*do_aux_work)(struct ptp_clock_info *ptp) callback in the
      ptp_clock_info structure, which a driver should assign if it needs to
      perform asynchronous or periodic work. The driver should return the delay
      until the next PTP auxiliary work run (>=0), or a negative value if
      further scheduling is not required.

      - int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay),
      which schedules the PTP auxiliary work.

      The name of the kthread_worker thread corresponds to the PTP PHC device
      name, "ptp%d".
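
The return-value contract of the callback (non-negative delay to be rescheduled, negative to stop) can be illustrated with a userspace mock. Everything here is hypothetical: struct mock_clock, mock_overflow_check and run_worker are stand-ins invented for this sketch, the loop ignores the delay instead of sleeping, and none of it is the kernel's PTP code.

```c
/* Hypothetical stand-in for a clock with an auxiliary work callback. */
struct mock_clock {
    long (*do_aux_work)(struct mock_clock *clk);
    int overflow_checks;  /* how many times the callback has run */
};

/* Example callback: ask to be rescheduled twice, then stop. */
static long mock_overflow_check(struct mock_clock *clk)
{
    clk->overflow_checks++;
    /* Non-negative: run again after this delay. Negative: do not reschedule. */
    return clk->overflow_checks < 3 ? 10 : -1;
}

/* Simplified worker loop: keep calling the callback while it asks to rerun. */
static int run_worker(struct mock_clock *clk)
{
    int runs = 0;
    long delay = 0;

    while (delay >= 0) {
        delay = clk->do_aux_work(clk);  /* a real worker would wait 'delay' */
        runs++;
    }
    return runs;
}
```

A clock initialised as { mock_overflow_check, 0 } runs the callback three times under this model: two runs request rescheduling and the third returns a negative value, which ends the loop.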
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • NFSv4: Fix EXCHANGE_ID corrupt verifier issue · fd40559c
      Authored by Trond Myklebust
      The verifier is allocated on the stack, but the EXCHANGE_ID RPC call was
      changed to be asynchronous by commit 8d89bd70. If we interrupt
      the call to rpc_wait_for_completion_task(), we can therefore end up
      transmitting random stack contents in lieu of the verifier.
      
      Fixes: 8d89bd70 ("NFS setup async exchange_id")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  7. 01 Aug, 2017 3 commits
  8. 31 Jul, 2017 2 commits
  9. 30 Jul, 2017 1 commit
    • udp6: fix socket leak on early demux · c9f2c1ae
      Authored by Paolo Abeni
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward allocation errors
      and no UDP skbs (both ipv4 and ipv6) will be able to reach the
      user-space.
      
      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage.
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now obsoleted comment about "socket cache".
      
      The newly added code is derived from the current ipv4 code for the
      similar path.
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: Sam Edwards <CFSworks@gmail.com>
      Reported-by: Marc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 27 Jul, 2017 10 commits
  11. 26 Jul, 2017 6 commits
    • media: platform: davinci: drop VPFE_CMD_S_CCDC_RAW_PARAMS · b25db383
      Authored by Prabhakar Lad
      Drop the VPFE_CMD_S_CCDC_RAW_PARAMS ioctl from dm355/dm644x for the following reasons:
      
      - This ioctl was never in public api and was only defined in kernel header.
      - The function set_params constantly mixes up pointers and phys_addr_t
        numbers.
      - This is part of a 'VPFE_CMD_S_CCDC_RAW_PARAMS' ioctl command that is
        described as an 'experimental ioctl that will change in future kernels'.
      - The code to allocate the table never gets called after we copy_from_user
        the user input over the kernel settings, and then compare them
        for inequality.
      - We then go on to use an address provided by user space as both the
        __user pointer for input and pass it through phys_to_virt to come up
        with a kernel pointer to copy the data to. This looks like a trivially
        exploitable root hole.
      Signed-off-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
    • media: cec-notifier: small improvements · fc1ff45a
      Authored by Hans Verkuil
      Allow calling cec_notifier_set_phys_addr and
      cec_notifier_set_phys_addr_from_edid with a NULL notifier, in which
      case these functions do nothing.
      
      Add a cec_notifier_phys_addr_invalidate helper function (the notifier
      equivalent of cec_phys_addr_invalidate).
      
      These changes simplify drm CEC driver support.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
    • net: phy: Remove trailing semicolon in macro definition · 2eaa38d9
      Authored by Marc Gonzalez
      Commit e5a03bfd ("phy: Add an mdio_device structure")
      introduced a spurious trailing semicolon. Remove it.
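
As a general illustration of why a trailing semicolon in a macro body is a hazard, consider the sketch below. The macros and functions are hypothetical and are not the kernel definitions touched by this commit. When the macro is used inside a larger expression, the stray semicolon ends the statement early and the rest of the expression is silently discarded.

```c
/* Hypothetical macros for illustration only. */
#define TWICE_BAD(x)  ((x) * 2);   /* stray trailing semicolon */
#define TWICE_GOOD(x) ((x) * 2)

/* Expands to "int y = ((x) * 2); + 1;" so the "+ 1" has no effect. */
static int bad_plus_one(int x)
{
    int y = TWICE_BAD(x) + 1;
    return y;
}

/* Expands to "int y = ((x) * 2) + 1;" as intended. */
static int good_plus_one(int x)
{
    int y = TWICE_GOOD(x) + 1;
    return y;
}
```

Compilers typically warn about the unused "+ 1" expression in the first function. The same stray semicolon also turns an unbraced "if (...) r = TWICE_BAD(v); else ..." into a syntax error, because the extra empty statement detaches the else.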
      Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • workqueue: implicit ordered attribute should be overridable · 0a94efb5
      Authored by Tejun Heo
      5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be
      ordered") automatically enabled ordered attribute for unbound
      workqueues w/ max_active == 1.  Because ordered workqueues reject
      max_active and some attribute changes, this implicit ordered mode
      broke cases where the user creates an unbound workqueue w/ max_active
      == 1 and later explicitly changes the related attributes.
      
      This patch distinguishes explicit from implicit ordered settings and
      lets attribute changes override the ordered mode if it was set implicitly.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 5c0338c6 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
    • udp: preserve head state for IP_CMSG_PASSSEC · dce4551c
      Authored by Paolo Abeni
      Paul Moore reported a SELinux/IP_PASSSEC regression
      caused by missing skb->sp at recvmsg() time. We need to
      preserve the skb head state to process the IP_CMSG_PASSSEC
      cmsg.
      
      With this commit we avoid releasing the skb head state in the
      BH even if a secpath is attached to the current skb, and store
      the skb status (with/without head states) in the scratch area,
      so that we can access it at skb deallocation time without
      incurring cache-miss penalties.
      
      This also avoids misusing the skb CB for ipv6 packets,
      as introduced by the commit 0ddf3fb2 ("udp: preserve
      skb->dst if required for IP options processing").
      
      Clean up the scratch area helpers implementation a bit, to
      reduce the code differences between 32-bit and 64-bit builds.
      Reported-by: Paul Moore <paul@paul-moore.com>
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nvme-fc: revise TRADDR parsing · 9c5358e1
      Authored by James Smart
      The FC-NVME spec hasn't locked down on the format string for TRADDR.
      Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>"
      where the wwn's are hex values but not prefixed by 0x.
      
      Most implementations so far expect a string format of
      "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport
      uses the match_u64 parser which requires a leading 0x prefix to set
      the base properly. If it's not there, a match will either fail or return
      a base 10 value.
      
      The resolution in T11 is pushing out. Therefore, to fix things now and
      to cover any eventuality and any implementations already in the field,
      this patch adds support for both formats.
      
      The change consists of replacing the token matching routine with a
      routine that validates the fixed string format, and then builds
      a local copy of the hex name with a 0x prefix before calling
      the system parser.
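
The approach described above (validate the fixed format, then rebuild the name with a 0x prefix before handing it to a generic parser) might look roughly like the following userspace sketch. The function names are hypothetical, strtoull stands in for the kernel's parser, and the WWN values in the usage note are made up.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define WWN_HEX_DIGITS 16

/*
 * Parse "<tag><16 hex digits>" or "<tag>0x<16 hex digits>".
 * Returns the number of characters consumed, or -1 if malformed.
 */
static int parse_wwn(const char *s, const char *tag, uint64_t *out)
{
    char buf[2 + WWN_HEX_DIGITS + 1] = "0x";  /* local copy, always 0x-prefixed */
    size_t taglen = strlen(tag);
    const char *p = s;
    int i;

    if (strncmp(p, tag, taglen) != 0)
        return -1;
    p += taglen;
    if (p[0] == '0' && p[1] == 'x')           /* the prefix is optional */
        p += 2;
    for (i = 0; i < WWN_HEX_DIGITS; i++) {
        if (!isxdigit((unsigned char)p[i]))
            return -1;
        buf[2 + i] = p[i];
    }
    buf[2 + WWN_HEX_DIGITS] = '\0';
    *out = strtoull(buf, NULL, 0);            /* base 0 honours the 0x prefix */
    return (int)(p - s) + WWN_HEX_DIGITS;
}

/* Parse "nn-<wwn>:pn-<wwn>". Returns 0 on success, -1 if malformed. */
static int parse_traddr(const char *s, uint64_t *nn, uint64_t *pn)
{
    int n = parse_wwn(s, "nn-", nn);

    if (n < 0 || s[n] != ':')
        return -1;
    s += n + 1;
    n = parse_wwn(s, "pn-", pn);
    if (n < 0 || s[n] != '\0')
        return -1;
    return 0;
}
```

Under this sketch, "nn-0x20000090fa942779:pn-0x10000090fa942779" and "nn-20000090fa942779:pn-10000090fa942779" both parse to the same pair of 64-bit names, while a string with too few hex digits is rejected.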
      
      Note: the same parser routine exists in both the initiator and target
      transports. Given this is about the only "shared" item, we chose to
      replicate rather than create an interdependency on some shared code.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  12. 25 Jul, 2017 3 commits