1. 15 6月, 2017 4 次提交
  2. 13 6月, 2017 2 次提交
  3. 12 6月, 2017 1 次提交
  4. 09 6月, 2017 8 次提交
  5. 08 6月, 2017 3 次提交
    • P
      srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604
      Paolo Bonzini 提交于
      Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
      down a guest running iperf on a VFIO assigned device.  This happens
      because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
      context, while a worker thread does the same inside kvm_set_irq().  If the
      interrupt happens while the worker thread is executing __srcu_read_lock(),
      updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
      ->srcu_lock_count[] field can be lost.
      
      The docs say you are not supposed to call srcu_read_lock() and
      srcu_read_unlock() from irq context, but KVM interrupt injection happens
      from (host) interrupt context and it would be nice if SRCU supported the
      use case.  KVM is using SRCU here not really for the "sleepable" part,
      but rather due to its IPI-free fast detection of grace periods.  It is
      therefore not desirable to switch back to RCU, which would effectively
      revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
      2014-01-16).
      
      However, the docs are overly conservative.  You can have an SRCU instance
      only has users in irq context, and you can mix process and irq context
      as long as process context users disable interrupts.  In addition,
      __srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
      Classic SRCU.  For those two implementations, only srcu_read_lock()
      is unsafe.
      
      When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
      in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
      this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
      Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
      the caller.  Tree SRCU however only does one increment, so on most
      architectures it is more efficient for __srcu_read_lock() to use
      this_cpu_inc(), and any performance differences appear to be down in
      the noise.
      
      Cc: stable@vger.kernel.org
      Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
      Reported-by: NLinu Cherian <linuc.decode@gmail.com>
      Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
      Cc: kvm@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      1123a604
    • H
      loop: support 4k physical blocksize · f2c6df7d
      Hannes Reinecke 提交于
      When generating bootable VM images certain systems (most notably
      s390x) require devices with 4k blocksize. This patch implements
      a new flag 'LO_FLAGS_BLOCKSIZE' which will set the physical
      blocksize to that of the underlying device, and allow to change
      the logical blocksize for up to the physical blocksize.
      Signed-off-by: NHannes Reinecke <hare@suse.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      f2c6df7d
    • C
      acpi: always include uuid.h · bcbc2265
      Christoph Hellwig 提交于
      Without this the build will fail for !CONFIG_ACPI builds on x86.
      
      Fixes: 94116f81 ("ACPI: Switch to use generic guid_t in acpi_evaluate_dsm()")
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      bcbc2265
  6. 07 6月, 2017 4 次提交
  7. 06 6月, 2017 1 次提交
  8. 05 6月, 2017 12 次提交
  9. 03 6月, 2017 4 次提交
    • M
      mm: consider memblock reservations for deferred memory initialization sizing · 864b9a39
      Michal Hocko 提交于
      We have seen an early OOM killer invocation on ppc64 systems with
      crashkernel=4096M:
      
      	kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
      	kthreadd cpuset=/ mems_allowed=7
      	CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
      	Call Trace:
      	  dump_stack+0xb0/0xf0 (unreliable)
      	  dump_header+0xb0/0x258
      	  out_of_memory+0x5f0/0x640
      	  __alloc_pages_nodemask+0xa8c/0xc80
      	  kmem_getpages+0x84/0x1a0
      	  fallback_alloc+0x2a4/0x320
      	  kmem_cache_alloc_node+0xc0/0x2e0
      	  copy_process.isra.25+0x260/0x1b30
      	  _do_fork+0x94/0x470
      	  kernel_thread+0x48/0x60
      	  kthreadd+0x264/0x330
      	  ret_from_kernel_thread+0x5c/0xa4
      
      	Mem-Info:
      	active_anon:0 inactive_anon:0 isolated_anon:0
      	 active_file:0 inactive_file:0 isolated_file:0
      	 unevictable:0 dirty:0 writeback:0 unstable:0
      	 slab_reclaimable:5 slab_unreclaimable:73
      	 mapped:0 shmem:0 pagetables:0 bounce:0
      	 free:0 free_pcp:0 free_cma:0
      	Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
      	lowmem_reserve[]: 0 0 0 0
      	Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
      	0 total pagecache pages
      	0 pages in swap cache
      	Swap cache stats: add 0, delete 0, find 0/0
      	Free swap  = 0kB
      	Total swap = 0kB
      	819200 pages RAM
      	0 pages HighMem/MovableOnly
      	817481 pages reserved
      	0 pages cma reserved
      	0 pages hwpoisoned
      
      the reason is that the managed memory is too low (only 110MB) while the
      rest of the the 50GB is still waiting for the deferred intialization to
      be done.  update_defer_init estimates the initial memoty to initialize
      to 2GB at least but it doesn't consider any memory allocated in that
      range.  In this particular case we've had
      
      	Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
      
      so the low 2GB is mostly depleted.
      
      Fix this by considering memblock allocations in the initial static
      initialization estimation.  Move the max_initialise to
      reset_deferred_meminit and implement a simple memblock_reserved_memory
      helper which iterates all reserved blocks and sums the size of all that
      start below the given address.  The cumulative size is than added on top
      of the initial estimation.  This is still not ideal because
      reset_deferred_meminit doesn't consider holes and so reservation might
      be above the initial estimation whihch we ignore but let's make the
      logic simpler until we really need to handle more complicated cases.
      
      Fixes: 3a80a7fa ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
      Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.czSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Tested-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: <stable@vger.kernel.org>	[4.2+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      864b9a39
    • J
      mm/hugetlb: report -EHWPOISON not -EFAULT when FOLL_HWPOISON is specified · 9a291a7c
      James Morse 提交于
      KVM uses get_user_pages() to resolve its stage2 faults.  KVM sets the
      FOLL_HWPOISON flag causing faultin_page() to return -EHWPOISON when it
      finds a VM_FAULT_HWPOISON.  KVM handles these hwpoison pages as a
      special case.  (check_user_page_hwpoison())
      
      When huge pages are involved, this doesn't work so well.
      get_user_pages() calls follow_hugetlb_page(), which stops early if it
      receives VM_FAULT_HWPOISON from hugetlb_fault(), eventually returning
      -EFAULT to the caller.  The step to map this to -EHWPOISON based on the
      FOLL_ flags is missing.  The hwpoison special case is skipped, and
      -EFAULT is returned to user-space, causing Qemu or kvmtool to exit.
      
      Instead, move this VM_FAULT_ to errno mapping code into a header file
      and use it from faultin_page() and follow_hugetlb_page().
      
      With this, KVM works as expected.
      
      This isn't a problem for arm64 today as we haven't enabled
      MEMORY_FAILURE, but I can't see any reason this doesn't happen on x86
      too, so I think this should be a fix.  This doesn't apply earlier than
      stable's v4.11.1 due to all sorts of cleanup.
      
      [james.morse@arm.com: add vm_fault_to_errno() call to faultin_page()]
      suggested.
        Link: http://lkml.kernel.org/r/20170525171035.16359-1-james.morse@arm.com
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170524160900.28786-1-james.morse@arm.comSigned-off-by: NJames Morse <james.morse@arm.com>
      Acked-by: NPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: <stable@vger.kernel.org>	[4.11.1+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a291a7c
    • M
      frv: declare jiffies to be located in the .data section · 60b0a8c3
      Matthias Kaehlcke 提交于
      Commit 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with
      ____cacheline_aligned_in_smp") removed a section specification from the
      jiffies declaration that caused conflicts on some platforms.
      
      Unfortunately this change broke the build for frv:
      
        kernel/built-in.o: In function `__do_softirq': (.text+0x6460): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `__do_softirq': (.text+0x6574): relocation truncated to fit: R_FRV_GPREL12 against symbol
            `jiffies' defined in *ABS* section in .tmp_vmlinux1
        kernel/built-in.o: In function `pwq_activate_delayed_work': workqueue.c:(.text+0x15b9c): relocation truncated to fit: R_FRV_GPREL12 against
            symbol `jiffies' defined in *ABS* section in .tmp_vmlinux1
        ...
      
      Add __jiffy_arch_data to the declaration of jiffies and use it on frv to
      include the section specification.  For all other platforms
      __jiffy_arch_data (currently) has no effect.
      
      Fixes: 7c30f352 ("jiffies.h: declare jiffies and jiffies_64 with ____cacheline_aligned_in_smp")
      Link: http://lkml.kernel.org/r/20170516221333.177280-1-mka@chromium.orgSigned-off-by: NMatthias Kaehlcke <mka@chromium.org>
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Reviewed-by: NDavid Howells <dhowells@redhat.com>
      Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60b0a8c3
    • M
      include/linux/gfp.h: fix ___GFP_NOLOCKDEP value · 1bde33e0
      Michal Hocko 提交于
      Igor Stoppa has noticed that __GFP_NOLOCKDEP can use a lower bit.  At
      the time commit 7e784422 ("lockdep: allow to disable reclaim lockup
      detection") was written we still had __GFP_OTHER_NODE but I have removed
      it in commit 41b6167e ("mm: get rid of __GFP_OTHER_NODE") and forgot
      to lower the bit value.
      
      The current value is outside of __GFP_BITS_SHIFT so it cannot be used
      actually.
      
      Fixes: 7e784422 ("lockdep: allow to disable reclaim lockup detection")
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Reported-by: NIgor Stoppa <igor.stoppa@nokia.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1bde33e0
  10. 02 6月, 2017 1 次提交
    • M
      RDMA/SA: Fix kernel panic in CMA request handler flow · d3957b86
      Majd Dibbiny 提交于
      Commit 9fdca4da (IB/SA: Split struct sa_path_rec based on IB and
      ROCE specific fields) moved the service_id to be specific attribute
      for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.
      
      This caused to the following kernel panic in the CMA request handler flow:
      
      [   27.074594] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [   27.074731] IP: __radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075356] Workqueue: ib_cm cm_work_handler [ib_cm]
      [   27.075401] task: ffff88022e3b8000 task.stack: ffffc90001298000
      [   27.075449] RIP: 0010:__radix_tree_lookup+0x1d/0xe0
      ...
      [   27.075979] Call Trace:
      [   27.076015]  radix_tree_lookup+0xd/0x10
      [   27.076055]  cma_ps_find+0x59/0x70 [rdma_cm]
      [   27.076097]  cma_id_from_event+0xd2/0x470 [rdma_cm]
      [   27.076144]  ? ib_init_ah_from_path+0x39a/0x590 [ib_core]
      [   27.076193]  cma_req_handler+0x25/0x480 [rdma_cm]
      [   27.076237]  cm_process_work+0x25/0x120 [ib_cm]
      [   27.076280]  ? cm_get_bth_pkey.isra.62+0x3c/0xa0 [ib_cm]
      [   27.076350]  cm_req_handler+0xb03/0xd40 [ib_cm]
      [   27.076430]  ? sched_clock_cpu+0x11/0xb0
      [   27.076478]  cm_work_handler+0x194/0x1588 [ib_cm]
      [   27.076525]  process_one_work+0x160/0x410
      [   27.076565]  worker_thread+0x137/0x4a0
      [   27.076614]  kthread+0x112/0x150
      [   27.076684]  ? max_active_store+0x60/0x60
      [   27.077642]  ? kthread_park+0x90/0x90
      [   27.078530]  ret_from_fork+0x2c/0x40
      
      This patch moves it back to the common SA Path Record structure
      and removes the redundant setter and getter.
      
      Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
      
      Fixes: 9fdca4da (IB/SA: Split struct sa_path_rec based on IB ands
      	ROCE specific fields)
      Signed-off-by: NMajd Dibbiny <majd@mellanox.com>
      Reviewed-by: NParav Pandit <parav@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      d3957b86