1. 17 Dec, 2014 1 commit
    • cpuidle / ACPI: remove unused CPUIDLE_FLAG_TIME_INVALID · 62c4cf97
      Len Brown committed
      CPUIDLE_FLAG_TIME_INVALID is no longer checked
      by menu or ladder cpuidle governors, so don't
      bother setting or defining it.
      
      It was originally invented to account for the fact that
      acpi_safe_halt() enables interrupts to invoke HLT.
      That would allow interrupt service routines to be included
      in the last_idle duration measurements made in cpuidle_enter_state(),
      potentially returning a duration much larger than reality.
      
      But menu and ladder can gracefully handle erroneously large duration
      intervals without checking for CPUIDLE_FLAG_TIME_INVALID.
      Further, if they don't check CPUIDLE_FLAG_TIME_INVALID, they
      can also benefit from the instances when the duration interval
      is not erroneously large.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 14 Nov, 2014 2 commits
    • mem-hotplug: reset node managed pages when hot-adding a new pgdat · f784a3f1
      Tang Chen committed
      In free_area_init_core(), zone->managed_pages is set to an approximate
      value for lowmem, and will be adjusted when the bootmem allocator frees
      pages into the buddy system.
      
      But free_area_init_core() is also called by hotadd_new_pgdat() when
      hot-adding memory.  As a result, zone->managed_pages of the newly added
      node's pgdat is set to an approximate value in the very beginning.
      
      Even if the memory on that node has not been onlined,
      /sys/devices/system/node/nodeXXX/meminfo has wrong values:

        hot-add node2 (memory not onlined)
        cat /sys/devices/system/node/node2/meminfo
        Node 2 MemTotal:       33554432 kB
        Node 2 MemFree:               0 kB
        Node 2 MemUsed:        33554432 kB
        Node 2 Active:                0 kB
      
      This patch fixes this problem by resetting the node's managed pages to
      0 after hot-adding a new node.
      
      1. Move reset_managed_pages_done from reset_node_managed_pages() to
         reset_all_zones_managed_pages()
      2. Make reset_node_managed_pages() non-static
      3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat
         is initialized
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: <stable@vger.kernel.org>  [3.16+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype · ad53f92e
      Joonsoo Kim committed
      Before describing the bugs themselves, I first explain the definition of
      a freepage.
      
       1. pages on buddy list are counted as freepage.
       2. pages on isolate migratetype buddy list are *not* counted as freepage.
       3. pages on cma buddy list are counted as CMA freepage, too.
      
      Now, I describe the problems and the related patches.

      Patch 1: There are race conditions when getting the pageblock
      migratetype.  They result in misplacement of freepages on the buddy
      list, an incorrect freepage count, and unavailability of freepages.

      Patch 2: Freepages on the pcp list could have stale cached information
      for determining which migratetype buddy list they go to.  This causes
      misplacement of freepages on the buddy list and an incorrect freepage
      count.

      Patch 4: Merging between freepages on pageblocks of different
      migratetypes will cause a freepage accounting problem.  This patch
      fixes it.

      Without patchset [3], the above problems don't happen in my CMA
      allocation test, because CMA reserved pages aren't used at all.  So
      there is no chance for the above race.
      
      With patchset [3], I ran a simple CMA allocation test and got the
      result below:

       - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
       - run kernel build (make -j16) in the background
       - 30 CMA allocation attempts (8MB * 30 = 240MB) at 5 sec intervals
       - Result: more than 5000 freepages are missing from the count

      With patchset [3] and this patchset, I found that no freepages are
      missing from the count, so I conclude that the problems are solved.

      These problems also occur in my simple memory offlining test.
      
      This patch (of 4):
      
      There are two paths to reach the core free function of the buddy
      allocator, __free_one_page(): one is
      free_one_page()->__free_one_page() and the other is
      free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
      Each path has a race condition causing serious problems.  This patch
      focuses on the first type of freepath; the following patch will solve
      the problem in the second type of freepath.

      In the first type of freepath, we get the migratetype of the page being
      freed without holding the zone lock, so it could be racy.  There are
      two cases of this race.

       1. pages are added to the isolate buddy list after the original
          migratetype is restored
      
          CPU1                                   CPU2
      
          get migratetype => return MIGRATE_ISOLATE
          call free_one_page() with MIGRATE_ISOLATE
      
                                      grab the zone lock
                                      unisolate pageblock
                                      release the zone lock
      
          grab the zone lock
          call __free_one_page() with MIGRATE_ISOLATE
          freepage goes into isolate buddy list,
          although pageblock is already unisolated
      
      This may cause two problems.  One is that we can't use this page
      until the next isolation attempt on this pageblock, because the
      freepage is on the isolate buddy list.  The other is that freepage
      accounting could be wrong due to merging between different buddy
      lists.  Freepages on the isolate buddy list aren't counted as
      freepages, but ones on the normal buddy list are.  If a merge happens,
      a buddy freepage on the normal buddy list is inevitably moved to the
      isolate buddy list without any consideration of freepage accounting,
      so the count could become incorrect.

       2. pages are added to the normal buddy list while the pageblock is
          isolated.  This is similar to the above case.

      This may also cause two problems.  One is that we can't keep these
      freepages from being allocated.  Although the pageblock is isolated,
      the freepage would be added to the normal buddy list, so it could be
      allocated without any restriction.  The other problem is the same as
      in case 1, that is, incorrect freepage accounting.
      
      This race condition can be prevented by checking the migratetype again
      while holding the zone lock.  Because this is a somewhat heavy
      operation and isn't needed in the common case, we want to avoid
      rechecking as much as possible.  So this patch introduces a new
      variable, nr_isolate_pageblock, in struct zone to check whether there
      is an isolated pageblock.  With this, we can avoid rechecking the
      migratetype in the common case and do it only if there is an isolated
      pageblock or the migratetype is MIGRATE_ISOLATE.  This solves the
      problems mentioned above.

      Changes from v3:
      Add one more check in free_one_page() that checks whether the
      migratetype is MIGRATE_ISOLATE or not.  Without this, case 1 described
      above could happen.
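
      The recheck described above can be sketched in userspace C.  This is
      a simplified model, not the kernel code: struct zone_model, its
      fields other than nr_isolate_pageblock, and free_one_page_model() are
      hypothetical names, a single pageblock is modeled, and locking is
      only marked by comments.

```c
/* Simplified model of the recheck; all names except
 * nr_isolate_pageblock and MIGRATE_ISOLATE are hypothetical. */
enum migratetype { MIGRATE_MOVABLE, MIGRATE_ISOLATE };

struct zone_model {
    unsigned long nr_isolate_pageblock; /* isolated pageblocks in the zone */
    enum migratetype pageblock_mt;      /* current type of the one modeled pageblock */
    unsigned long nr_free;              /* pages counted as free */
    unsigned long nr_isolated_free;     /* pages on the isolate list (not counted) */
};

/* 'mt' was sampled by the caller without the zone lock, so it may be stale. */
static enum migratetype free_one_page_model(struct zone_model *z,
                                            enum migratetype mt)
{
    /* the zone lock would be taken here */

    /* Recheck only when it can matter: some pageblock is isolated, or the
     * stale value claims isolation that may already have been undone. */
    if (z->nr_isolate_pageblock > 0 || mt == MIGRATE_ISOLATE)
        mt = z->pageblock_mt;

    if (mt == MIGRATE_ISOLATE)
        z->nr_isolated_free++;
    else
        z->nr_free++;

    /* the zone lock would be released here */
    return mt;
}
```

      In the common case (no isolated pageblock and a non-isolate
      migratetype) the model skips the recheck, which is the cost the
      commit message wants to avoid.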
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Michal Nazarewicz <mina86@mina86.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Heesub Shin <heesub.shin@samsung.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Ritesh Harjani <ritesh.list@gmail.com>
      Cc: Gioh Kim <gioh.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 13 Nov, 2014 2 commits
  4. 12 Nov, 2014 1 commit
    • PM / Domains: Fix initial default state of the need_restore flag · 67732cd3
      Ulf Hansson committed
      The initial state of the device's need_restore flag shouldn't depend on
      the current state of the PM domain. For example, it should be perfectly
      valid to attach an inactive device to a powered PM domain.

      The pm_genpd_dev_need_restore() API allows us to update the need_restore
      flag to somewhat cope with such scenarios. Typically that should have
      been done from the drivers'/buses' ->probe(), since it's those that put
      the requirements on the value of the need_restore flag.
      
      Until recently, the Exynos SoCs were the only user of the
      pm_genpd_dev_need_restore() API, and they invoked it from a centralized
      location while adding devices to their PM domains.

      Because Exynos has now switched to the generic OF-based PM domain
      look-up, it's no longer possible to invoke the API from a centralized
      location. The reason is that devices are now added to their PM
      domains during the probe sequence.
      
      Commit "ARM: exynos: Move to generic PM domain DT bindings"
      did the switch for Exynos to the generic OF-based PM domain look-up,
      but it also removed the call to pm_genpd_dev_need_restore(). This
      caused a regression for some of the Exynos drivers.
      
      To handle things more properly in the generic PM domain, let's change
      the default initial value of the need_restore flag to reflect that the
      state is unknown. As soon as some of the runtime PM callbacks get
      invoked, update the initial value accordingly.
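
      As an illustration only, a flag with an "unknown" initial state can
      be modeled as a tri-state value.  The sketch below is a userspace
      model under assumptions: the enum, struct genpd_dev_model and the
      three functions are hypothetical names, and the policy shown (skip
      the save callback while the state is unknown, run restore on the
      first resume) is one plausible reading of the text, not necessarily
      what the patch implements.

```c
/* Hypothetical tri-state model of a need_restore flag. */
enum need_restore_state {
    NEED_RESTORE_UNKNOWN = -1, /* initial value, before any runtime PM callback */
    NEED_RESTORE_NO = 0,
    NEED_RESTORE_YES = 1
};

struct genpd_dev_model {
    enum need_restore_state need_restore;
    int save_calls;    /* how often the driver's save (suspend) callback ran */
    int restore_calls; /* how often the driver's restore (resume) callback ran */
};

static void genpd_dev_attach(struct genpd_dev_model *d)
{
    d->need_restore = NEED_RESTORE_UNKNOWN;
    d->save_calls = 0;
    d->restore_calls = 0;
}

/* Domain is about to power off: save device state if that is meaningful. */
static void genpd_dev_save(struct genpd_dev_model *d)
{
    if (d->need_restore == NEED_RESTORE_YES)
        return; /* already saved */
    if (d->need_restore == NEED_RESTORE_UNKNOWN) {
        /* Assumption: the device is already suspended, so calling its
         * suspend callback again would be wrong (e.g. a second clk_disable). */
        d->need_restore = NEED_RESTORE_YES;
        return;
    }
    d->save_calls++;
    d->need_restore = NEED_RESTORE_YES;
}

/* Device is being resumed: restore unless it is known to be unnecessary. */
static void genpd_dev_restore(struct genpd_dev_model *d)
{
    enum need_restore_state prev = d->need_restore;

    d->need_restore = NEED_RESTORE_NO;
    if (prev != NEED_RESTORE_NO)
        d->restore_calls++;
}
```

      With this model, a device attached in an inactive state never has
      its suspend callback invoked by the first domain power-off, which is
      the situation shown in the warning log further down.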
      
      Moreover, since the generic PM domain verifies that all devices are
      both runtime PM enabled and suspended, using pm_runtime_suspended()
      while pm_genpd_poweroff() is invoked from the scheduled work, we can be
      sure that the PM domain won't be powered off while it has active
      devices.
      
      Do note that the generic PM domain can still only know about active
      devices which have been activated through invoking its runtime PM resume
      callback. In other words, buses/drivers using pm_runtime_set_active()
      during ->probe() will still suffer from a race condition, potentially
      probing a device without its PM domain being powered. That issue
      will have to be solved using a different approach.

      This is a log from the boot regression for Exynos5, which is fixed by
      this patch.
      
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 308 at ../drivers/clk/clk.c:851 clk_disable+0x24/0x30()
      Modules linked in:
      CPU: 0 PID: 308 Comm: kworker/0:1 Not tainted 3.18.0-rc3-00569-gbd9449f-dirty #10
      Workqueue: pm pm_runtime_work
      [<c0013c64>] (unwind_backtrace) from [<c0010dec>] (show_stack+0x10/0x14)
      [<c0010dec>] (show_stack) from [<c03ee4cc>] (dump_stack+0x70/0xbc)
      [<c03ee4cc>] (dump_stack) from [<c0020d34>] (warn_slowpath_common+0x64/0x88)
      [<c0020d34>] (warn_slowpath_common) from [<c0020d74>] (warn_slowpath_null+0x1c/0x24)
      [<c0020d74>] (warn_slowpath_null) from [<c03107b0>] (clk_disable+0x24/0x30)
      [<c03107b0>] (clk_disable) from [<c02cc834>] (gsc_runtime_suspend+0x128/0x160)
      [<c02cc834>] (gsc_runtime_suspend) from [<c0249024>] (pm_generic_runtime_suspend+0x2c/0x38)
      [<c0249024>] (pm_generic_runtime_suspend) from [<c024f44c>] (pm_genpd_default_save_state+0x2c/0x8c)
      [<c024f44c>] (pm_genpd_default_save_state) from [<c024ff2c>] (pm_genpd_poweroff+0x224/0x3ec)
      [<c024ff2c>] (pm_genpd_poweroff) from [<c02501b4>] (pm_genpd_runtime_suspend+0x9c/0xcc)
      [<c02501b4>] (pm_genpd_runtime_suspend) from [<c024a4f8>] (__rpm_callback+0x2c/0x60)
      [<c024a4f8>] (__rpm_callback) from [<c024a54c>] (rpm_callback+0x20/0x74)
      [<c024a54c>] (rpm_callback) from [<c024a930>] (rpm_suspend+0xd4/0x43c)
      [<c024a930>] (rpm_suspend) from [<c024bbcc>] (pm_runtime_work+0x80/0x90)
      [<c024bbcc>] (pm_runtime_work) from [<c0032a9c>] (process_one_work+0x12c/0x314)
      [<c0032a9c>] (process_one_work) from [<c0032cf4>] (worker_thread+0x3c/0x4b0)
      [<c0032cf4>] (worker_thread) from [<c003747c>] (kthread+0xcc/0xe8)
      [<c003747c>] (kthread) from [<c000e738>] (ret_from_fork+0x14/0x3c)
      ---[ end trace 40cd58bcd6988f12 ]---
      
      Fixes: a4a8c2c4 (ARM: exynos: Move to generic PM domain DT bindings)
      Reported-and-tested-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
      Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
      Reviewed-by: Kevin Hilman <khilman@linaro.org>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 11 Nov, 2014 1 commit
    • tracing: Do not busy wait in buffer splice · e30f53aa
      Rabin Vincent committed
      On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
      lockup:
      
       # trace-cmd record -e raw_syscalls:* -F false
       NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
       ...
       Call Trace:
        [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
        [<ffffffff81092e25>] wait_on_pipe+0x35/0x40
        [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
        [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
        [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
        [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
        [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
        [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
        [<ffffffff8112415d>] do_splice_to+0x6d/0x90
        [<ffffffff81126971>] SyS_splice+0x7c1/0x800
        [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8
      
      The problem is this: tracing_buffers_splice_read() calls
      ring_buffer_wait() to wait for data in the ring buffers.  The buffers
      are not empty so ring_buffer_wait() returns immediately.  But
      tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
      meaning it only wants to read a full page.  When the full page is not
      available, tracing_buffers_splice_read() tries to wait again with
      ring_buffer_wait(), which again returns immediately, and so on.
      
      Fix this by adding a "full" argument to ring_buffer_wait() which will
      make ring_buffer_wait() wait until the writer has left the reader's
      page, i.e.  until full-page reads will succeed.
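
      The difference between the two wait conditions can be sketched as a
      predicate.  This is a userspace model under assumptions: struct
      rb_model, its fields and rb_wait_done() are hypothetical names, and
      pages are represented by plain indices.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state a waiter looks at. */
struct rb_model {
    size_t entries;     /* events currently in the buffer */
    size_t reader_page; /* index of the page the reader will consume */
    size_t commit_page; /* index of the page the writer is committing to */
};

/* Returns true when a waiter may stop waiting. */
static bool rb_wait_done(const struct rb_model *rb, bool full)
{
    if (!full)
        return rb->entries > 0; /* any data is enough */

    /* A full-page read can only succeed once the writer has moved off
     * the reader's page. */
    return rb->reader_page != rb->commit_page;
}
```

      With full set to false, a non-empty buffer ends the wait
      immediately, which reproduces the busy loop when the caller then
      insists on a full page.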
      
      Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in
      
      Cc: stable@vger.kernel.org # 3.16+
      Fixes: b1169cc6 ("tracing: Remove mock up poll wait function")
      Signed-off-by: Rabin Vincent <rabin@rab.in>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  6. 10 Nov, 2014 1 commit
    • mfd: max77693: Fix always masked MUIC interrupts · c0acb814
      Krzysztof Kozlowski committed
      All interrupts coming from MUIC were ignored because interrupt source
      register was masked.
      
      The Maxim 77693 has an "interrupt source": a separate register and
      interrupts which indicate which PMIC block triggered the individual
      interrupt (charger, topsys, MUIC, flash LED).

      The bootloader could initialize this register to the "mask all"
      value by default. In such a case (observed on the Trats2 board) MUIC
      interrupts won't be generated regardless of their mask status. The
      regmap irq chip was unmasking individual MUIC interrupts but the source
      was masked.
      
      Before introducing regmap irq chip this interrupt source was unmasked,
      read and acked. Reading and acking is not necessary but unmasking is.
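
      The two-level masking can be modeled with plain bit operations.  The
      sketch below is illustrative only: the bit position SRC_MUIC_BIT,
      struct pmic_model and both functions are hypothetical and do not
      reflect the real register layout; a set bit means "masked".

```c
#include <stdbool.h>
#include <stdint.h>

#define SRC_MUIC_BIT (1u << 3) /* hypothetical position of the MUIC source bit */

struct pmic_model {
    uint8_t src_mask;  /* interrupt source mask register (set bit = masked) */
    uint8_t muic_mask; /* per-interrupt MUIC mask register (set bit = masked) */
};

/* An individual MUIC interrupt is delivered only if both levels are unmasked. */
static bool muic_irq_delivered(const struct pmic_model *p, uint8_t muic_bit)
{
    bool source_unmasked = !(p->src_mask & SRC_MUIC_BIT);
    bool irq_unmasked = !(p->muic_mask & muic_bit);

    return source_unmasked && irq_unmasked;
}

/* The step that was missing: unmask the MUIC source explicitly. */
static void unmask_muic_source(struct pmic_model *p)
{
    p->src_mask &= (uint8_t)~SRC_MUIC_BIT;
}
```

      Starting from a "mask all" source register, unmasking only the
      individual interrupt leaves muic_irq_delivered() false until
      unmask_muic_source() has run.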
      
      Fixes: 342d669c ("mfd: max77693: Handle IRQs using regmap")
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com>
      Signed-off-by: Lee Jones <lee.jones@linaro.org>
  7. 08 Nov, 2014 1 commit
  8. 06 Nov, 2014 2 commits
  9. 04 Nov, 2014 1 commit
    • of: Fix overflow bug in string property parsing functions · a87fa1d8
      Grant Likely committed
      The string property read helpers will run off the end of the buffer if
      they are handed a malformed string property. Rework the parsers to make
      sure that doesn't happen. At the same time add new test cases to make
      sure the functions behave themselves.

      The original implementations of of_property_read_string_index() and
      of_property_count_strings() both open-coded the same block of parsing
      code, each with its own subtly different bugs. The fix here merges the
      functions into a single helper and makes the original functions static
      inline wrappers around the helper.
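
      A bounded walk over a NUL-separated string list can be sketched in
      userspace C as follows.  This is not the kernel helper:
      read_string_list(), its parameters and the PARSE_* codes are
      hypothetical, and the sketch only shows the idea of never scanning
      past the end of the property value.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
#define PARSE_ENODATA (-1) /* no strings found */
#define PARSE_EILSEQ  (-2) /* a string is not NUL-terminated within the buffer */

/*
 * Walk 'len' bytes of NUL-separated strings starting at 'buf'.
 * Skips the first 'skip' strings, then stores up to 'out_sz' pointers in
 * 'out'.  If 'out' is NULL, only counts the strings after 'skip'.
 * Returns the number of strings stored/counted, or a negative code.
 */
static int read_string_list(const char *buf, size_t len,
                            const char **out, size_t out_sz, size_t skip)
{
    const char *p = buf;
    const char *end;
    size_t index = 0;
    size_t stored = 0;

    if (buf == NULL || len == 0)
        return PARSE_ENODATA;
    end = buf + len;

    while (p < end && (out == NULL || stored < out_sz)) {
        /* Bounded search: never look beyond 'end'. */
        const char *nul = memchr(p, '\0', (size_t)(end - p));

        if (nul == NULL)
            return PARSE_EILSEQ;
        if (index >= skip) {
            if (out != NULL)
                out[stored] = p;
            stored++;
        }
        index++;
        p = nul + 1;
    }
    return stored > 0 ? (int)stored : PARSE_ENODATA;
}
```

      Counting and indexed reads then become thin wrappers: counting
      passes a NULL output array, and reading the string at index n passes
      an output array of size 1 with skip set to n.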
      
      One non-bugfix aspect of this patch is the addition of a new wrapper,
      of_property_read_string_array(). The new wrapper is needed by the
      device_properties feature that Rafael is working on and planning to
      merge for v3.19. The implementation is identical both with and without
      the new static inline wrapper, so it just got left in to reduce the
      churn on the header file.
      Signed-off-by: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Darren Hart <darren.hart@intel.com>
      Cc: <stable@vger.kernel.org>  # v3.3+: Drop selftest hunks that don't apply
  10. 31 Oct, 2014 2 commits
  11. 30 Oct, 2014 4 commits
    • mm: memcontrol: fix missed end-writeback page accounting · d7365e78
      Johannes Weiner committed
      Commit 0a31bc97 ("mm: memcontrol: rewrite uncharge API") changed
      page migration to uncharge the old page right away.  The page is locked,
      unmapped, truncated, and off the LRU, but it could race with writeback
      ending, which then doesn't unaccount the page properly:
      
      test_clear_page_writeback()              migration
                                                 wait_on_page_writeback()
        TestClearPageWriteback()
                                                 mem_cgroup_migrate()
                                                   clear PCG_USED
        mem_cgroup_update_page_stat()
          if (PageCgroupUsed(pc))
            decrease memcg pages under writeback
      
        release pc->mem_cgroup->move_lock
      
      The per-page statistics interface is heavily optimized to avoid a
      function call and a lookup_page_cgroup() in the file unmap fast path,
      which means it doesn't verify whether a page is still charged before
      clearing PageWriteback() and it has to do it in the stat update later.
      
      Rework it so that it looks up the page's memcg once at the beginning of
      the transaction and then uses it throughout.  The charge will be
      verified before clearing PageWriteback() and migration can't uncharge
      the page as long as that is still set.  The RCU lock will protect the
      memcg past uncharge.
      
      As far as losing the optimization goes, the following test results are
      from a microbenchmark that maps, faults, and unmaps a 4GB sparse file
      three times in a nested fashion, so that there are two negative passes
      that don't account but still go through the new transaction overhead.
      There is no actual difference:
      
       old:     33.195102545 seconds time elapsed       ( +-  0.01% )
       new:     33.199231369 seconds time elapsed       ( +-  0.03% )
      
      The time spent in page_remove_rmap()'s callees still adds up to the
      same, but the time spent in the function itself seems reduced:
      
           # Children      Self  Command        Shared Object       Symbol
       old:     0.12%     0.11%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
       new:     0.12%     0.08%  filemapstress  [kernel.kallsyms]   [k] page_remove_rmap
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>  [3.17.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page-writeback: inline account_page_dirtied() into single caller · 3a3c02ec
      Johannes Weiner committed
      A follow-up patch would have changed the call signature.  To save the
      trouble, just fold it instead.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: <stable@vger.kernel.org>  [3.17.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, thp: fix collapsing of hugepages on madvise · 6d50e60c
      David Rientjes committed
      If an anonymous mapping is not allowed to fault thp memory and then
      madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
      collapse this memory into thp memory.
      
      This occurs because the madvise(2) handler for thp, hugepage_madvise(),
      clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
      until the final action of madvise_behavior().  This causes the
      khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
      the vma had previously had VM_NOHUGEPAGE set.
      
      Fix this by passing the correct vma flags to the khugepaged mm slot
      handler.  There's no chance khugepaged can run on this vma until after
      madvise_behavior() returns since we hold mm->mmap_sem.
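
      The stale-flags problem can be modeled in a few lines of userspace
      C.  Everything here is a hypothetical stand-in: the *_MODEL flag
      values, struct vma_model and both functions only mirror the shape of
      the description above, and the use_fix switch exists purely to show
      the behavior before and after.

```c
#include <stdbool.h>

/* Hypothetical flag values for the model. */
#define VM_HUGEPAGE_MODEL   (1ul << 0)
#define VM_NOHUGEPAGE_MODEL (1ul << 1)

struct vma_model {
    unsigned long vm_flags; /* only updated after the madvise handler returns */
};

/* Registration with the collapse daemon is a no-op when NOHUGEPAGE is set. */
static bool khugepaged_enter_model(unsigned long vm_flags)
{
    return !(vm_flags & VM_NOHUGEPAGE_MODEL);
}

/*
 * Model of the madvise handler: it edits a local copy of the flags.
 * Returns whether the vma was registered for collapsing.
 */
static bool hugepage_madvise_model(const struct vma_model *vma,
                                   unsigned long *vm_flags, bool use_fix)
{
    *vm_flags &= ~VM_NOHUGEPAGE_MODEL;
    *vm_flags |= VM_HUGEPAGE_MODEL;

    /* Before the fix the stale vma->vm_flags were consulted; with the
     * fix the updated local copy is passed instead. */
    return khugepaged_enter_model(use_fix ? *vm_flags : vma->vm_flags);
}
```

      In both variants the vma's own flags stay untouched inside the
      handler; only the value handed to the registration check differs.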
      
      It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
      in hugepage_madvise(), but I didn't want to introduce special case
      behavior into madvise_behavior().  I think it's best to just let it
      always set vma->vm_flags itself.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Suleiman Souhlal <suleiman@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers: of: add return value to of_reserved_mem_device_init() · 47f29df7
      Marek Szyprowski committed
      A driver calling of_reserved_mem_device_init() might want to know
      whether the initialization was successful, so add support for returning
      an error code.

      This fixes a build warning caused by commit 7bfa5ab6 ("drivers:
      dma-coherent: add initialization from device tree"), which was
      merged without this change and without fixing the function return value.
      
      Fixes: 7bfa5ab6 ("drivers: dma-coherent: add initialization from device tree")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Josh Cartwright <joshc@codeaurora.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 29 Oct, 2014 5 commits
  13. 28 Oct, 2014 5 commits
    • compiler/gcc4+: Remove inaccurate comment about 'asm goto' miscompiles · 5631b8fb
      Steven Noonan committed
      The bug referenced by the comment in this commit was not
      completely fixed in GCC 4.8.2, as I mentioned in a thread back
      in February:
      
         https://lkml.org/lkml/2014/2/12/797
      
      The conclusion at that time was to make the quirk unconditional
      until the bug could be found and fixed in GCC. Unfortunately,
      when I submitted the patch (commit a9f18034) I left in a comment
      that claimed the bug was fixed in GCC 4.8.2+.
      
      This comment is inaccurate, and should be removed.
      Signed-off-by: Steven Noonan <steven@uplinklabs.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1414274982-14040-1-git-send-email-steven@uplinklabs.net
      Cc: Ingo Molnar <mingo@kernel.org>
    • Revert "block: all blk-mq requests are tagged" · e999dbc2
      Christoph Hellwig committed
      This reverts commit fb3ccb5d.
      
      SCSI-2/SPI actually needs the tagged/untagged flag in the request to
      work properly.  Revert this patch and add a follow on to set it in
      the right place.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Reported-by: Meelis Roos <mroos@linux.ee>
      Tested-by: Meelis Roos <mroos@linux.ee>
      Cc: stable@vger.kernel.org
    • power: charger-manager: Fix accessing invalidated power supply after charger unbind · cdaf3e15
      Krzysztof Kozlowski committed
      In probe, the charger manager obtained references to the power supplies
      of all chargers with power_supply_get_by_name() for later use. However,
      if such a charger driver was removed, the reference would point to the
      old power supply (from the driver which was removed).

      This led to accessing invalid memory, which could be observed with:
      $ echo "max77693-charger" > /sys/bus/platform/drivers/max77693-charger/unbind
      $ grep . /sys/devices/virtual/power_supply/battery/charger.0/*
      $ grep . /sys/devices/virtual/power_supply/battery/*
      [   15.339817] Unable to handle kernel paging request at virtual address 0001c12c
      [   15.346187] pgd = edd08000
      [   15.348814] [0001c12c] *pgd=6dce2831, *pte=00000000, *ppte=00000000
      [   15.355075] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
      [   15.360967] Modules linked in:
      [   15.364010] CPU: 2 PID: 1388 Comm: grep Not tainted 3.17.0-next-20141007-00027-ga95e761db1b0 #245
      [   15.372859] task: ee03ad00 ti: edcf6000 task.ti: edcf6000
      [   15.378241] PC is at 0x1c12c
      [   15.381113] LR is at is_ext_pwr_online+0x30/0x6c
      [   15.385706] pc : [<0001c12c>]    lr : [<c0339fc4>]    psr: a0000013
      [   15.385706] sp : edcf7e88  ip : 00000000  fp : 00000000
      [   15.397161] r10: eeb02c08  r9 : c04b1f84  r8 : eeb02c00
      [   15.402369] r7 : edc69a10  r6 : eea6ac10  r5 : eea6ac10  r4 : 00000004
      [   15.408878] r3 : 0001c12c  r2 : edcf7e8c  r1 : 00000004  r0 : ee914418
      [   15.415390] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
      [   15.422506] Control: 10c5387d  Table: 6dd0804a  DAC: 00000015
      [   15.428236] Process grep (pid: 1388, stack limit = 0xedcf6240)
      [   15.434050] Stack: (0xedcf7e88 to 0xedcf8000)
      [   15.438395] 7e80:                   ee03ad00 00000000 edcf7f80 eea6aca8 edcf7ec4 c033b7b0
      [   15.446554] 7ea0: 00000001 ee1cc3f0 00000004 c06e1e44 eebdc000 c06e1e44 eeb02c00 c0337144
      [   15.454713] 7ec0: ee2dac68 c005cffc ee1cc3c0 c06e1e44 00000fff 00001000 eebdc000 c0278ca8
      [   15.462872] 7ee0: c0278c8c ee1cc3c0 eeb7ce00 c014422c edcf7f20 00008000 ee1cc3c0 ee9a48c0
      [   15.471030] 7f00: 00000001 00000001 edcf7f80 c0142d94 c0142d70 c01060f4 00021000 ee1cc3f0
      [   15.479190] 7f20: 00000000 00000000 c06a2150 eebdc000 2e7ec000 ee9a48c0 00008000 00021000
      [   15.487349] 7f40: edcf7f80 00008000 edcf6000 00021000 00021000 c00e39a4 00000000 ee9a48c0
      [   15.495508] 7f60: 00004000 00000000 00000000 ee9a48c0 ee9a48c0 00008000 00021000 c00e3aa0
      [   15.503668] 7f80: 00000000 00000000 0001f2e0 0001f2e0 00021000 00001000 00000003 c000f364
      [   15.511826] 7fa0: 00000000 c000f1a0 0001f2e0 00021000 00000003 00021000 00008000 00000000
      [   15.519986] 7fc0: 0001f2e0 00021000 00001000 00000003 00000001 000205e8 00000000 00021000
      [   15.528145] 7fe0: 00008000 bebbe910 0000a7ad b6edc49c 60000010 00000003 aaaaaaaa aaaaaaaa
      [   15.536320] [<c0339fc4>] (is_ext_pwr_online) from [<c033b7b0>] (charger_get_property+0x170/0x314)
      [   15.545164] [<c033b7b0>] (charger_get_property) from [<c0337144>] (power_supply_show_property+0x48/0x20c)
      [   15.554719] [<c0337144>] (power_supply_show_property) from [<c0278ca8>] (dev_attr_show+0x1c/0x48)
      [   15.563577] [<c0278ca8>] (dev_attr_show) from [<c014422c>] (sysfs_kf_seq_show+0x84/0x104)
      [   15.571725] [<c014422c>] (sysfs_kf_seq_show) from [<c0142d94>] (kernfs_seq_show+0x24/0x28)
      [   15.579973] [<c0142d94>] (kernfs_seq_show) from [<c01060f4>] (seq_read+0x1b0/0x484)
      [   15.587614] [<c01060f4>] (seq_read) from [<c00e39a4>] (vfs_read+0x88/0x144)
      [   15.594552] [<c00e39a4>] (vfs_read) from [<c00e3aa0>] (SyS_read+0x40/0x8c)
      [   15.601417] [<c00e3aa0>] (SyS_read) from [<c000f1a0>] (ret_fast_syscall+0x0/0x48)
      [   15.608877] Code: bad PC value
      [   15.611991] ---[ end trace a88fcc95208db283 ]---
      
      The charger-manager should get a reference to the charger power supply
      on each use of the get_property callback.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 3bb3dbbd ("power_supply: Add initial Charger-Manager driver")
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
    • power: charger-manager: Fix accessing invalidated power supply after fuel gauge unbind · bdbe8144
      Krzysztof Kozlowski committed
      In probe, the charger manager obtained a reference to the fuel gauge
      power supply with power_supply_get_by_name() for later use. However, if
      the fuel gauge driver was removed and re-added, this reference would
      point to the old power supply (from the driver which was removed).

      This led to accessing old (and probably invalid) memory, which could be
      observed with:
      $ echo "12-0036" > /sys/bus/i2c/drivers/max17042/unbind
      $ echo "12-0036" > /sys/bus/i2c/drivers/max17042/bind
      $ cat /sys/devices/virtual/power_supply/battery/capacity
      [  240.480084] INFO: task cat:1393 blocked for more than 120 seconds.
      [  240.484799]       Not tainted 3.17.0-next-20141007-00028-ge60b6dd79570 #203
      [  240.491782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  240.499589] cat             D c0469530     0  1393      1 0x00000000
      [  240.505947] [<c0469530>] (__schedule) from [<c0469d3c>] (schedule_preempt_disabled+0x14/0x20)
      [  240.514449] [<c0469d3c>] (schedule_preempt_disabled) from [<c046af08>] (mutex_lock_nested+0x1bc/0x458)
      [  240.523736] [<c046af08>] (mutex_lock_nested) from [<c0287a98>] (regmap_read+0x30/0x60)
      [  240.531647] [<c0287a98>] (regmap_read) from [<c032238c>] (max17042_get_property+0x2e8/0x350)
      [  240.540055] [<c032238c>] (max17042_get_property) from [<c03247d8>] (charger_get_property+0x264/0x348)
      [  240.549252] [<c03247d8>] (charger_get_property) from [<c0320764>] (power_supply_show_property+0x48/0x1e0)
      [  240.558808] [<c0320764>] (power_supply_show_property) from [<c027308c>] (dev_attr_show+0x1c/0x48)
      [  240.567664] [<c027308c>] (dev_attr_show) from [<c0141fb0>] (sysfs_kf_seq_show+0x84/0x104)
      [  240.575814] [<c0141fb0>] (sysfs_kf_seq_show) from [<c0140b18>] (kernfs_seq_show+0x24/0x28)
      [  240.584061] [<c0140b18>] (kernfs_seq_show) from [<c0104574>] (seq_read+0x1b0/0x484)
      [  240.591702] [<c0104574>] (seq_read) from [<c00e1e24>] (vfs_read+0x88/0x144)
      [  240.598640] [<c00e1e24>] (vfs_read) from [<c00e1f20>] (SyS_read+0x40/0x8c)
      [  240.605507] [<c00e1f20>] (SyS_read) from [<c000e760>] (ret_fast_syscall+0x0/0x48)
      [  240.612952] 4 locks held by cat/1393:
      [  240.616589]  #0:  (&p->lock){+.+.+.}, at: [<c01043f4>] seq_read+0x30/0x484
      [  240.623414]  #1:  (&of->mutex){+.+.+.}, at: [<c01417dc>] kernfs_seq_start+0x1c/0x8c
      [  240.631086]  #2:  (s_active#31){++++.+}, at: [<c01417e4>] kernfs_seq_start+0x24/0x8c
      [  240.638777]  #3:  (&map->mutex){+.+...}, at: [<c0287a98>] regmap_read+0x30/0x60
      
      The charger-manager should get a reference to the fuel gauge power supply
      on each use of the get_property callback. The thermal zone 'tzd' field of
      the power supply should not be used, for the same reason.
      
      Additionally, this change solves the issue with nested
      thermal_zone_get_temp() calls and the related false-positive lockdep
      deadlock report for the thermal zone's mutex [1]. When the fuel gauge is
      used as the source of temperature, the charger manager forwards its
      get_temp calls to the fuel gauge thermal zone. So two different mutexes
      are actually used (one for the charger manager thermal zone and a second
      for the fuel gauge thermal zone), but to lockdep they belong to one class
      of mutex.
      
      The recursion is removed by retrieving the temperature through the power
      supply's get_property().
      
      If an external thermal zone is used (the 'cm-thermal-zone' property is
      present in the DTS), the recursion does not exist. The charger manager
      simply exports the POWER_SUPPLY_PROP_TEMP_AMBIENT property (instead of
      POWER_SUPPLY_PROP_TEMP), so no thermal zone is created for this power
      supply.
      
      [1] https://lkml.org/lkml/2014/10/6/309
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Fixes: 3bb3dbbd ("power_supply: Add initial Charger-Manager driver")
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      bdbe8144
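The idea of resolving the supply by name on every get_property call, rather than caching a pointer at probe time, can be illustrated with a small userspace sketch. Everything here is hypothetical (the `demo_*` names, the fixed-size registry, the `-1` error value); it is not the charger-manager code, only a model of why a fresh lookup survives an unbind/bind cycle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a registered power supply. */
struct demo_supply {
    const char *name;
    int capacity; /* percent */
};

#define DEMO_MAX_SUPPLIES 4
static struct demo_supply *demo_registry[DEMO_MAX_SUPPLIES];

/* Add a supply to the registry; returns 0 on success, -1 if it is full. */
static int demo_register(struct demo_supply *psy)
{
    assert(psy != NULL && psy->name != NULL);
    for (size_t i = 0; i < DEMO_MAX_SUPPLIES; i++) {
        if (demo_registry[i] == NULL) {
            demo_registry[i] = psy;
            return 0;
        }
    }
    return -1;
}

/* Remove a supply, as happens when its driver is unbound. */
static void demo_unregister(const struct demo_supply *psy)
{
    for (size_t i = 0; i < DEMO_MAX_SUPPLIES; i++) {
        if (demo_registry[i] == psy)
            demo_registry[i] = NULL;
    }
}

/* Look a supply up by name; returns NULL if none is registered. */
static struct demo_supply *demo_get_by_name(const char *name)
{
    for (size_t i = 0; i < DEMO_MAX_SUPPLIES; i++) {
        if (demo_registry[i] != NULL &&
            strcmp(demo_registry[i]->name, name) == 0)
            return demo_registry[i];
    }
    return NULL;
}

/*
 * get_property-style accessor: the name is resolved on every call, so a
 * supply that was removed and re-added is picked up, and a missing one
 * yields an error instead of a stale pointer dereference.
 */
static int demo_get_capacity(const char *gauge_name, int *val)
{
    struct demo_supply *psy = demo_get_by_name(gauge_name);

    if (psy == NULL)
        return -1;
    *val = psy->capacity;
    return 0;
}
```

With this shape, a reader between unbind and bind gets an error, and a reader after the rebind sees the newly registered object.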
    • K
      power_supply: Add no_thermal property to prevent recursive get_temp calls · a69d82b9
      Committed by Krzysztof Kozlowski
      Add a 'no_thermal' property to the power supply class. If true, no
      thermal zone is created for this power supply in
      power_supply_register().
      
      Power supply drivers may want to set it if they support
      POWER_SUPPLY_PROP_TEMP and forward this get-property call to
      another thermal zone.
      
      If they do not set it, lockdep may report a false-positive deadlock on
      the thermal zone's mutex because of nested calls to
      thermal_zone_get_temp(). The first is the call to thermal_zone_get_temp()
      for the driver's own thermal zone: the thermal core gets the
      POWER_SUPPLY_PROP_TEMP property from this driver. The driver then calls
      thermal_zone_get_temp() on the other thermal zone and returns the result.
      
      The charger manager is an example of such a driver.
      Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: Sebastian Reichel <sre@kernel.org>
      a69d82b9
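As a rough userspace model of the opt-out described in this commit, registration can skip thermal zone creation when the flag is set. The struct layout and the `demo_*` names are assumptions made for illustration; only the `no_thermal` flag name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified power supply descriptor. */
struct demo_power_supply {
    const char *name;
    bool has_temp_prop; /* driver exposes a temperature property */
    bool no_thermal;    /* driver opts out of thermal zone creation */
    bool tz_created;    /* filled in by demo_psy_register() */
};

/* Register a supply; a thermal zone is only modelled when allowed. */
static int demo_psy_register(struct demo_power_supply *psy)
{
    assert(psy != NULL);
    psy->tz_created = false;

    /* Opt-out: no zone, hence no nested get_temp call through it. */
    if (psy->no_thermal)
        return 0;

    if (psy->has_temp_prop)
        psy->tz_created = true;
    return 0;
}
```

A driver that forwards temperature reads to another zone would set `no_thermal` before registering, so its own zone never exists to be nested.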
  14. 27 Oct, 2014 1 commit
  15. 24 Oct, 2014 8 commits
    • W
      kvm: vfio: fix unregister kvm_device_ops of vfio · 571ee1b6
      Committed by Wanpeng Li
      After commit 80ce1639 (KVM: VFIO: register kvm_device_ops dynamically),
      kvm_device_ops of vfio can be registered dynamically. Commit 3c3c29fd
      (kvm-vfio: do not use module_init) moved the dynamic registration into
      kvm_init in order to fix broken unloading of the kvm module. However,
      kvm_device_ops of vfio is left registered after the kvm-intel module is
      removed, which leads to a device type collision warning when the
      kvm-intel module is inserted again.
      
          WARNING: CPU: 1 PID: 10358 at /root/cathy/kvm/arch/x86/kvm/../../../virt/kvm/kvm_main.c:3289 kvm_init+0x234/0x282 [kvm]()
          Modules linked in: kvm_intel(O+) kvm(O) nfsv3 nfs_acl auth_rpcgss oid_registry nfsv4 dns_resolver nfs fscache lockd sunrpc pci_stub bridge stp llc autofs4 8021q cpufreq_ondemand ipv6 joydev microcode pcspkr igb i2c_algo_bit ehci_pci ehci_hcd e1000e i2c_i801 ixgbe ptp pps_core hwmon mdio tpm_tis tpm ipmi_si ipmi_msghandler acpi_cpufreq isci libsas scsi_transport_sas button dm_mirror dm_region_hash dm_log dm_mod [last unloaded: kvm_intel]
          CPU: 1 PID: 10358 Comm: insmod Tainted: G        W  O   3.17.0-rc1 #2
          Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
           0000000000000cd9 ffff880ff08cfd18 ffffffff814a61d9 0000000000000cd9
           0000000000000000 ffff880ff08cfd58 ffffffff810417b7 ffff880ff08cfd48
           ffffffffa045bcac ffffffffa049c420 0000000000000040 00000000000000ff
          Call Trace:
           [<ffffffff814a61d9>] dump_stack+0x49/0x60
           [<ffffffff810417b7>] warn_slowpath_common+0x7c/0x96
           [<ffffffffa045bcac>] ? kvm_init+0x234/0x282 [kvm]
           [<ffffffff810417e6>] warn_slowpath_null+0x15/0x17
           [<ffffffffa045bcac>] kvm_init+0x234/0x282 [kvm]
           [<ffffffffa016e995>] vmx_init+0x1bf/0x42a [kvm_intel]
           [<ffffffffa016e7d6>] ? vmx_check_processor_compat+0x64/0x64 [kvm_intel]
           [<ffffffff810002ab>] do_one_initcall+0xe3/0x170
           [<ffffffff811168a9>] ? __vunmap+0xad/0xb8
           [<ffffffff8109c58f>] do_init_module+0x2b/0x174
           [<ffffffff8109d414>] load_module+0x43e/0x569
           [<ffffffff8109c6d8>] ? do_init_module+0x174/0x174
           [<ffffffff8109c75a>] ? copy_module_from_user+0x39/0x82
           [<ffffffff8109b7dd>] ? module_sect_show+0x20/0x20
           [<ffffffff8109d65f>] SyS_init_module+0x54/0x81
           [<ffffffff814a9a12>] system_call_fastpath+0x16/0x1b
          ---[ end trace 0626f4a3ddea56f3 ]---
      
      The bug can be reproduced by:
      
          rmmod kvm_intel.ko
          insmod kvm_intel.ko
      
      without removing and reinserting kvm.ko.
      
      This patch fixes the bug by unregistering kvm_device_ops of vfio when the
      kvm-intel module is removed.
      Reported-by: Liu Rongrong <rongrongx.liu@intel.com>
      Fixes: 3c3c29fd
      Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      571ee1b6
    • M
      fs: limit filesystem stacking depth · 69c433ed
      Committed by Miklos Szeredi
      Add a simple read-only counter to super_block that indicates how deep this
      is in the stack of filesystems.  Previously ecryptfs was the only stackable
      filesystem and it explicitly disallowed multiple layers of itself.
      
      Overlayfs, however, can be stacked recursively and also may be stacked
      on top of ecryptfs or vice versa.
      
      To limit the kernel stack usage we must limit the depth of the
      filesystem stack.  Initially the limit is set to 2.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      69c433ed
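The depth counter can be sketched in userspace as follows. The field name `s_stack_depth`, the `demo_*` names, and the rule "one more than the deepest layer underneath" are assumptions for illustration; the limit of 2 is the initial value quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Initial limit quoted in the commit message. */
#define DEMO_MAX_STACK_DEPTH 2

/* Hypothetical, minimal superblock: 0 means an ordinary filesystem. */
struct demo_super_block {
    int s_stack_depth;
};

/*
 * Set the depth of a filesystem stacked on `upper` and `lower`.
 * Returns 0 on success, -1 if the resulting stack would be too deep.
 */
static int demo_stack_on(struct demo_super_block *sb,
                         const struct demo_super_block *upper,
                         const struct demo_super_block *lower)
{
    int deepest;

    assert(sb != NULL && upper != NULL && lower != NULL);
    deepest = upper->s_stack_depth > lower->s_stack_depth
                  ? upper->s_stack_depth
                  : lower->s_stack_depth;

    if (deepest + 1 > DEMO_MAX_STACK_DEPTH)
        return -1;

    sb->s_stack_depth = deepest + 1;
    return 0;
}
```

Under these assumptions, a stacking filesystem over plain filesystems gets depth 1, one more layer on top of that gets depth 2, and a third layer is refused.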
    • M
      vfs: add whiteout support · 787fb6bc
      Committed by Miklos Szeredi
      Whiteout isn't actually a new file type, but is represented as a char
      device (Linus's idea) with 0/0 device number.
      
      This has several advantages compared to introducing a new whiteout file
      type:
      
       - no userspace API changes (e.g. trivial to make backups of upper layer
         filesystem, without losing whiteouts)
      
       - no fs image format changes (you can boot an old kernel/fsck without
         whiteout support and things won't break)
      
       - implementation is trivial
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      787fb6bc
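Because a whiteout is just a character device with device number 0/0, userspace tools can recognise one from ordinary `stat()` data. The helper below is a minimal sketch for Linux with glibc or musl; the name `demo_is_whiteout` is hypothetical and is not a kernel or libc function.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), minor(), makedev() */
#include <sys/types.h>

/*
 * Return true if the stat data describes a whiteout in the sense of the
 * commit message: a character device whose device number is 0/0.
 */
static bool demo_is_whiteout(const struct stat *st)
{
    assert(st != NULL);
    return S_ISCHR(st->st_mode) &&
           major(st->st_rdev) == 0 &&
           minor(st->st_rdev) == 0;
}
```

A caller would typically fill the `struct stat` with `lstat()` on an entry in the upper layer and then pass it to the helper.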
    • M
      vfs: export check_sticky() · cbdf35bc
      Committed by Miklos Szeredi
      It's already duplicated in btrfs and about to be used in overlayfs too.
      
      Move the sticky bit check to an inline helper and call the out-of-line
      helper only in the unlikely case of the sticky bit being set.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      cbdf35bc
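The split into a cheap inline test and a rarely taken slow path might look like the userspace sketch below. The `demo_*` names, the simplified inode, and passing the caller's uid and privilege as parameters are assumptions; the kernel helpers read these from the current task instead.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, minimal inode: just mode bits and owner. */
struct demo_inode {
    mode_t i_mode;
    uid_t i_uid;
};

/*
 * Slow path, only reached for sticky directories. In a kernel this part
 * would live out of line in a .c file. Returns true if the operation
 * must be refused.
 */
static bool demo_check_sticky_slow(const struct demo_inode *dir,
                                   const struct demo_inode *inode,
                                   uid_t fsuid, bool privileged)
{
    if (inode->i_uid == fsuid)
        return false; /* caller owns the file */
    if (dir->i_uid == fsuid)
        return false; /* caller owns the directory */
    return !privileged;
}

/* Fast path: one bit test; the common non-sticky case returns at once. */
static inline bool demo_check_sticky(const struct demo_inode *dir,
                                     const struct demo_inode *inode,
                                     uid_t fsuid, bool privileged)
{
    assert(dir != NULL && inode != NULL);
    if (!(dir->i_mode & S_ISVTX))
        return false;
    return demo_check_sticky_slow(dir, inode, fsuid, privileged);
}
```

Most directories are not sticky, so callers usually pay only for the bit test in the inline wrapper.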
    • M
      vfs: introduce clone_private_mount() · c771d683
      Committed by Miklos Szeredi
      Overlayfs needs a private clone of the mount, so create a function for
      this and export it to modules.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      c771d683
    • M
      vfs: export __inode_permission() to modules · bd5d0856
      Committed by Miklos Szeredi
      We need to be able to check inode permissions (but not filesystem-implied
      permissions) for stackable filesystems.  Expose this interface for overlayfs.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      bd5d0856
    • M
      vfs: export do_splice_direct() to modules · 1c118596
      Committed by Miklos Szeredi
      Export do_splice_direct() to modules.  Needed by the overlay filesystem.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      1c118596
    • M
      vfs: add i_op->dentry_open() · 4aa7c634
      Committed by Miklos Szeredi
      Add a new inode operation i_op->dentry_open().  This is for stacked filesystems
      that want to return a struct file from a different filesystem.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      4aa7c634
  16. 23 Oct, 2014 3 commits
    • B
      uprobes: Remove "weak" from function declarations · 271a9c35
      Committed by Bjorn Helgaas
      For the following interfaces:
      
        set_swbp()
        set_orig_insn()
        is_swbp_insn()
        is_trap_insn()
        uprobe_get_swbp_addr()
        arch_uprobe_ignore()
        arch_uprobe_copy_ixol()
      
      kernel/events/uprobes.c provides default definitions explicitly marked
      "weak".  Some architectures provide their own definitions intended to
      override the defaults, but the "weak" attribute on the declarations applied
      to the arch definitions as well, so the linker chose one based on link
      order (see 10629d71 ("PCI: Remove __weak annotation from
      pcibios_get_phb_of_node decl")).
      
      Remove the "weak" attribute from the declarations so we always prefer a
      non-weak definition over the weak one, independent of link order.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      CC: Victor Kamensky <victor.kamensky@linaro.org>
      CC: Oleg Nesterov <oleg@redhat.com>
      CC: David A. Long <dave.long@linaro.org>
      CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      271a9c35
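The pattern these "weak" cleanups aim for can be sketched with GCC or Clang on an ELF target: the shared declaration carries no attribute, and only the generic default definition is marked weak. The function names and the returned value below are hypothetical.

```c
#include <assert.h>

/*
 * Header-style declaration, deliberately NOT marked weak. If it were,
 * an overriding definition that includes this header would become weak
 * too, and the linker would be free to pick either definition.
 */
unsigned long demo_block_size_bytes(void);

/* Generic default: the weak attribute sits on the definition only. */
__attribute__((weak)) unsigned long demo_block_size_bytes(void)
{
    return 1UL << 27; /* arbitrary illustrative default */
}

/* Ordinary caller; it neither knows nor cares which definition it gets. */
static unsigned long demo_blocks_for(unsigned long bytes)
{
    unsigned long block = demo_block_size_bytes();

    assert(block != 0);
    return (bytes + block - 1) / block;
}
```

An override would be a plain, unattributed definition of `demo_block_size_bytes()` in another translation unit; being strong, it is preferred over the weak default regardless of link order.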
    • B
      memory-hotplug: Remove "weak" from memory_block_size_bytes() declaration · e0a8400c
      Committed by Bjorn Helgaas
      drivers/base/memory.c provides a default memory_block_size_bytes()
      definition explicitly marked "weak".  Several architectures provide their
      own definitions intended to override the default, but the "weak" attribute
      on the declaration applied to the arch definitions as well, so the linker
      chose one based on link order (see 10629d71 ("PCI: Remove __weak
      annotation from pcibios_get_phb_of_node decl")).
      
      Remove the "weak" attribute from the declaration so we always prefer a
      non-weak definition over the weak one, independent of link order.
      
      Fixes: 41f10726 ("drivers: base: Add prototype declaration to the header file")
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      CC: Rashika Kheria <rashika.kheria@gmail.com>
      CC: Nathan Fontenot <nfont@austin.ibm.com>
      CC: Anton Blanchard <anton@au1.ibm.com>
      CC: Heiko Carstens <heiko.carstens@de.ibm.com>
      CC: Yinghai Lu <yinghai@kernel.org>
      e0a8400c
    • B
      kgdb: Remove "weak" from kgdb_arch_pc() declaration · 107bcc6d
      Committed by Bjorn Helgaas
      kernel/debug/debug_core.c provides a default kgdb_arch_pc() definition
      explicitly marked "weak".  Several architectures provide their own
      definitions intended to override the default, but the "weak" attribute on
      the declaration applied to the arch definitions as well, so the linker
      chose one based on link order (see 10629d71 ("PCI: Remove __weak
      annotation from pcibios_get_phb_of_node decl")).
      
      Remove the "weak" attribute from the declaration so we always prefer a
      non-weak definition over the weak one, independent of link order.
      
      Fixes: 688b744d ("kgdb: fix signedness mixmatches, add statics, add declaration to header")
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	# for ARC build
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Harvey Harrison <harvey.harrison@gmail.com>
      107bcc6d