1. 16 4月, 2015 2 次提交
    • E
      mm: allow compaction of unevictable pages · 5bbe3547
      Eric B Munson 提交于
      Currently, pages which are marked as unevictable are protected from
      compaction, but not from other types of migration.  The POSIX real time
      extension explicitly states that mlock() will prevent a major page
      fault, but the spirit of this is that mlock() should give a process the
      ability to control sources of latency, including minor page faults.
      However, the mlock manpage only explicitly says that a locked page will
      not be written to swap and this can cause some confusion.  The
      compaction code today does not give a developer who wants to avoid swap
      but wants to have large contiguous areas available any method to achieve
      this state.  This patch introduces a sysctl for controlling compaction
      behavior with respect to the unevictable lru.  Users who demand no page
      faults after a page is present can set compact_unevictable_allowed to 0
      and users who need the large contiguous areas can enable compaction on
      locked memory by leaving the default value of 1.
      
      To illustrate this problem I wrote a quick test program that mmaps a
      large number of 1MB files filled with random data.  These maps are
      created locked and read only.  Then every other mmap is unmapped and I
      attempt to allocate huge pages to the static huge page pool.  When the
      compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
      after fragmenting memory.  When the value is set to 1, allocations
      succeed.
      Signed-off-by: NEric B Munson <emunson@akamai.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5bbe3547
    • V
      memcg: zap mem_cgroup_lookup() · adbe427b
      Vladimir Davydov 提交于
      mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
      checks that id != 0 before issuing the function call.  Today, there is
      no point in this additional check apart from optimization, because there
      is no css with id <= 0, so that css_from_id, called by
      mem_cgroup_from_id, will return NULL for any id <= 0.
      
      Since mem_cgroup_from_id is only called from mem_cgroup_lookup, let us
      zap mem_cgroup_lookup, substituting calls to it with mem_cgroup_from_id
      and moving the check if id > 0 to css_from_id.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      adbe427b
  2. 15 4月, 2015 10 次提交
  3. 13 4月, 2015 1 次提交
    • P
      cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9
      Paul E. McKenney 提交于
      Currently, smpboot_unpark_threads() is invoked before the incoming CPU
      has been added to the scheduler's runqueue structures.  This might
      potentially cause the unparked kthread to run on the wrong CPU, since the
      correct CPU isn't fully set up yet.
      
      That causes a sporadic, hard to debug boot crash triggering on some
      systems, reported by Borislav Petkov, and bisected down to:
      
        2a442c9c ("x86: Use common outgoing-CPU-notification code")
      
      This patch places smpboot_unpark_threads() in a CPU hotplug
      notifier with priority set so that these kthreads are unparked just after
      the CPU has been added to the runqueues.
      Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      00df35f9
  4. 09 4月, 2015 4 次提交
    • J
      locking/mutex: Further simplify mutex_spin_on_owner() · 01ac33c1
      Jason Low 提交于
      Similar to what Linus suggested for rwsem_spin_on_owner(), in
      mutex_spin_on_owner() instead of having while (true) and
      breaking out of the spin loop on lock->owner != owner, we can
      have the loop directly check for while (lock->owner == owner) to
      improve the readability of the code.
      
      It also shrinks the code a bit:
      
         text    data     bss     dec     hex filename
         3721       0       0    3721     e89 mutex.o.before
         3705       0       0    3705     e79 mutex.o.after
      Signed-off-by: NJason Low <jason.low2@hp.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aswin Chandramouleeswaran <aswin@hp.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
      [ Added code generation info. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      01ac33c1
    • L
      Copy the kernel module data from user space in chunks · 3afe9f84
      Linus Torvalds 提交于
      Unlike most (all?) other copies from user space, kernel module loading
      is almost unlimited in size.  So we do a potentially huge
      "copy_from_user()" when we copy the module data from user space to the
      kernel buffer, which can be a latency concern when preemption is
      disabled (or voluntary).
      
      Also, because 'copy_from_user()' clears the tail of the kernel buffer on
      failures, even a *failed* copy can end up wasting a lot of time.
      
      Normally neither of these are concerns in real life, but they do trigger
      when doing stress-testing with trinity.  Running in a VM seems to add
      its own overheadm causing trinity module load testing to even trigger
      the watchdog.
      
      The simple fix is to just chunk up the module loading, so that it never
      tries to copy insanely big areas in one go.  That bounds the latency,
      and also the amount of (unnecessarily, in this case) cleared memory for
      the failure case.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3afe9f84
    • M
      genirq: Allow the irqchip state of an IRQ to be save/restored · 1b7047ed
      Marc Zyngier 提交于
      There is a number of cases where a kernel subsystem may want to
      introspect the state of an interrupt at the irqchip level:
      
      - When a peripheral is shared between virtual machines,
        its interrupt state becomes part of the guest's state,
        and must be switched accordingly. KVM on arm/arm64 requires
        this for its guest-visible timer
      - Some GPIO controllers seem to require peeking into the
        interrupt controller they are connected to to report
        their internal state
      
      This seem to be a pattern that is common enough for the core code
      to try and support this without too many horrible hacks. Introduce
      a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
      to retrieve the bits that can be of interest to another subsystem:
      pending, active, and masked.
      
      - irq_get_irqchip_state returns the state of the interrupt according
        to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
        IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
      - irq_set_irqchip_state similarly sets the state of the interrupt.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NBjorn Andersson <bjorn.andersson@sonymobile.com>
      Tested-by: NBjorn Andersson <bjorn.andersson@sonymobile.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Phong Vo <pvo@apm.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Tin Huynh <tnhuynh@apm.com>
      Cc: Y Vo <yvo@apm.com>
      Cc: Toan Le <toanle@apm.com>
      Cc: Bjorn Andersson <bjorn@kryo.se>
      Cc: Jason Cooper <jason@lakedaemon.net>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1b7047ed
    • M
      genirq: MSI: Fix freeing of unallocated MSI · fe0c52fc
      Marc Zyngier 提交于
      While debugging an unrelated issue with the GICv3 ITS driver, the
      following trace triggered:
      
      WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c()
      NULL pointer, cannot free irq
      Modules linked in:
      CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W      3.19.0-rc6+ #3690
      Hardware name: FVP Base (DT)
      Call trace:
      [<ffffffc000089398>] dump_backtrace+0x0/0x13c
      [<ffffffc0000894e4>] show_stack+0x10/0x1c
      [<ffffffc00066d134>] dump_stack+0x74/0x94
      [<ffffffc0000a92f8>] warn_slowpath_common+0x9c/0xd4
      [<ffffffc0000a938c>] warn_slowpath_fmt+0x5c/0x80
      [<ffffffc0000ee04c>] irq_domain_free_irqs+0x15c/0x17c
      [<ffffffc0000ef918>] msi_domain_free_irqs+0x58/0x74
      [<ffffffc000386f58>] free_msi_irqs+0xb4/0x1c0
      
          // The msi_prepare callback fails here
      
      [<ffffffc0003872c0>] pci_enable_msix+0x25c/0x3d4
      [<ffffffc00038746c>] pci_enable_msix_range+0x34/0x80
      [<ffffffc0003924ac>] vp_try_to_find_vqs+0xec/0x528
      [<ffffffc000392954>] vp_find_vqs+0x6c/0xa8
      [<ffffffc0003ee2a8>] init_vq+0x120/0x248
      [<ffffffc0003eefb0>] virtblk_probe+0xb0/0x6bc
      [<ffffffc00038fc34>] virtio_dev_probe+0x17c/0x214
      [<ffffffc0003d4a04>] driver_probe_device+0x7c/0x23c
      [<ffffffc0003d4cb0>] __driver_attach+0x98/0xa0
      [<ffffffc0003d2c60>] bus_for_each_dev+0x60/0xb4
      [<ffffffc0003d455c>] driver_attach+0x1c/0x28
      [<ffffffc0003d41b0>] bus_add_driver+0x150/0x208
      [<ffffffc0003d54c0>] driver_register+0x64/0x130
      [<ffffffc00038f9e8>] register_virtio_driver+0x24/0x68
      [<ffffffc00091320c>] init+0x70/0xac
      [<ffffffc0000828f0>] do_one_initcall+0x94/0x1d0
      [<ffffffc0008e9b00>] kernel_init_freeable+0x144/0x1e4
      [<ffffffc00066a434>] kernel_init+0xc/0xd8
      ---[ end trace f9ee562a77cc7bae ]---
      
      The ITS msi_prepare callback having failed, we end-up trying to
      free MSIs that have never been allocated. Oddly enough, the kernel
      is pretty upset about it.
      
      It turns out that this behaviour was expected before the MSI domain
      was introduced (and dealt with in arch_teardown_msi_irqs).
      
      The obvious fix is to detect this early enough and bail out.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NJiang Liu <jiang.liu@linux.intel.com>
      Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      fe0c52fc
  5. 08 4月, 2015 4 次提交
  6. 07 4月, 2015 1 次提交
  7. 06 4月, 2015 1 次提交
    • F
      workqueue: Reorder sysfs code · 6ba94429
      Frederic Weisbecker 提交于
      The sysfs code usually belongs to the botom of the file since it deals
      with high level objects. In the workqueue code it's misplaced and such
      that we'll need to work around functions references to allow the sysfs
      code to call APIs like apply_workqueue_attrs().
      
      Lets move that block further in the file, almost the botom.
      
      And declare workqueue_sysfs_unregister() just before destroy_workqueue()
      which reference it.
      
      tj: Moved workqueue_sysfs_unregister() forward declaration where other
          forward declarations are.
      Suggested-by: NTejun Heo <tj@kernel.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6ba94429
  8. 03 4月, 2015 17 次提交