1. 03 7月, 2017 1 次提交
  2. 02 7月, 2017 1 次提交
  3. 01 7月, 2017 1 次提交
  4. 29 6月, 2017 4 次提交
    • J
      block: provide bio_uninit() free freeing integrity/task associations · 9ae3b3f5
      Jens Axboe 提交于
      Wen reports significant memory leaks with DIF and O_DIRECT:
      
      "With nvme devive + T10 enabled, On a system it has 256GB and started
      logging /proc/meminfo & /proc/slabinfo for every minute and in an hour
      it increased by 15968128 kB or ~15+GB.. Approximately 256 MB / minute
      leaking.
      
      /proc/meminfo | grep SUnreclaim...
      
      SUnreclaim:      6752128 kB
      SUnreclaim:      6874880 kB
      SUnreclaim:      7238080 kB
      ....
      SUnreclaim:     22307264 kB
      SUnreclaim:     22485888 kB
      SUnreclaim:     22720256 kB
      
      When testcases with T10 enabled call into __blkdev_direct_IO_simple,
      code doesn't free memory allocated by bio_integrity_alloc. The patch
      fixes the issue. HTX has been run with +60 hours without failure."
      
      Since __blkdev_direct_IO_simple() allocates the bio on the stack, it
      doesn't go through the regular bio free. This means that any ancillary
      data allocated with the bio through the stack is not freed. Hence, we
      can leak the integrity data associated with the bio, if the device is
      using DIF/DIX.
      
      Fix this by providing a bio_uninit() and export it, so that we can use
      it to free this data. Note that this is a minimal fix for this issue.
      Any current user of bio's that are allocated outside of
      bio_alloc_bioset() suffers from this issue, most notably some drivers.
      We will fix those in a more comprehensive patch for 4.13. This also
      means that the commit marked as being fixed by this isn't the real
      culprit, it's just the most obvious one out there.
      
      Fixes: 542ff7bf ("block: new direct I/O implementation")
      Reported-by: NWen Xiong <wenxiong@linux.vnet.ibm.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9ae3b3f5
    • C
      blk-mq: Create hctx for each present CPU · 4b855ad3
      Christoph Hellwig 提交于
      Currently we only create hctx for online CPUs, which can lead to a lot
      of churn due to frequent soft offline / online operations.  Instead
      allocate one for each present CPU to avoid this and dramatically simplify
      the code.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJens Axboe <axboe@kernel.dk>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-block@vger.kernel.org
      Cc: linux-nvme@lists.infradead.org
      Link: http://lkml.kernel.org/r/20170626102058.10200-3-hch@lst.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4b855ad3
    • M
      regmap: irq: add chip option mask_writeonly · a71411db
      Michael Grzeschik 提交于
      Some irq controllers have writeonly/multipurpose register layouts. In
      those cases we read invalid data back. Here we add the option
      mask_writeonly as masking option.
      Signed-off-by: NMichael Grzeschik <m.grzeschik@pengutronix.de>
      Signed-off-by: NMark Brown <broonie@kernel.org>
      a71411db
    • K
      locking/refcount: Create unchecked atomic_t implementation · fd25d19f
      Kees Cook 提交于
      Many subsystems will not use refcount_t unless there is a way to build the
      kernel so that there is no regression in speed compared to atomic_t. This
      adds CONFIG_REFCOUNT_FULL to enable the full refcount_t implementation
      which has the validation but is slightly slower. When not enabled,
      refcount_t uses the basic unchecked atomic_t routines, which results in
      no code changes compared to just using atomic_t directly.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: David Windsor <dwindsor@gmail.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170621200026.GA115679@beastSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fd25d19f
  5. 28 6月, 2017 10 次提交
  6. 27 6月, 2017 2 次提交
    • O
      tty: add function to convert device name to number · fc61ed51
      Okash Khawaja 提交于
      The function converts strings like ttyS0 and ttyUSB0 to dev_t like
      (4, 64) and (188, 0). It does this by scanning tty_drivers list for
      corresponding device name and index. If the driver is not registered,
      this function returns -ENODEV. It also acquires tty_mutex.
      Signed-off-by: NOkash Khawaja <okash.khawaja@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc61ed51
    • L
      x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF · f8475cef
      Len Brown 提交于
      The goal of this change is to give users a uniform and meaningful
      result when they read /sys/...cpufreq/scaling_cur_freq
      on modern x86 hardware, as compared to what they get today.
      
      Modern x86 processors include the hardware needed
      to accurately calculate frequency over an interval --
      APERF, MPERF, and the TSC.
      
      Here we provide an x86 routine to make this calculation
      on supported hardware, and use it in preference to any
      driver driver-specific cpufreq_driver.get() routine.
      
      MHz is computed like so:
      
      MHz = base_MHz * delta_APERF / delta_MPERF
      
      MHz is the average frequency of the busy processor
      over a measurement interval.  The interval is
      defined to be the time between successive invocations
      of aperfmperf_khz_on_cpu(), which are expected to to
      happen on-demand when users read sysfs attribute
      cpufreq/scaling_cur_freq.
      
      As with previous methods of calculating MHz,
      idle time is excluded.
      
      base_MHz above is from TSC calibration global "cpu_khz".
      
      This x86 native method to calculate MHz returns a meaningful result
      no matter if P-states are controlled by hardware or firmware
      and/or if the Linux cpufreq sub-system is or is-not installed.
      
      When this routine is invoked more frequently, the measurement
      interval becomes shorter.  However, the code limits re-computation
      to 10ms intervals so that average frequency remains meaningful.
      
      Discerning users are encouraged to take advantage of
      the turbostat(8) utility, which can gracefully handle
      concurrent measurement intervals of arbitrary length.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      f8475cef
  7. 26 6月, 2017 1 次提交
  8. 25 6月, 2017 1 次提交
  9. 24 6月, 2017 4 次提交
    • D
      genirq/timings: Add infrastructure for estimating the next interrupt arrival time · e1c92149
      Daniel Lezcano 提交于
      An interrupt behaves with a burst of activity with periodic interval of time
      followed by one or two peaks of longer interval.
      
      As the time intervals are periodic, statistically speaking they follow a normal
      distribution and each interrupts can be tracked individually.
      
      Add a mechanism to compute the statistics on all interrupts, except the
      timers which are deterministic from a prediction point of view, as their
      expiry time is known.
      
      The goal is to extract the periodicity for each interrupt, with the last
      timestamp and sum them, so the next event can be predicted to a certain
      extent.
      
      Taking the earliest prediction gives the expected wakeup on the system
      (assuming a timer won't expire before).
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: "Rafael J . Wysocki" <rafael@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Link: http://lkml.kernel.org/r/1498227072-5980-2-git-send-email-daniel.lezcano@linaro.org
      e1c92149
    • D
      genirq/timings: Add infrastructure to track the interrupt timings · b2d3d61a
      Daniel Lezcano 提交于
      The interrupt framework gives a lot of information about each interrupt. It
      does not keep track of when those interrupts occur though, which is a
      prerequisite for estimating the next interrupt arrival for power management
      purposes.
      
      Add a mechanism to record the timestamp for each interrupt occurrences in a
      per-CPU circular buffer to help with the prediction of the next occurrence
      using a statistical model.
      
      Each CPU can store up to IRQ_TIMINGS_SIZE events <irq, timestamp>, the
      current value of IRQ_TIMINGS_SIZE is 32.
      
      Each event is encoded into a single u64, where the high 48 bits are used
      for the timestamp and the low 16 bits are for the irq number.
      
      A static key is introduced so when the irq prediction is switched off at
      runtime, the overhead is near to zero.
      
      It results in most of the code in internals.h for inline reasons and a very
      few in the new file timings.c. The latter will contain more in the next patch
      which will provide the statistical model for the next event prediction.
      Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NNicolas Pitre <nicolas.pitre@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: "Rafael J . Wysocki" <rafael@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Link: http://lkml.kernel.org/r/1498227072-5980-1-git-send-email-daniel.lezcano@linaro.org
      b2d3d61a
    • V
      PM / OPP: Add dev_pm_opp_{set|put}_clkname() · 829a4e8c
      Viresh Kumar 提交于
      In order to support OPP switching, OPP layer needs to get pointer to the
      clock for the device. Simple cases work fine without using the routines
      added by this patch (i.e.  by passing connection-id as NULL), but for a
      device with multiple clocks available, the OPP core needs to know the
      exact name of the clk to use.
      
      Add a new set of APIs to get that done.
      Tested-by: NRajendra Nayak <rnayak@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      829a4e8c
    • T
      slub: make sysfs file removal asynchronous · 3b7b3140
      Tejun Heo 提交于
      Commit bf5eb3de ("slub: separate out sysfs_slab_release() from
      sysfs_slab_remove()") made slub sysfs file removals synchronous to
      kmem_cache shutdown.
      
      Unfortunately, this created a possible ABBA deadlock between slab_mutex
      and sysfs draining mechanism triggering the following lockdep warning.
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        4.10.0-test+ #48 Not tainted
        -------------------------------------------------------
        rmmod/1211 is trying to acquire lock:
         (s_active#120){++++.+}, at: [<ffffffff81308073>] kernfs_remove+0x23/0x40
      
        but task is already holding lock:
         (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (slab_mutex){+.+.+.}:
      	 lock_acquire+0xf6/0x1f0
      	 __mutex_lock+0x75/0x950
      	 mutex_lock_nested+0x1b/0x20
      	 slab_attr_store+0x75/0xd0
      	 sysfs_kf_write+0x45/0x60
      	 kernfs_fop_write+0x13c/0x1c0
      	 __vfs_write+0x28/0x120
      	 vfs_write+0xc8/0x1e0
      	 SyS_write+0x49/0xa0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        -> #0 (s_active#120){++++.+}:
      	 __lock_acquire+0x10ed/0x1260
      	 lock_acquire+0xf6/0x1f0
      	 __kernfs_remove+0x254/0x320
      	 kernfs_remove+0x23/0x40
      	 sysfs_remove_dir+0x51/0x80
      	 kobject_del+0x18/0x50
      	 __kmem_cache_shutdown+0x3e6/0x460
      	 kmem_cache_destroy+0x1fb/0x2d0
      	 kvm_exit+0x2d/0x80 [kvm]
      	 vmx_exit+0x19/0xa1b [kvm_intel]
      	 SyS_delete_module+0x198/0x1f0
      	 entry_SYSCALL_64_fastpath+0x1f/0xc2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(slab_mutex);
      				 lock(s_active#120);
      				 lock(slab_mutex);
          lock(s_active#120);
      
         *** DEADLOCK ***
      
        2 locks held by rmmod/1211:
         #0:  (cpu_hotplug.dep_map){++++++}, at: [<ffffffff810a7877>] get_online_cpus+0x37/0x80
         #1:  (slab_mutex){+.+.+.}, at: [<ffffffff8120f691>] kmem_cache_destroy+0x41/0x2d0
      
        stack backtrace:
        CPU: 3 PID: 1211 Comm: rmmod Not tainted 4.10.0-test+ #48
        Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
        Call Trace:
         print_circular_bug+0x1be/0x210
         __lock_acquire+0x10ed/0x1260
         lock_acquire+0xf6/0x1f0
         __kernfs_remove+0x254/0x320
         kernfs_remove+0x23/0x40
         sysfs_remove_dir+0x51/0x80
         kobject_del+0x18/0x50
         __kmem_cache_shutdown+0x3e6/0x460
         kmem_cache_destroy+0x1fb/0x2d0
         kvm_exit+0x2d/0x80 [kvm]
         vmx_exit+0x19/0xa1b [kvm_intel]
         SyS_delete_module+0x198/0x1f0
         ? SyS_delete_module+0x5/0x1f0
         entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      It'd be the cleanest to deal with the issue by removing sysfs files
      without holding slab_mutex before the rest of shutdown; however, given
      the current code structure, it is pretty difficult to do so.
      
      This patch punts sysfs file removal to a work item.  Before commit
      bf5eb3de, the removal was punted to a RCU delayed work item which is
      executed after release.  Now, we're punting to a different work item on
      shutdown which still maintains the goal removing the sysfs files earlier
      when destroying kmem_caches.
      
      Link: http://lkml.kernel.org/r/20170620204512.GI21326@htj.duckdns.org
      Fixes: bf5eb3de ("slub: separate out sysfs_slab_release() from sysfs_slab_remove()")
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Tested-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3b7b3140
  10. 23 6月, 2017 14 次提交
    • M
      genirq/irqdomain: Remove auto-recursive hierarchy support · 6a6544e5
      Marc Zyngier 提交于
      It did seem like a good idea at the time, but it never really
      caught on, and auto-recursive domains remain unused 3 years after
      having been introduced.
      
      Oh well, time for a late spring cleanup.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      6a6544e5
    • M
      genirq/irqdomain: Add irq_domain_update_bus_token helper · 61d0a000
      Marc Zyngier 提交于
      We can have irq domains that are identified by the same fwnode
      (because they are serviced by the same HW), and yet have different
      functionnality (because they serve different busses, for example).
      This is what we use the bus_token field.
      
      Since we don't use this field when generating the domain name,
      all the aliasing domains will get the same name, and the debugfs
      file creation fails. Also, bus_token is updated by individual drivers,
      and the core code is unaware of that update.
      
      In order to sort this mess, let's introduce a helper that takes care
      of updating bus_token, and regenerate the debugfs file.
      
      A separate patch will update all the individual users.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      61d0a000
    • T
      genirq: Introduce IRQD_SINGLE_TARGET flag · d52dd441
      Thomas Gleixner 提交于
      Many interrupt chips allow only a single CPU as interrupt target. The core
      code has no knowledge about that. That's unfortunate as it could avoid
      trying to readd a newly online CPU to the effective affinity mask.
      
      Add the status flag and the necessary accessors.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235447.352343969@linutronix.de
      d52dd441
    • T
      genirq/cpuhotplug: Handle managed IRQs on CPU hotplug · c5cb83bb
      Thomas Gleixner 提交于
      If a CPU goes offline, interrupts affine to the CPU are moved away. If the
      outgoing CPU is the last CPU in the affinity mask the migration code breaks
      the affinity and sets it it all online cpus.
      
      This is a problem for affinity managed interrupts as CPU hotplug is often
      used for power management purposes. If the affinity is broken, the
      interrupt is not longer affine to the CPUs to which it was allocated.
      
      The affinity spreading allows to lay out multi queue devices in a way that
      they are assigned to a single CPU or a group of CPUs. If the last CPU goes
      offline, then the queue is not longer used, so the interrupt can be
      shutdown gracefully and parked until one of the assigned CPUs comes online
      again.
      
      Add a graceful shutdown mechanism into the irq affinity breaking code path,
      mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified.
      
      In the online path, scan the active interrupts for managed interrupts and
      if the interrupt is functional and the newly online CPU is part of the
      affinity mask, restart the interrupt if it is marked MANAGED_SHUTDOWN or if
      the interrupts is started up, try to add the CPU back to the effective
      affinity mask.
      Originally-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170619235447.273417334@linutronix.de
      c5cb83bb
    • T
      genirq: Handle managed irqs gracefully in irq_startup() · 761ea388
      Thomas Gleixner 提交于
      Affinity managed interrupts should keep their assigned affinity accross CPU
      hotplug. To avoid magic hackery in device drivers, the core code shall
      manage them transparently and set these interrupts into a managed shutdown
      state when the last CPU of the assigned affinity mask goes offline. The
      interrupt will be restarted when one of the CPUs in the assigned affinity
      mask comes back online.
      
      Add the necessary logic to irq_startup(). If an interrupt is requested and
      started up, the code checks whether it is affinity managed and if so, it
      checks whether a CPU in the interrupts affinity mask is online. If not, it
      puts the interrupt into managed shutdown state. 
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235447.189851170@linutronix.de
      761ea388
    • T
      genirq: Introduce IRQD_MANAGED_SHUTDOWN · 54fdf6a0
      Thomas Gleixner 提交于
      Affinity managed interrupts should keep their assigned affinity accross CPU
      hotplug. To avoid magic hackery in device drivers, the core code shall
      manage them transparently. This will set these interrupts into a managed
      shutdown state when the last CPU of the assigned affinity mask goes
      offline. The interrupt will be restarted when one of the CPUs in the
      assigned affinity mask comes back online.
      
      Introduce the necessary state flag and the accessor functions.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235446.954523476@linutronix.de
      54fdf6a0
    • T
      genirq: Introduce effective affinity mask · 0d3f5425
      Thomas Gleixner 提交于
      There is currently no way to evaluate the effective affinity mask of a
      given interrupt. Many irq chips allow only a single target CPU or a subset
      of CPUs in the affinity mask.
      
      Updating the mask at the time of setting the affinity to the subset would
      be counterproductive because information for cpu hotplug about assigned
      interrupt affinities gets lost. On CPU hotplug it's also pointless to force
      migrate an interrupt, which is not targeted at the CPU effectively. But
      currently the information is not available.
      
      Provide a seperate mask to be updated by the irq_chip->irq_set_affinity()
      implementations. Implement the read only proc files so the user can see the
      effective mask as well w/o trying to deduce it from /proc/interrupts.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235446.247834245@linutronix.de
      0d3f5425
    • T
      genirq: Move irq_fixup_move_pending() to core · 36d84fb4
      Thomas Gleixner 提交于
      Now that x86 uses the generic code, the function declaration and inline
      stub can move to the core internal header.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235445.928156166@linutronix.de
      36d84fb4
    • T
      genirq/cpuhotplug: Add support for cleaning up move in progress · f0383c24
      Thomas Gleixner 提交于
      In order to move x86 to the generic hotplug migration code, add support for
      cleaning up move in progress bits.
      
      On architectures which have this x86 specific (mis)feature not enabled,
      this is optimized out by the compiler.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235445.525817311@linutronix.de
      f0383c24
    • T
      genirq: Provide irq_fixup_move_pending() · cdd16365
      Thomas Gleixner 提交于
      If an CPU goes offline, the interrupts are migrated away, but a eventually
      pending interrupt move, which has not yet been made effective is kept
      pending even if the outgoing CPU is the sole target of the pending affinity
      mask. What's worse is, that the pending affinity mask is discarded even if
      it would contain a valid subset of the online CPUs.
      
      Implement a helper function which allows to avoid these issues.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235444.691345468@linutronix.de
      cdd16365
    • T
      genirq: Add missing comment for IRQD_STARTED · 1bb04016
      Thomas Gleixner 提交于
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235444.614913014@linutronix.de
      1bb04016
    • T
      genirq/debugfs: Add proper debugfs interface · 087cdfb6
      Thomas Gleixner 提交于
      Debugging (hierarchical) interupt domains is tedious as there is no
      information about the hierarchy and no information about states of
      interrupts in the various domain levels.
      
      Add a debugfs directory 'irq' and subdirectories 'domains' and 'irqs'.
      
      The domains directory contains the domain files. The content is information
      about the domain. If the domain is part of a hierarchy then the parent
      domains are printed as well.
      
      # ls /sys/kernel/debug/irq/domains/
      default     INTEL-IR-2	    INTEL-IR-MSI-2  IO-APIC-IR-2  PCI-MSI
      DMAR-MSI    INTEL-IR-3	    INTEL-IR-MSI-3  IO-APIC-IR-3  unknown-1
      INTEL-IR-0  INTEL-IR-MSI-0  IO-APIC-IR-0    IO-APIC-IR-4  VECTOR
      INTEL-IR-1  INTEL-IR-MSI-1  IO-APIC-IR-1    PCI-HT
      
      # cat /sys/kernel/debug/irq/domains/VECTOR 
      name:   VECTOR
       size:   0
       mapped: 216
       flags:  0x00000041
      
      # cat /sys/kernel/debug/irq/domains/IO-APIC-IR-0 
      name:   IO-APIC-IR-0
       size:   24
       mapped: 19
       flags:  0x00000041
       parent: INTEL-IR-3
          name:   INTEL-IR-3
           size:   65536
           mapped: 167
           flags:  0x00000041
           parent: VECTOR
              name:   VECTOR
               size:   0
               mapped: 216
               flags:  0x00000041
      
      Unfortunately there is no per cpu information about the VECTOR domain (yet).
      
      The irqs directory contains detailed information about mapped interrupts.
      
      # cat /sys/kernel/debug/irq/irqs/3
      handler:  handle_edge_irq
      status:   0x00004000
      istate:   0x00000000
      ddepth:   1
      wdepth:   0
      dstate:   0x01018000
                  IRQD_IRQ_DISABLED
                  IRQD_SINGLE_TARGET
                  IRQD_MOVE_PCNTXT
      node:     0
      affinity: 0-143
      effectiv: 0
      pending:  
      domain:  IO-APIC-IR-0
       hwirq:   0x3
       chip:    IR-IO-APIC
        flags:   0x10
                   IRQCHIP_SKIP_SET_WAKE
       parent:
          domain:  INTEL-IR-3
           hwirq:   0x20000
           chip:    INTEL-IR
            flags:   0x0
           parent:
              domain:  VECTOR
               hwirq:   0x3
               chip:    APIC
                flags:   0x0
      
      This was developed to simplify the debugging of the managed affinity
      changes.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235444.537566163@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      087cdfb6
    • T
      genirq/irqdomain: Add map counter · 9dc6be3d
      Thomas Gleixner 提交于
      Add a map counter instead of counting radix tree entries for
      diagnosis. That also gives correct information for linear domains.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235444.459397746@linutronix.de
      9dc6be3d
    • T
      genirq: Allow fwnode to carry name information only · d59f6617
      Thomas Gleixner 提交于
      In order to provide proper debug interface it's required to have domain
      names available when the domain is added. Non fwnode based architectures
      like x86 have no way to do so.
      
      It's not possible to use domain ops or host data for this as domain ops
      might be the same for several instances, but the names have to be unique.
      
      Extend the irqchip fwnode to allow transporting the domain name. If no node
      is supplied, create a 'unknown-N' placeholder.
      
      Warn if an invalid node is supplied and treat it like no node. This happens
      e.g. with i2 devices on x86 which hand in an ACPI type node which has no
      interface for retrieving the name.
      
      [ Folded a fix from Marc to make DT name parsing work ]
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Link: http://lkml.kernel.org/r/20170619235443.588784933@linutronix.de
      d59f6617
  11. 22 6月, 2017 1 次提交