1. 07 3月, 2015 2 次提交
  2. 06 3月, 2015 1 次提交
  3. 05 3月, 2015 4 次提交
    • T
      workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 8603e1b3
      Tejun Heo 提交于
      cancel[_delayed]_work_sync() are implemented using
      __cancel_work_timer() which grabs the PENDING bit using
      try_to_grab_pending() and then flushes the work item with PENDING set
      to prevent the on-going execution of the work item from requeueing
      itself.
      
      try_to_grab_pending() can always grab PENDING bit without blocking
      except when someone else is doing the above flushing during
      cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
      this case, __cancel_work_timer() currently invokes flush_work().  The
      assumption is that the completion of the work item is what the other
      canceling task would be waiting for too and thus waiting for the same
      condition and retrying should allow forward progress without excessive
      busy looping
      
      Unfortunately, this doesn't work if preemption is disabled or the
      latter task has real time priority.  Let's say task A just got woken
      up from flush_work() by the completion of the target work item.  If,
      before task A starts executing, task B gets scheduled and invokes
      __cancel_work_timer() on the same work item, its try_to_grab_pending()
      will return -ENOENT as the work item is still being canceled by task A
      and flush_work() will also immediately return false as the work item
      is no longer executing.  This puts task B in a busy loop possibly
      preventing task A from executing and clearing the canceling state on
      the work item leading to a hang.
      
      task A			task B			worker
      
      						executing work
      __cancel_work_timer()
        try_to_grab_pending()
        set work CANCELING
        flush_work()
          block for work completion
      						completion, wakes up A
      			__cancel_work_timer()
      			while (forever) {
      			  try_to_grab_pending()
      			    -ENOENT as work is being canceled
      			  flush_work()
      			    false as work is no longer executing
      			}
      
      This patch removes the possible hang by updating __cancel_work_timer()
      to explicitly wait for clearing of CANCELING rather than invoking
      flush_work() after try_to_grab_pending() fails with -ENOENT.
      
      Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
      
      v3: bit_waitqueue() can't be used for work items defined in vmalloc
          area.  Switched to custom wake function which matches the target
          work item and exclusive wait and wakeup.
      
      v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
          the target bit waitqueue has wait_bit_queue's on it.  Use
          DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
          Vizoso.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NRabin Vincent <rabin.vincent@axis.com>
      Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
      Cc: stable@vger.kernel.org
      Tested-by: NJesper Nilsson <jesper.nilsson@axis.com>
      Tested-by: NRabin Vincent <rabin.vincent@axis.com>
      8603e1b3
    • A
      drm/ttm: device address space != CPU address space · 54c4cd68
      Alex Deucher 提交于
      We need to store device offsets in 64 bit as the device
      address space may be larger than the CPU's.
      
      Fixes GPU init failures on radeons with 4GB or more of
      vram on 32 bit kernels.  We put vram at the start of the
      GPU's address space so the gart aperture starts at 4 GB
      causing all GPU addresses in the gart aperture to get
      truncated.
      
      bug:
      https://bugs.freedesktop.org/show_bug.cgi?id=89072
      
      [airlied: fix warning on nouveau build]
      Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
      Cc: thellstrom@vmware.com
      Acked-by: NThomas Hellstrom <thellstrom@vmware.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      54c4cd68
    • T
      drm/mm: Support 4 GiB and larger ranges · 440fd528
      Thierry Reding 提交于
      The current implementation is limited by the number of addresses that
      fit into an unsigned long. This causes problems on 32-bit Tegra where
      unsigned long is 32-bit but drm_mm is used to manage an IOVA space of
      4 GiB. Given the 32-bit limitation, the range is limited to 4 GiB - 1
      (or 4 GiB - 4 KiB for page granularity).
      
      This commit changes the start and size of the range to be an unsigned
      64-bit integer, thus allowing much larger ranges to be supported.
      
      [airlied: fix i915 warnings and coloring callback]
      Signed-off-by: NThierry Reding <treding@nvidia.com>
      Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
      Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      
      fixupo
      440fd528
    • R
      genirq / PM: Add flag for shared NO_SUSPEND interrupt lines · 17f48034
      Rafael J. Wysocki 提交于
      It currently is required that all users of NO_SUSPEND interrupt
      lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
      WARN_ON_ONCE() in irq_pm_install_action() will trigger.  That is
      done to warn about situations in which unprepared interrupt handlers
      may be run unnecessarily for suspended devices and may attempt to
      access those devices by mistake.  However, it may cause drivers
      that have no technical reasons for using IRQF_NO_SUSPEND to set
      that flag just because they happen to share the interrupt line
      with something like a timer.
      
      Moreover, the generic handling of wakeup interrupts introduced by
      commit 9ce7a258 (genirq: Simplify wakeup mechanism) only works
      for IRQs without any NO_SUSPEND users, so the drivers of wakeup
      devices needing to use shared NO_SUSPEND interrupt lines for
      signaling system wakeup generally have to detect wakeup in their
      interrupt handlers.  Thus if they happen to share an interrupt line
      with a NO_SUSPEND user, they also need to request that their
      interrupt handlers be run after suspend_device_irqs().
      
      In both cases the reason for using IRQF_NO_SUSPEND is not because
      the driver in question has a genuine need to run its interrupt
      handler after suspend_device_irqs(), but because it happens to
      share the line with some other NO_SUSPEND user.  Otherwise, the
      driver would do without IRQF_NO_SUSPEND just fine.
      
      To make it possible to specify that condition explicitly, introduce
      a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
      that, when set, will indicate to the IRQ core that the interrupt
      user is generally fine with suspending the IRQ, but it also can
      tolerate handler invocations after suspend_device_irqs() and, in
      particular, it is capable of detecting system wakeup and triggering
      it as appropriate from its interrupt handler.
      
      That will allow us to work around a problem with a shared timer
      interrupt line on at91 platforms.
      
      Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
      Link: http://marc.info/?t=142252775300011&r=1&w=2
      Link: https://lkml.org/lkml/2014/12/15/552Reported-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      17f48034
  4. 04 3月, 2015 1 次提交
    • T
      NFS: Fix a regression in the read() syscall · 874f9463
      Trond Myklebust 提交于
      When invalidating the page cache for a regular file, we want to first
      sync all dirty data to disk and then call invalidate_inode_pages2().
      The latter relies on nfs_launder_page() and nfs_release_page() to deal
      respectively with dirty pages, and unstable written pages.
      
      When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
      NFS filesystems.") changed the behaviour of nfs_release_page(), then it
      made it possible for invalidate_inode_pages2() to fail with an EBUSY.
      Unfortunately, that error is then propagated back to read().
      
      Let's therefore work around the problem for now by protecting the call
      to sync the data and invalidate_inode_pages2() so that they are atomic
      w.r.t. the addition of new writes.
      Later on, we can revisit whether or not we still need nfs_launder_page()
      and nfs_release_page().
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      874f9463
  5. 03 3月, 2015 1 次提交
  6. 02 3月, 2015 3 次提交
  7. 28 2月, 2015 1 次提交
  8. 27 2月, 2015 1 次提交
  9. 26 2月, 2015 2 次提交
    • T
      OMAPDSS: fix regression with display sysfs files · a38bb793
      Tomi Valkeinen 提交于
      omapdss's sysfs directories for displays used to have 'name' file,
      giving the name for the display. This file was later renamed to
      'display_name' to avoid conflicts with i2c sysfs 'name' file. Looks like
      at least xserver-xorg-video-omap3 requires the 'name' file to be
      present.
      
      To fix the regression, this patch creates new kobjects for each display,
      allowing us to create sysfs directories for the displays. This way we
      have the whole directory for omapdss, and there will be no sysfs file
      clashes with the underlying display device's sysfs files.
      
      We can thus add the 'name' sysfs file back.
      Signed-off-by: NTomi Valkeinen <tomi.valkeinen@ti.com>
      Tested-by: NNeilBrown <neilb@suse.de>
      a38bb793
    • M
      genirq / PM: better describe IRQF_NO_SUSPEND semantics · 737eb030
      Mark Rutland 提交于
      The IRQF_NO_SUSPEND flag is intended to be used for interrupts required
      to be enabled during the suspend-resume cycle. This mostly consists of
      IPIs and timer interrupts, potentially including chained irqchip
      interrupts if these are necessary to handle timers or IPIs. If an
      interrupt does not fall into one of the aforementioned categories,
      requesting it with IRQF_NO_SUSPEND is likely incorrect.
      
      Using IRQF_NO_SUSPEND does not guarantee that the interrupt can wake the
      system from a suspended state. For an interrupt to be able to trigger a
      wakeup, it may be necessary to program various components of the system.
      In these cases it is necessary to use {enable,disabled}_irq_wake.
      
      Unfortunately, several drivers assume that IRQF_NO_SUSPEND ensures that
      an IRQ can wake up the system, and the documentation can be read
      ambiguously w.r.t. this property.
      
      This patch updates the documentation regarding IRQF_NO_SUSPEND to make
      this caveat explicit, hopefully making future misuse rarer. Cleanup of
      existing misuse will occur as part of later patch series.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      737eb030
  10. 25 2月, 2015 1 次提交
  11. 24 2月, 2015 2 次提交
    • J
    • D
      x86/xen: allow privcmd hypercalls to be preempted · fdfd811d
      David Vrabel 提交于
      Hypercalls submitted by user space tools via the privcmd driver can
      take a long time (potentially many 10s of seconds) if the hypercall
      has many sub-operations.
      
      A fully preemptible kernel may deschedule such as task in any upcall
      called from a hypercall continuation.
      
      However, in a kernel with voluntary or no preemption, hypercall
      continuations in Xen allow event handlers to be run but the task
      issuing the hypercall will not be descheduled until the hypercall is
      complete and the ioctl returns to user space.  These long running
      tasks may also trigger the kernel's soft lockup detection.
      
      Add xen_preemptible_hcall_begin() and xen_preemptible_hcall_end() to
      bracket hypercalls that may be preempted.  Use these in the privcmd
      driver.
      
      When returning from an upcall, call xen_maybe_preempt_hcall() which
      adds a schedule point if if the current task was within a preemptible
      hypercall.
      
      Since _cond_resched() can move the task to a different CPU, clear and
      set xen_in_preemptible_hcall around the call.
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      fdfd811d
  12. 23 2月, 2015 5 次提交
  13. 22 2月, 2015 2 次提交
  14. 21 2月, 2015 2 次提交
    • D
      caif: fix a signedness bug in cfpkt_iterate() · 278f7b4f
      Dan Carpenter 提交于
      The cfpkt_iterate() function can return -EPROTO on error, but the
      function is a u16 so the negative value gets truncated to a positive
      unsigned short.  This causes a static checker warning.
      
      The only caller which might care is cffrml_receive(), when it's checking
      the frame checksum.  I modified cffrml_receive() so that it never says
      -EPROTO is a valid checksum.
      
      Also this isn't ever going to be inlined so I removed the "inline".
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      278f7b4f
    • G
      net: Initialize all members in skb_gro_remcsum_init() · 846cd667
      Geert Uytterhoeven 提交于
      skb_gro_remcsum_init() initializes the gro_remcsum.delta member only,
      leading to compiler warnings about a possibly uninitialized
      gro_remcsum.offset member:
      
      drivers/net/vxlan.c: In function ‘vxlan_gro_receive’:
      drivers/net/vxlan.c:602: warning: ‘grc.offset’ may be used uninitialized in this function
      net/ipv4/fou.c: In function ‘gue_gro_receive’:
      net/ipv4/fou.c:262: warning: ‘grc.offset’ may be used uninitialized in this function
      
      While these are harmless for now:
        - skb_gro_remcsum_process() sets offset before changing delta,
        - skb_gro_remcsum_cleanup() checks if delta is non-zero before
          accessing offset,
      it's safer to let the initialization function initialize all members.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      846cd667
  15. 20 2月, 2015 6 次提交
    • K
      NVMe: Fix potential corruption during shutdown · 07836e65
      Keith Busch 提交于
      The driver has to end unreturned commands at some point even if the
      controller has not provided a completion. The driver tried to be safe by
      deleting IO queues prior to ending all unreturned commands. That should
      cause the controller to internally abort inflight commands, but IO queue
      deletion request does not have to be successful, so all bets are off. We
      still have to make progress, so to be extra safe, this patch doesn't
      clear a queue to release the dma mapping for a command until after the
      pci device has been disabled.
      
      This patch removes the special handling during device initialization
      so controller recovery can be done all the time. This is possible since
      initialization is not inlined with pci probe anymore.
      Reported-by: NNilish Choudhury <nilesh.choudhury@oracle.com>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      07836e65
    • K
      NVMe: Asynchronous controller probe · 2e1d8448
      Keith Busch 提交于
      This performs the longest parts of nvme device probe in scheduled work.
      This speeds up probe significantly when multiple devices are in use.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      2e1d8448
    • K
      NVMe: Register management handle under nvme class · b3fffdef
      Keith Busch 提交于
      This creates a new class type for nvme devices to register their
      management character devices with. This is so we do not rely on miscdev
      to provide enough minors for as many nvme devices some people plan to
      use. The previous limit was approximately 60 NVMe controllers, depending
      on the platform and kernel. Now the limit is 1M, which ought to be enough
      for anybody.
      
      Since we have a new device class, it makes sense to attach the block
      devices under this as well, so part of this patch moves the management
      handle initialization prior to the namespaces discovery.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      b3fffdef
    • K
      NVMe: Update SCSI Inquiry VPD 83h translation · 4f1982b4
      Keith Busch 提交于
      The original translation created collisions on Inquiry VPD 83 for many
      existing devices. Newer specifications provide other ways to translate
      based on the device's version can be used to create unique identifiers.
      
      Version 1.1 provides an EUI64 field that uniquely identifies each
      namespace, and 1.2 added the longer NGUID field for the same reason.
      Both follow the IEEE EUI format and readily translate to the SCSI device
      identification EUI designator type 2h. For devices implementing either,
      the translation will use this type, defaulting to the EUI64 8-byte type if
      implemented then NGUID's 16 byte version if not. If neither are provided,
      the 1.0 translation is used, and is updated to use the SCSI String format
      to guarantee a unique identifier.
      
      Knowing when to use the new fields depends on the nvme controller's
      revision. The NVME_VS macro was not decoding this correctly, so that is
      fixed in this patch and moved to a more appropriate place.
      
      Since the Identify Namespace structure required an update for the NGUID
      field, this patch adds the remaining new 1.2 fields to the structure.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      4f1982b4
    • K
      NVMe: Metadata format support · e1e5e564
      Keith Busch 提交于
      Adds support for NVMe metadata formats and exposes block devices for
      all namespaces regardless of their format. Namespace formats that are
      unusable will have disk capacity set to 0, but a handle to the block
      device is created to simplify device management. A namespace is not
      usable when the format requires host interleave block and metadata in
      single buffer, has no provisioned storage, or has better data but failed
      to register with blk integrity.
      
      The namespace has to be scanned in two phases to support separate
      metadata formats. The first establishes the sector size and capacity
      prior to invoking add_disk. If metadata is required, the capacity will
      be temporarilly set to 0 until it can be revalidated and registered with
      the integrity extenstions after add_disk completes.
      
      The driver relies on the integrity extensions to provide the metadata
      buffer. NVMe requires this be a single physically contiguous region,
      so only one integrity segment is allowed per command. If the metadata
      is used for T10 PI, the driver provides mappings to save and restore
      the reftag physical block translation. The driver provides no-op
      functions for generate and verify if metadata is not used for protection
      information. This way the setup is always provided by the block layer.
      
      If a request does not supply a required metadata buffer, the command
      is failed with bad address. This could only happen if a user manually
      disables verify/generate on such a disk. The only exception to where
      this is okay is if the controller is capable of stripping/generating
      the metadata, which is possible on some types of formats.
      
      The metadata scatter gather list now occupies the spot in the nvme_iod
      that used to be used to link retryable IOD's, but we don't do that
      anymore, so the field was unused.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      e1e5e564
    • D
      kdb: Avoid printing KERN_ levels to consoles · f7d4ca8b
      Daniel Thompson 提交于
      Currently when kdb traps printk messages then the raw log level prefix
      (consisting of '\001' followed by a numeral) does not get stripped off
      before the message is issued to the various I/O handlers supported by
      kdb. This causes annoying visual noise as well as causing problems
      grepping for ^. It is also a change of behaviour compared to normal usage
      of printk() usage. For example <SysRq>-h ends up with different output to
      that of kdb's "sr h".
      
      This patch addresses the problem by stripping log levels from messages
      before they are issued to the I/O handlers. printk() which can also
      act as an i/o handler in some cases is special cased; if the caller
      provided a log level then the prefix will be preserved when sent to
      printk().
      
      The addition of non-printable characters to the output of kdb commands is a
      regression, albeit and extremely elderly one, introduced by commit
      04d2c8c8 ("printk: convert the format for KERN_<LEVEL> to a 2 byte
      pattern"). Note also that this patch does *not* restore the original
      behaviour from v3.5. Instead it makes printk() from within a kdb command
      display the message without any prefix (i.e. like printk() normally does).
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Cc: Joe Perches <joe@perches.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      f7d4ca8b
  16. 19 2月, 2015 6 次提交