提交 · a7c9b603cf2371edacb054abc35597e810c1e5fd · openeuler / Kernel

05 3月, 2016 14 次提交

Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · a7c9b603

由 Linus Torvalds 提交于 3月 04, 2016

Pull libnvcimm fix from Dan Williams:
 "One straggling fix for NVDIMM support.

  The KVM/QEMU enabling for NVDIMMs has recently reached the point where
  it is able to accept some ACPI _DSM requests from a guest VM.  However
  they immediately found that the 4.5-rc kernel is unusable because the
  kernel's 'nfit' driver fails to load upon seeing a valid "not
  supported" response from the virtual BIOS for an address range scrub
  command.

  It is not mandatory that a platform implement address range scrubbing,
  so this fix from Vishal properly treats the 'not supported' response
  as 'skip scrubbing and continue loading the driver'"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
  nfit: Continue init even if ARS commands are unimplemented

a7c9b603

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c12f83c3

由 Linus Torvalds 提交于 3月 04, 2016

Pull SCSI fixes from James Bottomley:
 "Two fairly simple fixes.

  One is a regression with ipr firmware loading caused by one of the
  trivial patches in the last merge window which failed to strip the \n
  from the file name string, so now the firmware loader no longer works
  leading to a lot of unhappy ipr users; fix by stripping the \n.

  The second is a memory leak within SCSI: the BLK_PREP_INVALID state
  was introduced a recent fix but we forgot to account for it correctly
  when freeing state, resulting in memory leakage.  Add the correct
  state freeing in scsi_prep_return()"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  ipr: Fix regression when loading firmware
  SCSI: Free resources when we return BLKPREP_INVALID

c12f83c3

Merge branch 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata · fab3e94a

由 Linus Torvalds 提交于 3月 04, 2016

Pull libata fixes from Tejun Heo:
 "Assorted fixes for libata drivers.

   - Turns out HDIO_GET_32BIT ioctl was subtly broken all along.

   - Recent update to ahci external port handling was incorrectly
     marking hotpluggable ports as external making userland handle
     devices connected to those ports incorrectly.

   - ahci_xgene needs its own irq handler to work around a hardware
     erratum.  libahci updated to allow irq handler override.

   - Misc driver specific updates"

* 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
  ata: ahci: don't mark HotPlugCapable Ports as external/removable
  ahci: Workaround for ThunderX Errata#22536
  libata: Align ata_device's id on a cacheline
  Adding Intel Lewisburg device IDs for SATA
  pata-rb532-cf: get rid of the irq_to_gpio() call
  libata: fix HDIO_GET_32BIT ioctl
  ahci_xgene: Implement the workaround to fix the missing of the edge interrupt for the HOST_IRQ_STAT.
  ata: Remove the AHCI_HFLAG_EDGE_IRQ support from libahci.
  libahci: Implement the capability to override the generic ahci interrupt handler.

fab3e94a

Merge branch 'for-linus2' of git://git.kernel.dk/linux-block · e5322c54

由 Linus Torvalds 提交于 3月 04, 2016

Pull block fixes from Jens Axboe:
 "Round 2 of this.  I cut back to the bare necessities, the patch is
  still larger than it usually would be at this time, due to the number
  of NVMe fixes in there.  This pull request contains:

   - The 4 core fixes from Ming, that fix both problems with exceeding
     the virtual boundary limit in case of merging, and the gap checking
     for cloned bio's.

   - NVMe fixes from Keith and Christoph:

        - Regression on larger user commands, causing problems with
          reading log pages (for instance). This touches both NVMe,
          and the block core since that is now generally utilized also
          for these types of commands.

        - Hot removal fixes.

        - User exploitable issue with passthrough IO commands, if !length
          is given, causing us to fault on writing to the zero
          page.

        - Fix for a hang under error conditions

   - And finally, the current series regression for umount with cgroup
     writeback, where the final flush would happen async and hence open
     up window after umount where the device wasn't consistent.  fsck
     right after umount would show this.  From Tejun"

* 'for-linus2' of git://git.kernel.dk/linux-block:
  block: support large requests in blk_rq_map_user_iov
  block: fix blk_rq_get_max_sectors for driver private requests
  nvme: fix max_segments integer truncation
  nvme: set queue limits for the admin queue
  writeback: flush inode cgroup wb switches instead of pinning super_block
  NVMe: Fix 0-length integrity payload
  NVMe: Don't allow unsupported flags
  NVMe: Move error handling to failed reset handler
  NVMe: Simplify device reset failure
  NVMe: Fix namespace removal deadlock
  NVMe: Use IDA for namespace disk naming
  NVMe: Don't unmap controller registers on reset
  block: merge: get the 1st and last bvec via helpers
  block: get the 1st and last bvec via helpers
  block: check virt boundary in bio_will_gap()
  block: bio: introduce helpers to get the 1st and last bvec

e5322c54

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · bdf9d297

由 Linus Torvalds 提交于 3月 04, 2016

Pull rdma fixes from Doug Ledford:
 "Additional 4.5-rc6 fixes.

  I have four patches today.  I had previously thought I had submitted
  two of them last week, but they were accidentally skipped :-(.

   - One fix to an error path in the core
   - One fix for RoCE in the core
   - Two related fixes for the core/mlx5"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/core: Use GRH when the path hop-limit > 0
  IB/{core, mlx5}: Fix input len in vendor part of create_qp/srq
  IB/mlx5: Avoid using user-index for SRQs
  IB/core: Fix missed clean call in registration path

bdf9d297

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 638c201e

由 Linus Torvalds 提交于 3月 04, 2016

Pull drm fixes from Dave Airlie:
 "This contains one i915 patch twice, as I merged it locally for
  testing, and then pulled some stuff in on top, and then Jani sent to
  me, I didn't think it was worth redoing all the merges of what I had
  tested.

  Summary:

   - amdgpu/radeon fixes for some more power management and VM races.

   - Two i915 fixes, one for the a recent regression, one another power
     management fix for skylake.

   - Two tegra dma mask fixes for a regression.

   - One ast fix for a typo I made transcribing the userspace driver,
     that I'd like to get into stable so I don't forget about it"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  gpu: host1x: Set DMA ops on device creation
  gpu: host1x: Set DMA mask
  drm/amdgpu: return from atombios_dp_get_dpcd only when error
  drm/amdgpu/cz: remove commented out call to enable vce pg
  drm/amdgpu/powerplay/cz: enable/disable vce dpm independent of vce pg
  drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled
  drm/amdgpu/gfx8: specify which engine to wait before vm flush
  drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
  drm/amd/powerplay: send event to notify powerplay all modules are initialized.
  drm/amd/powerplay: export AMD_PP_EVENT_COMPLETE_INIT task to amdgpu.
  drm/radeon/pm: update current crtc info after setting the powerstate
  drm/amdgpu/pm: update current crtc info after setting the powerstate
  drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
  drm/i915/skl: Fix power domain suspend sequence
  drm/ast: Fix incorrect register check for DRAM width
  drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)

638c201e

Merge tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · b80e8e28

由 Linus Torvalds 提交于 3月 04, 2016

Pull power management and ACPI fixes from Rafael Wysocki:
 "Two build fixes for cpufreq drivers (including one for breakage
  introduced recently) and a fix for a graph tracer crash when used over
  suspend-to-RAM on x86.

  Specifics:

   - Prevent the graph tracer from crashing when used over suspend-to-
     RAM on x86 by pausing it before invoking do_suspend_lowlevel() and
     un-pausing it when that function has returned (Todd Brandt).

   - Fix build issues in the qoriq and mediatek cpufreq drivers related
     to broken dependencies on THERMAL (Arnd Bergmann)"

* tag 'pm+acpi-4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / sleep / x86: Fix crash on graph trace through x86 suspend
  cpufreq: mediatek: allow building as a module
  cpufreq: qoriq: allow building as module with THERMAL=m

b80e8e28

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · ed385c7a

由 Linus Torvalds 提交于 3月 04, 2016

Pull arm64 fix from Will Deacon:
 "Arm64 fix for -rc7.  Without it, our struct page array can overflow
  the vmemmap region on systems with a large PHYS_OFFSET.

  Nothing else on the radar at the moment, so hopefully that's it for
  4.5 from us.

  Summary: Ensure struct page array fits within vmemmap area"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: vmemmap: use virtual projection of linear region

ed385c7a

Merge tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd · c51797d2

由 Linus Torvalds 提交于 3月 04, 2016

Pull jffs2 fixes from David Woodhouse:
 "This contains two important JFFS2 fixes marked for stable:

   - a lock ordering problem between the page lock and the internal
     f->sem mutex, which was causing occasional deadlocks in garbage
     collection

   - a scan failure causing moved directories to sometimes end up
     appearing to have hard links.

  There are also a couple of trivial MAINTAINERS file updates"

* tag 'for-linus-20160304' of git://git.infradead.org/linux-mtd:
  MAINTAINERS: add maintainer entry for FREESCALE GPMI NAND driver
  Fix directory hardlinks from deleted directories
  jffs2: Fix page lock / f->sem deadlock
  Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
  MAINTAINERS: update Han's email

c51797d2

Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 2cdcb2b5

由 Linus Torvalds 提交于 3月 04, 2016

Pull btrfs fix from Chris Mason:
 "Filipe nailed down a problem where tree log replay would do some work
  that orphan code wasn't expecting to be done yet, leading to BUG_ON"

* 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix loading of orphan roots leading to BUG_ON

2cdcb2b5

Merge tag 'trace-fixes-v4.5-rc6' of... · 78baab7a

由 Linus Torvalds 提交于 3月 04, 2016

Merge tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull tracing fix from Steven Rostedt:
 "A feature was added in 4.3 that allowed users to filter trace points
  on a tasks "comm" field.  But this prevented filtering on a comm field
  that is within a trace event (like sched_migrate_task).

  When trying to filter on when a program migrated, this change
  prevented the filtering of the sched_migrate_task.

  To fix this, the event fields are examined first, and then the extra
  fields like "comm" and "cpu" are examined.  Also, instead of testing
  to assign the comm filter function based on the field's name, the
  generic comm field is given a new filter type (FILTER_COMM).  When
  this field is used to filter the type is checked.  The same is done
  for the cpu filter field.

  Two new special filter types are added: "COMM" and "CPU".  This allows
  users to still filter the tasks comm for events that have "comm" as
  one of their fields, in cases that users would like to filter
  sched_migrate_task on the comm of the task that called the event, and
  not the comm of the task that is being migrated"

* tag 'trace-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Do not have 'comm' filter override event 'comm' field

78baab7a

nfit: Continue init even if ARS commands are unimplemented · 6e2452df

由 Vishal Verma 提交于 3月 03, 2016

If firmware doesn't implement any of the ARS commands, take that to
mean that ARS is unsupported, and continue to initialize regions without
bad block lists. We cannot make the assumption that ARS commands will be
unconditionally supported on all NVDIMMs.
Reported-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NVishal Verma <vishal.l.verma@intel.com>
Acked-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
Tested-by: NHaozhong Zhang <haozhong.zhang@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

6e2452df

Merge tag 'drm/tegra/for-4.5-rc7' of git://anongit.freedesktop.org/tegra/linux into drm-fixes · 26bae5e0

由 Dave Airlie 提交于 3月 05, 2016

drm/tegra: Fixes for v4.5-rc7

Two small fixes that restore PRIME support.

* tag 'drm/tegra/for-4.5-rc7' of git://anongit.freedesktop.org/tegra/linux:
  gpu: host1x: Set DMA ops on device creation
  gpu: host1x: Set DMA mask

26bae5e0

Merge branches 'pm-cpufreq-fixes' and 'pm-sleep-fixes' · bfc6b97d

由 Rafael J. Wysocki 提交于 3月 04, 2016

* pm-cpufreq-fixes:
  cpufreq: mediatek: allow building as a module
  cpufreq: qoriq: allow building as module with THERMAL=m

* pm-sleep-fixes:
  PM / sleep / x86: Fix crash on graph trace through x86 suspend

bfc6b97d

04 3月, 2016 26 次提交

gpu: host1x: Set DMA ops on device creation · c95469aa

由 Alexandre Courbot 提交于 2月 26, 2016

Currently host1x-instanciated devices have their dma_ops left to NULL,
which makes any DMA operation (like buffer import) on ARM64 fallback
to the dummy_dma_ops and fail with an error.

This patch calls of_dma_configure() with the host1x node when creating
such a device, so the proper DMA operations are set.
Suggested-by: NThierry Reding <thierry.reding@gmail.com>
Signed-off-by: NAlexandre Courbot <acourbot@nvidia.com>
Signed-off-by: NThierry Reding <treding@nvidia.com>

c95469aa

gpu: host1x: Set DMA mask · 097452e6

由 Alexandre Courbot 提交于 2月 26, 2016

The default DMA mask covers a 32 bits address range, but host1x devices
can address a larger range on TK1 and TX1. Set the DMA mask to the range
addressable when we use the IOMMU to prevent the use of bounce buffers.
Signed-off-by: NAlexandre Courbot <acourbot@nvidia.com>
Signed-off-by: NThierry Reding <treding@nvidia.com>

097452e6

tracing: Do not have 'comm' filter override event 'comm' field · e57cbaf0

由 Steven Rostedt (Red Hat) 提交于 3月 03, 2016

Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
process names" added a 'comm' filter that will filter events based on the
current tasks struct 'comm'. But this now hides the ability to filter events
that have a 'comm' field too. For example, sched_migrate_task trace event.
That has a 'comm' field of the task to be migrated.

echo 'comm == "bash"' > events/sched_migrate_task/filter

will now filter all sched_migrate_task events for tasks named "bash" that
migrates other tasks (in interrupt context), instead of seeing when "bash"
itself gets migrated.

This fix requires a couple of changes.

1) Change the look up order for filter predicates to look at the events
fields before looking at the generic filters.

2) Instead of basing the filter function off of the "comm" name, have the
generic "comm" filter have its own filter_type (FILTER_COMM). Test
against the type instead of the name to assign the filter function.

3) Add a new "COMM" filter that works just like "comm" but will filter based
on the current task, even if the trace event contains a "comm" field.

Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".

Cc: stable@vger.kernel.org # v4.3+
Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
Reported-by: NMatt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e57cbaf0

Merge tag 'drm-intel-fixes-2016-03-03' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 0dff9738

由 Dave Airlie 提交于 3月 04, 2016

Small conflict as I had the balance in my tree already for testing.

* tag 'drm-intel-fixes-2016-03-03' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Balance assert_rpm_wakelock_held() for !IS_ENABLED(CONFIG_PM)
  drm/i915/skl: Fix power domain suspend sequence

0dff9738

Btrfs: fix loading of orphan roots leading to BUG_ON · 909c3a22

由 Filipe Manana 提交于 3月 02, 2016

When looking for orphan roots during mount we can end up hitting a
BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
replayed and qgroups are enabled. This is because after a log tree is
replayed, a transaction commit is made, which triggers qgroup extent
accounting which in turn does backref walking which ends up reading and
inserting all roots in the radix tree fs_info->fs_root_radix, including
orphan roots (deleted snapshots). So after the log tree is replayed, when
finding orphan roots we hit the BUG_ON with the following trace:

[118209.182438] ------------[ cut here ]------------
[118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
[118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
[118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
[118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
[118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
[118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
[118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
[118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
[118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
[118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
[118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
[118209.186318] Stack:
[118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
[118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
[118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
[118209.186318] Call Trace:
[118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
[118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
[118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
[118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
[118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
[118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318]  RSP <ffff8800af34faa8>
[118209.230735] ---[ end trace 83938f987d85d477 ]---

So fix this by not treating the error -EEXIST, returned when attempting
to insert a root already inserted by the backref walking code, as an error.

The following test case for xfstests reproduces the bug:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  _run_btrfs_util_prog quota enable $SCRATCH_MNT

  # Create 2 directories with one file in one of them.
  # We use these just to trigger a transaction commit later, moving the file from
  # directory a to directory b and doing an fsync against directory a.
  mkdir $SCRATCH_MNT/a
  mkdir $SCRATCH_MNT/b
  touch $SCRATCH_MNT/a/f
  sync

  # Create our test file with 2 4K extents.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Create a snapshot and delete it. This doesn't really delete the snapshot
  # immediately, just makes it inaccessible and invisible to user space, the
  # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
  # which is woke up at the next transaction commit.
  # A root orphan item is inserted into the tree of tree roots, so that if a
  # power failure happens before the dedicated kernel thread does the snapshot
  # deletion, the next time the filesystem is mounted it resumes the snapshot
  # deletion.
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap

  # Now overwrite half of the extents we wrote before. Because we made a snapshpot
  # before, which isn't really deleted yet (since no transaction commit happened
  # after we did the snapshot delete request), the non overwritten extents get
  # referenced twice, once by the default subvolume and once by the snapshot.
  $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Now move file f from directory a to directory b and fsync directory a.
  # The fsync on the directory a triggers a transaction commit (because a file
  # was moved from it to another directory) and the file fsync leaves a log tree
  # with file extent items to replay.
  mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  # Now simulate a power failure and mount the filesystem to replay the log tree.
  # After the log tree was replayed, we used to hit a BUG_ON() when processing
  # the root orphan item for the deleted snapshot. This is because when processing
  # an orphan root the code expected to be the first code inserting the root into
  # the fs_info->fs_root_radix radix tree, while in reallity it was the second
  # caller attempting to do it - the first caller was the transaction commit that
  # took place after replaying the log tree, when updating the qgroup counters.
  _flakey_drop_and_remount

  echo "File digest before after failure:"
  # Must match what he got before the power failure.
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  _unmount_flakey
  status=0
  exit

Fixes: 2d9e9776 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
Cc: stable@vger.kernel.org  # 4.4+
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: NChris Mason <clm@fb.com>

909c3a22

block: support large requests in blk_rq_map_user_iov · 4d6af73d

由 Christoph Hellwig 提交于 3月 02, 2016

This patch adds support for larger requests in blk_rq_map_user_iov by
allowing it to build multiple bios for a request.  This functionality
used to exist for the non-vectored blk_rq_map_user in the past, and
this patch reuses the existing functionality for it on the unmap side,
which stuck around.  Thanks to the iov_iter API supporting multiple
bios is fairly trivial, as we can just iterate the iov until we've
consumed the whole iov_iter.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NJeff Lien <Jeff.Lien@hgst.com>
Tested-by: NJeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

4d6af73d

block: fix blk_rq_get_max_sectors for driver private requests · f2101842

由 Christoph Hellwig 提交于 3月 03, 2016

Driver private request types should not get the artifical cap for the
FS requests.  This is important to use the full device capabilities
for internal command or NVMe pass through commands.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NJeff Lien <Jeff.Lien@hgst.com>
Tested-by: NJeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>

Updated by me to use an explicit check for the one command type that
does support extended checking, instead of relying on the ordering
of the enum command values - as suggested by Keith.
Signed-off-by: NJens Axboe <axboe@fb.com>

f2101842

nvme: fix max_segments integer truncation · 45686b61

由 Christoph Hellwig 提交于 3月 02, 2016

The block layer uses an unsigned short for max_segments.  The way we
calculate the value for NVMe tends to generate very large 32-bit values,
which after integer truncation may lead to a zero value instead of
the desired outcome.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NJeff Lien <Jeff.Lien@hgst.com>
Tested-by: NJeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

45686b61

nvme: set queue limits for the admin queue · da35825d

由 Christoph Hellwig 提交于 3月 02, 2016

Factor out a helper to set all the device specific queue limits and apply
them to the admin queue in addition to the I/O queues.  Without this the
command size on the admin queue is arbitrarily low, and the missing
other limitations are just minefields waiting for victims.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reported-by: NJeff Lien <Jeff.Lien@hgst.com>
Tested-by: NJeff Lien <Jeff.Lien@hgst.com>
Reviewed-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

da35825d

writeback: flush inode cgroup wb switches instead of pinning super_block · a1a0e23e

由 Tejun Heo 提交于 2月 29, 2016

If cgroup writeback is in use, inodes can be scheduled for
asynchronous wb switching.  Before 5ff8eaac ("writeback: keep
superblock pinned during cgroup writeback association switches"), this
could race with umount leading to super_block being destroyed while
inodes are pinned for wb switching.  5ff8eaac fixed it by bumping
s_active while wb switches are in flight; however, this allowed
in-flight wb switches to make umounts asynchronous when the userland
expected synchronosity - e.g. fsck immediately following umount may
fail because the device is still busy.

This patch removes the problematic super_block pinning and instead
makes generic_shutdown_super() flush in-flight wb switches.  wb
switches are now executed on a dedicated isw_wq so that they can be
flushed and isw_nr_in_flight keeps track of the number of in-flight wb
switches so that flushing can be avoided in most cases.

v2: Move cgroup_writeback_umount() further below and add MS_ACTIVE
    check in inode_switch_wbs() as Jan an Al suggested.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NTahsin Erdogan <tahsin@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
Fixes: 5ff8eaac ("writeback: keep superblock pinned during cgroup writeback association switches")
Cc: stable@vger.kernel.org #v4.5
Reviewed-by: NJan Kara <jack@suse.cz>
Tested-by: NTahsin Erdogan <tahsin@google.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

a1a0e23e

NVMe: Fix 0-length integrity payload · e9fc63d6

由 Keith Busch 提交于 2月 24, 2016

A user could send a passthrough IO command with a metadata pointer to a
namespace without metadata. With metadata length of 0, kmalloc returns
ZERO_SIZE_PTR. Since that is not NULL, the driver would have set this as
the bio's integrity payload, which causes an access fault on completion.

This patch ignores the users metadata buffer if the namespace format
does not support separate metadata.
Reported-by: NStephen Bates <stephen.bates@microsemi.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

e9fc63d6

NVMe: Don't allow unsupported flags · 63088ec7

由 Keith Busch 提交于 2月 24, 2016

The command flags can change the meaning of other fields in the command
that the driver is not prepared to handle. Specifically, the user could
passthrough an SGL flag, causing the controller to misinterpret the PRP
list the driver created, potentially corrupting memory or data.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJon Derrick <jonathan.derrick@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

63088ec7

NVMe: Move error handling to failed reset handler · 69d9a99c

由 Keith Busch 提交于 2月 24, 2016

This moves failed queue handling out of the namespace removal path and
into the reset failure path, fixing a hanging condition if the controller
fails or link down during del_gendisk. Previously the driver had to see
the controller as degraded prior to calling del_gendisk to setup the
queues to fail. But, if the controller happened to fail after this,
there was no task to end outstanding requests.

On failure, all namespace states are set to dead. This has capacity
revalidate to 0, and ends all new requests with error status.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

69d9a99c

NVMe: Simplify device reset failure · f58944e2

由 Keith Busch 提交于 2月 24, 2016

A reset failure schedules the device to unbind from the driver through
the pci driver's remove. This cleans up all intialization, so there is
no need to duplicate the potentially racy cleanup.

To help understand why a reset failed, the status is logged with the
existing warning message.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

f58944e2

NVMe: Fix namespace removal deadlock · 646017a6

由 Keith Busch 提交于 2月 24, 2016

This patch makes nvme namespace removal lockless. It is up to the caller
to ensure no active namespace scanning is occuring. To ensure no scan
work occurs, the nvme pci driver adds a removing state to the controller
device to avoid queueing scan work during removal. The work is flushed
after setting the state, so no new scan work can be queued.

The lockless removal allows the driver to cleanup a namespace
request_queue if the controller fails during removal. Previously this
could deadlock trying to acquire the namespace mutex in order to handle
such events.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

646017a6

NVMe: Use IDA for namespace disk naming · 075790eb

由 Keith Busch 提交于 2月 24, 2016

A namespace may be detached from a controller, but a user may be holding
a reference to it. Attaching a new namespace with the same NSID will create
duplicate names when using the NSID to name the disk.

This patch uses an IDA that is released only when the last reference is
released instead of using the namespace ID.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

075790eb

NVMe: Don't unmap controller registers on reset · b00a726a

由 Keith Busch 提交于 2月 24, 2016

Unmapping the registers on reset or shutdown is not necessary. Keeping
the mapping simplifies reset handling.
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

b00a726a

block: merge: get the 1st and last bvec via helpers · e827091c

由 Ming Lei 提交于 2月 26, 2016

This patch applies the two introduced helpers to
figure out the 1st and last bvec.
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e827091c

block: get the 1st and last bvec via helpers · 25e71a99

由 Ming Lei 提交于 2月 26, 2016

This patch applies the two introduced helpers to
figure out the 1st and last bvec, and fixes the
original way after bio splitting.

Cc: stable@vger.kernel.org
Reported-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

25e71a99

block: check virt boundary in bio_will_gap() · e0af2917

由 Ming Lei 提交于 2月 26, 2016

In the following patch, the way for figuring out
the last bvec will be changed with a bit cost introduced,
so return immediately if the queue doesn't have virt
boundary limit. Actually most of devices have not
this limit.
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

e0af2917

block: bio: introduce helpers to get the 1st and last bvec · 7bcd79ac

由 Ming Lei 提交于 2月 26, 2016

The bio passed to bio_will_gap() may be fast cloned from upper
layer(dm, md, bcache, fs, ...), or from bio splitting in block
core.

Unfortunately bio_will_gap() just figures out the last bvec via
'bi_io_vec[prev->bi_vcnt - 1]' directly, and this way is obviously
wrong.

This patch introduces two helpers for getting the first and last
bvec of one bio for fixing the issue.

Cc: stable@vger.kernel.org
Reported-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@canonical.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

7bcd79ac

Merge tag 'pci-v4.5-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · e3c2ef41

由 Linus Torvalds 提交于 3月 03, 2016

Pull PCI fixes from Bjorn Helgaas:
 "Freescale Layerscape host bridge driver:
    Fix MSG TLP drop setting (Minghuan Lian)

  TI Keystone host bridge driver:
    Fix MSI code that retrieves struct pcie_port pointer (Murali Karicheri)"

* tag 'pci-v4.5-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: layerscape: Fix MSG TLP drop setting
  PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer

e3c2ef41

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · c2687cf9

由 Linus Torvalds 提交于 3月 03, 2016

Pull KVM fixes from Paolo Bonzini:
 - ARM/MIPS: Fixes for ioctls when copy_from_user returns nonzero
 - x86: Small fix for Skylake TSC scaling
 - x86: Improved fix for last week's missed hardware breakpoint bug

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm: x86: Update tsc multiplier on change.
  mips/kvm: fix ioctl error handling
  arm/arm64: KVM: Fix ioctl error handling
  KVM: x86: fix root cause for missed hardware breakpoints

c2687cf9

Merge tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · 4237b2e6

由 Linus Torvalds 提交于 3月 03, 2016

Pull late GPIO fix from Linus Walleij:
 "Regressions never arrive when you want them to, so here is a late fix
  for the Renesas RCAR GPIO driver.  It only affects that driver on the
  very specific Renesas platforms:

   - Fix a runtime PM suspend/resume bug in the RCAR driver"

* tag 'gpio-v4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: rcar: Add Runtime PM handling for interrupts

4237b2e6

Merge tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · 19eab220

由 Linus Torvalds 提交于 3月 03, 2016

Pull IOMMU fixes from Joerg Roedel:
 "One fix for Intel VT-d:

   - Use BUS_NOTIFY_REMOVED_DEVICE notifier to unbind a device from its
     domain _after_ it has been unbound from its driver.  This fixes a
     BUG_ON being triggered in the PCI hotplug path.

  And three for AMD IOMMU:

   - Add a workaround for a hardware issue with ATS in use

   - Fix ATS enable/disable balance when a device is removed

   - Fix a boot warning being triggered when the system has IOMMU
     performance counters and PCI device 00:00.0 is not covered by the
     IOMMU"

* tag 'iommu-fixes-v4.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
  iommu/amd: Detach device from domain before removal
  iommu/amd: Apply workaround for ATS write permission check
  iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered

19eab220

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · f4bd9822

由 Linus Torvalds 提交于 3月 03, 2016

Pull minor virtio/vhost fixes from Michael Tsirkin:
 "This fixes two minor bugs: error handling in vhost, and capability
  processing in virtio"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost: fix error path in vhost_init_used()
  virtio-pci: read the right virtio_pci_notify_cap field

f4bd9822

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功