1. 22 Mar, 2018 2 commits
    • libnvdimm, nfit: fix persistence domain reporting · fe9a552e
      Dan Williams committed
      The persistence domain is a point in the platform where once writes
      reach that destination the platform claims it will make them persistent
      relative to power loss. In the ACPI NFIT this is currently communicated
      as 2 bits in the "NFIT - Platform Capabilities Structure". The bits
      comprise a hierarchy, i.e. bit0 "CPU Cache Flush to NVDIMM Durability on
      Power Loss Capable" implies bit1 "Memory Controller Flush to NVDIMM
      Durability on Power Loss Capable".
      
      Commit 96c3a239 "libnvdimm: expose platform persistence attr..."
      shows the persistence domain as flags, but it's really an enumerated
      hierarchy.
      
      Fix this newly introduced user ABI to show the closest available
      persistence domain before userspace develops dependencies on seeing, or
      needing to develop code to tolerate, the raw NFIT flags communicated
      through the libnvdimm-generic region attribute.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Reviewed-by: Dave Jiang <dave.jiang@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • libnvdimm, region: hide persistence_domain when unknown · 896196dc
      Dan Williams committed
      Similar to other region attributes, do not emit the persistence_domain
      attribute if its contents are empty.
      
      Fixes: 96c3a239 ("libnvdimm: expose platform persistence attr...")
      Cc: Dave Jiang <dave.jiang@intel.com>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
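The enumerated hierarchy described in the two commits above can be sketched as a small lookup. This is a Python model rather than the kernel's C code. The bit positions follow the commit message (bit0 for CPU cache flush, bit1 for memory controller flush); the function name and the returned strings `cpu_cache` and `memory_controller` are assumptions made for illustration.

```python
# Illustrative model only; names and strings are assumptions, not kernel ABI.
CAP_CACHE_FLUSH = 1 << 0  # bit0: CPU cache flush to NVDIMM durability on power loss
CAP_MEM_FLUSH = 1 << 1    # bit1: memory controller flush to NVDIMM durability on power loss


def persistence_domain(capabilities):
    """Return the closest persistence domain, or None when it is unknown."""
    # bit0 implies bit1, so check the closest domain (CPU cache) first.
    if capabilities & CAP_CACHE_FLUSH:
        return "cpu_cache"
    if capabilities & CAP_MEM_FLUSH:
        return "memory_controller"
    # Unknown domain: a caller would skip emitting the attribute.
    return None
```

A caller that receives `None` would not emit the attribute at all, which mirrors the intent of the second commit.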
  2. 16 Mar, 2018 1 commit
  3. 14 Mar, 2018 1 commit
  4. 08 Mar, 2018 1 commit
  5. 03 Mar, 2018 2 commits
    • libnvdimm: re-enable deep flush for pmem devices via fsync() · 5fdf8e5b
      Dave Jiang committed
      Re-enable deep flush so that users always have a way to be sure that a
      write makes it all the way out to media. Writes from the PMEM driver
      always arrive at the NVDIMM since movnt is used to bypass the cache, and
      the driver relies on the ADR (Asynchronous DRAM Refresh) mechanism to
      flush write buffers on power failure. The Deep Flush mechanism is there
      to explicitly flush those buffers and protect against (rare) ADR failure. This
      change prevents a regression in deep flush behavior so that applications
      can continue to depend on fsync() as a mechanism to trigger deep flush
      in the filesystem-DAX case.
      
      Fixes: 06e8ccda ("acpi: nfit: Add support for detect platform CPU cache...")
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
    • vfio: disable filesystem-dax page pinning · 94db151d
      Dan Williams committed
      Filesystem-DAX is incompatible with 'longterm' page pinning. Without
      page cache indirection a DAX mapping maps filesystem blocks directly.
      This means that the filesystem must not modify a file's block map while
      any page in a mapping is pinned. To prevent userspace from holding off
      filesystem operations indefinitely, disallow 'longterm' Filesystem-DAX
      mappings.
      
      RDMA has the same conflict and the plan there is to add a 'with lease'
      mechanism to allow the kernel to notify userspace that the mapping is
      being torn down for block-map maintenance. Perhaps something similar can
      be put in place for vfio.
      
      Note that xfs and ext4 still report:
      
         "DAX enabled. Warning: EXPERIMENTAL, use at your own risk"
      
      ...at mount time, and resolving the dax-dma-vs-truncate problem is one
      of the last hurdles to remove that designation.
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: kvm@vger.kernel.org
      Cc: <stable@vger.kernel.org>
      Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Tested-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Fixes: d475c634 ("dax,ext2: replace XIP read and write with DAX I/O")
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  6. 02 Mar, 2018 2 commits
  7. 01 Mar, 2018 21 commits
  8. 28 Feb, 2018 7 commits
    • clocksource/drivers/arc_timer: Update some comments · a4f53857
      Vineet Gupta committed
      TIMER0 interrupt ACK is different for ARC700 and HS3x cores.
      
      This came to light in some internal discussions and it is nice to have this
      documented rather than digging up the PRM (Programmers Reference Manual).
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Cc: linux-snps-arc@lists.infradead.org
      Link: https://lkml.kernel.org/r/1519241491-12570-1-git-send-email-vgupta@synopsys.com
    • clocksource/drivers/mips-gic-timer: Use correct shift count to extract data · 5753405e
      Felix Fietkau committed
      __gic_clocksource_init() extracts the GIC_CONFIG_COUNTBITS field from
      read_gic_config() by right shifting the register value. The shift count is
      determined by the most significant bit (__fls) of the bitmask which is
      wrong as it shifts out the complete bitfield.
      
      Use the least significant bit (__ffs) instead to shift the bitfield down to
      bit 0.
      
      Fixes: e07127a0 ("clocksource: mips-gic-timer: Use new GIC accessor functions")
      Signed-off-by: Felix Fietkau <nbd@nbd.name>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: daniel.lezcano@linaro.org
      Cc: paul.burton@imgtec.com
      Link: https://lkml.kernel.org/r/20180228095610.50341-1-nbd@nbd.name
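The difference between shifting by the most significant and the least significant bit of a mask can be shown with a short Python model. The helpers `ffs` and `fls` below only imitate the kernel's `__ffs` and `__fls`, and the mask value is a hypothetical 4-bit field chosen for illustration, not the real register layout.

```python
def ffs(mask):
    """Index of the least significant set bit (mask must be non-zero)."""
    return (mask & -mask).bit_length() - 1


def fls(mask):
    """Index of the most significant set bit (mask must be non-zero)."""
    return mask.bit_length() - 1


def extract_field(reg, mask):
    """Correct extraction: shift the masked field down to bit 0."""
    return (reg & mask) >> ffs(mask)


def extract_field_buggy(reg, mask):
    """Buggy extraction: shifting by the top bit discards most of the field."""
    return (reg & mask) >> fls(mask)


# Hypothetical field occupying bits 27:24 of a 32-bit register.
FIELD_MASK = 0x0F000000
```

With this mask, `extract_field` recovers the full 4-bit value, while `extract_field_buggy` keeps only the top bit of the field.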
    • nvme-multipath: fix sysfs dangerously created links · 9bd82b1a
      Baegjae Sung committed
      If multipathing is enabled, each NVMe subsystem creates a head
      namespace (e.g., nvme0n1) and multiple private namespaces
      (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. When creating links for
      private namespaces, the links of the head namespace are used, so the
      namespace creation order must be followed (e.g., nvme0n1 ->
      nvme0c1n1). If the order is not followed, the sysfs links will be
      incomplete or a kernel panic will occur.
      
      The kernel panic was:
        kernel BUG at fs/sysfs/symlink.c:27!
        Call Trace:
          nvme_mpath_add_disk_links+0x5d/0x80 [nvme_core]
          nvme_validate_ns+0x5c2/0x850 [nvme_core]
          nvme_scan_work+0x1af/0x2d0 [nvme_core]
      
      Correct order
      Context A     Context B
      nvme0n1
      nvme0c0n1     nvme0c1n1
      
      Incorrect order
      Context A     Context B
                    nvme0c1n1
      nvme0n1
      nvme0c0n1
      
      The nvme_mpath_add_disk (for creating head namespace) is called
      just before the nvme_mpath_add_disk_links (for creating private
      namespaces). In nvme_mpath_add_disk, the first context acquires
      the lock of subsystem and creates a head namespace, and other
      contexts do nothing by checking GENHD_FL_UP of a head namespace
      after waiting to acquire the lock. We verified the code with or
      without multipathing using three vendors of dual-port NVMe SSDs.
      Signed-off-by: Baegjae Sung <baegjae@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
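The ordering guarantee described above (whichever context arrives first creates the head namespace under the subsystem lock, and every context creates its private links only afterwards) can be modeled with two threads. This is a Python sketch under stated assumptions: the class `Subsystem`, its methods, and the `head_up` flag are hypothetical stand-ins for the subsystem lock, `nvme_mpath_add_disk`, `nvme_mpath_add_disk_links`, and the `GENHD_FL_UP` check.

```python
import threading


class Subsystem:
    """Hypothetical stand-in for an NVMe subsystem with one shared head namespace."""

    def __init__(self, head_name):
        self.lock = threading.Lock()
        self.head_name = head_name
        self.head_up = False  # plays the role of the GENHD_FL_UP check
        self.created = []     # creation order of the sysfs-like entries

    def add_disk(self):
        # The first context to take the lock creates the head; later ones do nothing.
        with self.lock:
            if not self.head_up:
                self.created.append(self.head_name)
                self.head_up = True

    def add_disk_links(self, private_name):
        # Links for a private namespace rely on the head already existing.
        with self.lock:
            if not self.head_up:
                raise RuntimeError("head namespace missing")
            self.created.append(private_name)


def scan(subsys, private_name):
    subsys.add_disk()  # always called just before the links are created
    subsys.add_disk_links(private_name)


def run_contexts():
    subsys = Subsystem("nvme0n1")
    threads = [threading.Thread(target=scan, args=(subsys, name))
               for name in ("nvme0c0n1", "nvme0c1n1")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return subsys.created
```

Regardless of which thread runs first, the head entry is always the first element of the returned list; only the relative order of the two private namespaces can vary.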
    • nbd: fix return value in error handling path · 0979962f
      Gustavo A. R. Silva committed
      It seems that the proper value to return in this particular case is the
      one contained in the variable new_index instead of ret.
      
      Addresses-Coverity-ID: 1465148 ("Copy-paste error")
      Fixes: e46c7287 ("nbd: add a basic netlink interface")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: fix kcrashes with fio in RAID5 backend dev · 60eb34ec
      Tang Junhui committed
      The kernel crashed when running fio on a bcache device with a RAID5
      backend. The call trace is below:
      [  440.012034] kernel BUG at block/blk-ioc.c:146!
      [  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
      [  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
      [  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
      [  440.028615] RIP: 0010:put_io_context+0x8b/0x90
      [  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
      [  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
      [  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
      [  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
      [  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
      [  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
      [  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
      [  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
      [  440.110498] Call Trace:
      [  440.135443]  bio_disassociate_task+0x1b/0x60
      [  440.160355]  bio_free+0x1b/0x60
      [  440.184666]  bio_put+0x23/0x30
      [  440.208272]  search_free+0x23/0x40 [bcache]
      [  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
      [  440.254468]  closure_put+0xb6/0xd0 [bcache]
      [  440.277087]  request_endio+0x30/0x40 [bcache]
      [  440.298703]  bio_endio+0xa1/0x120
      [  440.319644]  handle_stripe+0x418/0x2270 [raid456]
      [  440.340614]  ? load_balance+0x17b/0x9c0
      [  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
      [  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
      [  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
      [  440.419193]  ? schedule+0x36/0x80
      [  440.437932]  ? schedule_timeout+0x1d2/0x2f0
      [  440.456136]  md_thread+0x122/0x150
      [  440.473687]  ? wait_woken+0x80/0x80
      [  440.491411]  kthread+0x102/0x140
      [  440.508636]  ? find_pers+0x70/0x70
      [  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
      [  440.541791]  ret_from_fork+0x35/0x40
      [  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
      [  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
      [  440.628575] ---[ end trace a1fd79d85643a73e ]--
      
      All of the crashes happened when a bypass I/O came in. In that scenario
      s->iop.bio points to s->orig_bio. search_free() finishes s->orig_bio by
      calling bio_complete(), after which s->iop.bio is invalid, so the kernel
      crashes when bio_put() is called. This may be a fault in the upper
      layer, since a bio should not be freed before we call bio_put(), but it
      is safer to call bio_put() first and only then call bio_complete() to
      notify the upper layer that the bio has ended.
      
      This patch moves bio_complete() after bio_put() to avoid the kernel crash.
      
      [mlyle: fixed commit subject for character limits]
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Tested-by: Matthias Ferdinand <bcache@mfedv.net>
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bcache: correct flash only vols (check all uuids) · 02aa8a8b
      Coly Li committed
      Commit 2831231d ("bcache: reduce cache_set devices iteration by
      devices_max_used") adds c->devices_max_used to reduce iteration of
      c->uuids elements, this value is updated in bcache_device_attach().
      
      But for a flash-only volume, when flash_devs_run() is called,
      bcache_device_attach() has not been called yet and c->devices_max_used
      is not updated. The unexpected result is that the flash-only volume
      won't be run by flash_devs_run().
      
      This patch fixes the issue by iterating over all c->uuids elements in
      flash_devs_run(). c->devices_max_used will be updated properly when
      bcache_device_attach() gets called.
      
      [mlyle: commit subject edited for character limit]
      
      Fixes: 2831231d ("bcache: reduce cache_set devices iteration by devices_max_used")
      Reported-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
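Why a bounded iteration misses flash-only volumes can be seen in a small Python model. The data layout here is hypothetical: `uuids` stands in for c->uuids, `devices_max_used` for c->devices_max_used, and the `flash_only` key is an invented marker for a flash-only volume entry.

```python
def flash_devs_to_run(uuids, limit):
    """Return the indexes of flash-only volumes among the first `limit` slots."""
    return [i for i, entry in enumerate(uuids[:limit])
            if entry is not None and entry["flash_only"]]


# Hypothetical contents: one flash-only volume in slot 0, remaining slots unused.
uuids = [{"flash_only": True}, None, None, None]

# Still zero, because the attach step that updates it has not run yet.
devices_max_used = 0

before_fix = flash_devs_to_run(uuids, devices_max_used)  # bounded iteration
after_fix = flash_devs_to_run(uuids, len(uuids))         # iterate over all slots
```

The bounded scan finds nothing because the bound is still zero, while the full scan finds the volume in slot 0.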
    • cpufreq: s3c24xx: Fix broken s3c_cpufreq_init() · 0373ca74
      Viresh Kumar committed
      commit a307a1e6 "cpufreq: s3c: use cpufreq_generic_init()"
      accidentally broke cpufreq on s3c2410 and s3c2412.
      
      These two platforms don't have a CPU frequency table and used to skip
      calling cpufreq_table_validate_and_show() for them.  But with the
      above commit, we started calling it unconditionally and that will
      eventually fail as the frequency table pointer is NULL.
      
      Fix this by calling cpufreq_table_validate_and_show() conditionally
      again.
      
      Fixes: a307a1e6 "cpufreq: s3c: use cpufreq_generic_init()"
      Cc: 3.13+ <stable@vger.kernel.org> # v3.13+
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  9. 27 Feb, 2018 3 commits