1. 15 Feb, 2016 1 commit
    • nbd: Create size change events for userspace · 37091fdd
      Markus Pargmann committed
      Userspace needs to know when nbd devices are ready for use.
      Currently no events are created for userspace, which does not work for
      systemd.
      
      See the discussion here: https://github.com/systemd/systemd/pull/358
      
      This patch uses a central point to set up the nbd-internal sizes. An ioctl
      to set a size does not lead to a visible size change. The size of the
      block device will be kept at 0 until nbd is connected. As soon as it
      connects, the size will be changed to the real value and a uevent is
      created. When disconnecting, the blockdevice is set to 0 size and
      another uevent is generated.
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
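The size handling described in this commit can be pictured with a small user-space model. This is only a sketch: the struct and function names are hypothetical and not taken from the nbd driver, and emitting an event when the size is changed on an already connected device is an assumption of the model, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; none of these names come from the nbd driver. */
struct nbd_model {
    uint64_t configured_bytes; /* size requested through the ioctl */
    uint64_t visible_bytes;    /* size the block device reports */
    bool connected;
    unsigned events;           /* number of size change events emitted */
};

/* Central point: the only place where the visible size is changed. */
static void model_update_size(struct nbd_model *m)
{
    uint64_t want = m->connected ? m->configured_bytes : 0;

    if (want != m->visible_bytes) {
        m->visible_bytes = want;
        m->events++; /* stands in for the uevent */
    }
}

/* Setting a size alone does not become visible while disconnected. */
void model_set_size(struct nbd_model *m, uint64_t bytes)
{
    assert(m != NULL);
    m->configured_bytes = bytes;
    model_update_size(m);
}

void model_connect(struct nbd_model *m)
{
    assert(m != NULL);
    m->connected = true;
    model_update_size(m);
}

void model_disconnect(struct nbd_model *m)
{
    assert(m != NULL);
    m->connected = false;
    model_update_size(m);
}
```

In this model, setting a size of 1024 bytes on a disconnected device leaves the visible size at 0, connecting raises it to 1024 with one event, and disconnecting drops it back to 0 with a second event.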
  2. 05 Feb, 2016 5 commits
    • nbd: ratelimit error msgs after socket close · da6ccaaa
      Dan Streetman committed
      Make the "Attempted send on closed socket" error messages generated in
      nbd_request_handler() ratelimited.
      
      When the nbd socket is shut down, the nbd_request_handler() function emits
      an error message for every request remaining in its queue.  If the queue
      is large, this floods the log with messages.  There's no need for a
      separate error message for each request, so this patch ratelimits it.
      
      In the specific case this was found, the system was virtual and the error
      messages were logged to the serial port, which overwhelmed it.
      
      Fixes: 4d48a542 ("nbd: fix I/O hang on disconnected nbds")
      Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
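As an illustration of what ratelimiting a message means, here is a minimal user-space sketch of an interval/burst limiter. It is not the kernel's ratelimit implementation: the struct, the function name, and the explicit `now` parameter are assumptions chosen to keep the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical limiter: at most `burst` messages per `interval` time units. */
struct ratelimit_model {
    long interval;     /* window length, in caller-defined time units */
    int burst;         /* messages allowed per window */
    long window_start; /* start time of the current window */
    int emitted;       /* messages allowed in the current window */
    int suppressed;    /* total messages dropped so far */
    bool started;      /* false until the first call */
};

/* Returns true if a message may be printed at time `now`. */
bool ratelimit_allow(struct ratelimit_model *rl, long now)
{
    assert(rl != NULL && rl->interval > 0 && rl->burst > 0);

    /* Open a new window on the first call or once the old one expired. */
    if (!rl->started || now - rl->window_start >= rl->interval) {
        rl->window_start = now;
        rl->emitted = 0;
        rl->started = true;
    }

    if (rl->emitted < rl->burst) {
        rl->emitted++;
        return true;
    }

    rl->suppressed++;
    return false;
}
```

With an interval of 5 and a burst of 2, calls at times 0 and 1 are allowed, a call at time 2 is suppressed, and a call at time 5 is allowed again because a new window has started.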
    • nbd: Move flag parsing to a function · d02cf531
      Markus Pargmann committed
      nbd changes properties of the blockdevice depending on flags that were
      received. This patch moves this flag parsing into a separate function
      nbd_parse_flags().
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
    • nbd: Cleanup reset of nbd and bdev after a disconnect · 0e4f0f6f
      Markus Pargmann committed
      Group all variables that are reset after a disconnect into reset
      functions. This patch adds two of these functions, nbd_reset() and
      nbd_bdev_reset().
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
    • nbd: Timeouts are not user requested disconnects · 1f7b5cf1
      Markus Pargmann committed
      It may be useful to know in the client that a connection timed out. The
      current code returns success for a timeout.
      
      This patch reports the error code -ETIMEDOUT for a timeout.
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
    • nbd: Remove signal usage · 23272a67
      Markus Pargmann committed
      As discussed on the mailing list, the usage of signals for timeout
      handling has a lot of potential issues. For some time the nbd driver
      has used signals for timeouts. These signals were able to get the
      threads out of the blocking socket operations.
      
      This patch removes all signal usage and uses a socket shutdown instead.
      The socket descriptor itself is cleared later when the whole nbd device
      is closed.
      
      The tasks_lock is removed as we do not depend on this anymore. Instead
      a new lock for the socket is introduced so we can safely work with the
      socket in the timeout handler outside of the two main threads.
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  3. 03 Feb, 2016 1 commit
    • nbd: Fix debugfs error handling · 27ea43fe
      Markus Pargmann committed
      A static checker complains about the implemented error handling. It is
      indeed wrong. We don't care about the return values of created debugfs
      files.
      
      We only have to check the return values of created dirs for a NULL
      pointer. If we use a NULL pointer as the parent directory for files,
      this may lead to debugfs files in the wrong places.
      Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
  4. 23 Jan, 2016 2 commits
  5. 22 Jan, 2016 1 commit
  6. 16 Jan, 2016 6 commits
    • mm, dax, pmem: introduce pfn_t · 34c0fd54
      Dan Williams committed
      For the purpose of communicating the optional presence of a 'struct
      page' for the pfn returned from ->direct_access(), introduce a type that
      encapsulates a page-frame-number plus flags.  These flags contain the
      historical "page_link" encoding for a scatterlist entry, but can also
      denote "device memory", where "device memory" is a set of pfns that are
      not part of the kernel's linear mapping by default, but are accessed via
      the same memory controller as ram.
      
      The motivation for this new type is large capacity persistent memory
      that needs struct page entries in the 'memmap' to support 3rd party DMA
      (i.e.  O_DIRECT I/O with a persistent memory source/target).  However,
      we also need it in support of maintaining a list of mapped inodes which
      need to be unmapped at driver teardown or freeze_bdev() time.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
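The idea of a type that carries a page-frame-number plus flags can be sketched in user space as follows. The flag positions, the names, and the choice of four flag bits at the top of a 64-bit value are assumptions made for illustration, not the kernel's actual pfn_t layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: a pfn and its flags packed into one 64-bit value. */
typedef struct {
    uint64_t val;
} pfn_model_t;

#define MODEL_FLAG_BITS   4
#define MODEL_FLAGS_SHIFT (64 - MODEL_FLAG_BITS)
#define MODEL_FLAGS_MASK  (((uint64_t)0xF) << MODEL_FLAGS_SHIFT)
/* Illustrative flag bit meaning "this pfn refers to device memory". */
#define MODEL_FLAG_DEV    (((uint64_t)1) << (MODEL_FLAGS_SHIFT + 2))

/* Combine a frame number and flags; the two must not overlap. */
pfn_model_t pfn_model_make(uint64_t pfn, uint64_t flags)
{
    assert((pfn & MODEL_FLAGS_MASK) == 0);
    assert((flags & ~MODEL_FLAGS_MASK) == 0);
    return (pfn_model_t){ .val = pfn | flags };
}

/* Strip the flags to recover the plain frame number. */
uint64_t pfn_model_to_pfn(pfn_model_t p)
{
    return p.val & ~MODEL_FLAGS_MASK;
}

bool pfn_model_is_dev(pfn_model_t p)
{
    return (p.val & MODEL_FLAG_DEV) != 0;
}
```

Wrapping the value in a struct rather than using a bare integer means the compiler rejects code that passes a flagged value where a plain pfn is expected.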
    • zram: don't call idr_remove() from zram_remove() · 17ec4cd9
      Jerome Marchand committed
      The use of idr_remove() is forbidden in the callback functions of
      idr_for_each().  It is therefore unsafe to call idr_remove in
      zram_remove().
      
      This patch moves the call to idr_remove() from zram_remove() to
      hot_remove_store().  In the destroy_devices() path, idrs are removed by
      idr_destroy().  This solves a use-after-free detected by KASan.
      
      [akpm@linux-foundation.org: fix coding style, per Sergey]
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: <stable@vger.kernel.org> [4.2+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram/zcomp: do not zero out zcomp private pages · e02d238c
      Sergey Senozhatsky committed
      Do not use __GFP_ZERO when allocating zcomp ->private pages.  We keep
      allocated streams around and reuse them for read/write requests, so the
      compression algorithm receives a zeroed ->private scratch buffer only
      once, the first time the stream is used.  For all later IO requests
      served by the stream, ->private usually contains temporary data left
      over from previous requests.
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: pass gfp from zcomp frontend to backend · 75d8947a
      Minchan Kim committed
      Each zcomp backend uses its own gfp flags, but that is pointless because
      the context in which a backend is called is determined by the upper
      layer (i.e., the zcomp frontend).  Moreover, the zcomp frontend can call
      the backends in different contexts.  In one context (the zram init path)
      it is better to make sure the allocation succeeds; in another
      (allocating further streams to accelerate I/O) the allocation is only
      optional.  So pass gfp down from the driver (i.e., the zcomp frontend),
      following the normal MM convention.
      
      [sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: try vmalloc() after kmalloc() · d913897a
      Kyeongdon Kim committed
      When using LZ4 multi compression streams for zram swap, we found page
      allocation failure messages while running system tests.  This happened
      not just once but several times (2 - 5 times per test).  Some of the
      failures recurred continually on order-3 allocation attempts.
      
      To create the private data for parallel compression, we have to call
      kzalloc() with order 2/3 at runtime (lzo/lz4).  If no order 2/3 memory
      is available at that time, the page allocation fails.  This patch uses
      vmalloc() as a fallback for kmalloc(), which prevents the page
      allocation failure warning.
      
      With this change we no longer saw the warning message while running the
      test, and it reduced process startup latency by about 60-120ms in each
      case.
      
      For reference, a call trace:
      
          Binder_1: page allocation failure: order:3, mode:0x10c0d0
          CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
          Call trace:
            dump_backtrace+0x0/0x270
            show_stack+0x10/0x1c
            dump_stack+0x1c/0x28
            warn_alloc_failed+0xfc/0x11c
            __alloc_pages_nodemask+0x724/0x7f0
            __get_free_pages+0x14/0x5c
            kmalloc_order_trace+0x38/0xd8
            zcomp_lz4_create+0x2c/0x38
            zcomp_strm_alloc+0x34/0x78
            zcomp_strm_multi_find+0x124/0x1ec
            zcomp_strm_find+0xc/0x18
            zram_bvec_rw+0x2fc/0x780
            zram_make_request+0x25c/0x2d4
            generic_make_request+0x80/0xbc
            submit_bio+0xa4/0x15c
            __swap_writepage+0x218/0x230
            swap_writepage+0x3c/0x4c
            shrink_page_list+0x51c/0x8d0
            shrink_inactive_list+0x3f8/0x60c
            shrink_lruvec+0x33c/0x4cc
            shrink_zone+0x3c/0x100
            try_to_free_pages+0x2b8/0x54c
            __alloc_pages_nodemask+0x514/0x7f0
            __get_free_pages+0x14/0x5c
            proc_info_read+0x50/0xe4
            vfs_read+0xa0/0x12c
            SyS_read+0x44/0x74
          DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
               0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
      
      [minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
      [sergey.senozhatsky@gmail.com: tweak comments and styles]
      Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
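The fallback pattern in this commit, trying the physically contiguous allocator first and falling back to a second allocator when that fails, can be modelled in user space. This is a sketch under stated assumptions: `contig_alloc()` and `fallback_alloc()` are hypothetical stand-ins for kmalloc() and vmalloc(), both are backed by malloc(), and the size limit is invented to simulate a failing high-order allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical limit: larger "contiguous" requests are made to fail. */
#define CONTIG_LIMIT 4096

struct buf_model {
    void *ptr;
    bool from_fallback; /* true if the second allocator was used */
};

/* Stand-in for the contiguous allocator; fails for large sizes. */
static void *contig_alloc(size_t size)
{
    return size <= CONTIG_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for the fallback allocator, which has no contiguity limit. */
static void *fallback_alloc(size_t size)
{
    return malloc(size);
}

/* Try the contiguous allocator first, then fall back. */
struct buf_model buf_model_alloc(size_t size)
{
    struct buf_model b = { .ptr = contig_alloc(size), .from_fallback = false };

    assert(size > 0);
    if (b.ptr == NULL) {
        b.ptr = fallback_alloc(size);
        b.from_fallback = (b.ptr != NULL);
    }
    return b;
}

/* Both stand-ins use malloc(), so one free() covers both paths here. */
void buf_model_free(struct buf_model *b)
{
    assert(b != NULL);
    free(b->ptr);
    b->ptr = NULL;
}
```

In the kernel the two allocators have different free functions, so real code has to release the buffer through the matching one; the model sidesteps this because both paths use malloc().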
    • zram/zcomp: use GFP_NOIO to allocate streams · 3d5fe03a
      Sergey Senozhatsky committed
      We can end up allocating a new compression stream with GFP_KERNEL from
      within the IO path, which may result in nested (recursive) IO
      operations.  That can introduce problems if the IO path in question is a
      reclaimer, holding some locks that will deadlock nested IOs.
      
      Allocate streams and working memory using GFP_NOIO flag, forbidding
      recursive IO and FS operations.
      
      An example:
      
        inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
        git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
         (jbd2_handle){+.+.?.}, at:  start_this_handle+0x4ca/0x555
        {IN-RECLAIM_FS-W} state was registered at:
           __lock_acquire+0x8da/0x117b
           lock_acquire+0x10c/0x1a7
           start_this_handle+0x52d/0x555
           jbd2__journal_start+0xb4/0x237
           __ext4_journal_start_sb+0x108/0x17e
           ext4_dirty_inode+0x32/0x61
           __mark_inode_dirty+0x16b/0x60c
           iput+0x11e/0x274
           __dentry_kill+0x148/0x1b8
           shrink_dentry_list+0x274/0x44a
           prune_dcache_sb+0x4a/0x55
           super_cache_scan+0xfc/0x176
           shrink_slab.part.14.constprop.25+0x2a2/0x4d3
           shrink_zone+0x74/0x140
           kswapd+0x6b7/0x930
           kthread+0x107/0x10f
           ret_from_fork+0x3f/0x70
        irq event stamp: 138297
        hardirqs last  enabled at (138297):  debug_check_no_locks_freed+0x113/0x12f
        hardirqs last disabled at (138296):  debug_check_no_locks_freed+0x33/0x12f
        softirqs last  enabled at (137818):  __do_softirq+0x2d3/0x3e9
        softirqs last disabled at (137813):  irq_exit+0x41/0x95
      
                     other info that might help us debug this:
         Possible unsafe locking scenario:
               CPU0
               ----
          lock(jbd2_handle);
          <Interrupt>
            lock(jbd2_handle);
      
                      *** DEADLOCK ***
        5 locks held by git/20158:
         #0:  (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
         #1:  (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
         #2:  (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
         #3:  (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
         #4:  (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
      
                     stack backtrace:
        CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
        Call Trace:
          dump_stack+0x4c/0x6e
          mark_lock+0x384/0x56d
          mark_held_locks+0x5f/0x76
          lockdep_trace_alloc+0xb2/0xb5
          kmem_cache_alloc_trace+0x32/0x1e2
          zcomp_strm_alloc+0x25/0x73 [zram]
          zcomp_strm_multi_find+0xe7/0x173 [zram]
          zcomp_strm_find+0xc/0xe [zram]
          zram_bvec_rw+0x2ca/0x7e0 [zram]
          zram_make_request+0x1fa/0x301 [zram]
          generic_make_request+0x9c/0xdb
          submit_bio+0xf7/0x120
          ext4_io_submit+0x2e/0x43
          ext4_bio_write_page+0x1b7/0x300
          mpage_submit_page+0x60/0x77
          mpage_map_and_submit_buffers+0x10f/0x21d
          ext4_writepages+0xc8c/0xe1b
          do_writepages+0x23/0x2c
          __filemap_fdatawrite_range+0x84/0x8b
          filemap_flush+0x1c/0x1e
          ext4_alloc_da_blocks+0xb8/0x117
          ext4_rename+0x132/0x6dc
          ? mark_held_locks+0x5f/0x76
          ext4_rename2+0x29/0x2b
          vfs_rename+0x540/0x636
          SyS_renameat2+0x359/0x44d
          SyS_rename+0x1e/0x20
          entry_SYSCALL_64_fastpath+0x12/0x6f
      
      [minchan@kernel.org: add stable mark]
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 14 Jan, 2016 1 commit
    • null_blk: use sector_div instead of do_div · e93d12ae
      Arnd Bergmann committed
      Dividing a sector_t number should be done using sector_div rather than do_div
      to optimize the 32-bit sector_t case, and with the latest do_div optimizations,
      we now get a compile-time warning for this:
      
      arch/arm/include/asm/div64.h:32:95: note: expected 'uint64_t * {aka long long unsigned int *}' but argument is of type 'sector_t * {aka long unsigned int *}'
      drivers/block/null_blk.c:521:81: warning: comparison of distinct pointer types lacks a cast
      
      This changes the newly added code to use sector_div. It is a simplified version
      of the original patch, as Linus Torvalds pointed out that we should not be using
      an expensive division function in the first place.
      
      This version was suggested by Matias Bjorling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matias Bjorling <m@bjorling.me>
      Fixes: b2b7e001 ("null_blk: register as a LightNVM device")
      Signed-off-by: Jens Axboe <axboe@fb.com>
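Like do_div, sector_div divides its first argument in place and returns the remainder. The difference is that it operates on sector_t, so a 32-bit sector_t gets a 32-bit division. A user-space model of the 32-bit case might look like this; the type and function names are hypothetical, and a function taking a pointer replaces the kernel's macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit sector_t. */
typedef uint32_t sector32_model_t;

/*
 * Divide *n by base in place and return the remainder, mirroring the
 * calling convention of sector_div()/do_div().
 */
uint32_t sector_div_model(sector32_model_t *n, uint32_t base)
{
    uint32_t rem;

    assert(n != NULL && base != 0);
    rem = *n % base;
    *n /= base;
    return rem;
}
```

For example, starting with `n = 1000003` and calling `sector_div_model(&n, 8)` leaves `n` at 125000 and returns 3.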
  8. 12 Jan, 2016 1 commit
    • lightnvm: refactor end_io functions for sync · 91276162
      Matias Bjørling committed
      To implement sync I/O support within the LightNVM core, the end_io
      functions are refactored to take an end_io function pointer instead of
      testing for initialized media manager, followed by calling its end_io
      function.
      
      Sync I/O can then be implemented using a callback that signals I/O
      completion. This is similar to the logic found in blk_to_execute_io().
      By implementing it this way, the underlying device I/O submission logic
      is abstracted away from core, targets, and media managers.
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
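The approach of building sync I/O on top of an end_io function pointer can be sketched in user space. All names here are hypothetical and not taken from the LightNVM code. To keep the example single-threaded, the pretend device completes the request immediately inside the submit call, where real code would block until the callback fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request_model;
typedef void (*end_io_fn)(struct request_model *rq, int error);

/* Hypothetical request: carries its own completion callback. */
struct request_model {
    end_io_fn end_io;   /* called by the device when the I/O finishes */
    void *private_data; /* owned by whoever set end_io */
};

/* Minimal completion object used by the synchronous path. */
struct completion_model {
    bool done;
    int error;
};

/* Pretend device: finishes at once with the given status. */
static void device_submit(struct request_model *rq, int status)
{
    assert(rq != NULL && rq->end_io != NULL);
    rq->end_io(rq, status);
}

/* Callback for sync I/O: record the status and signal completion. */
static void sync_end_io(struct request_model *rq, int error)
{
    struct completion_model *c = rq->private_data;

    c->error = error;
    c->done = true;
}

/* Submit a request and return its status once it has completed. */
int submit_sync_model(int device_status)
{
    struct completion_model c = { .done = false, .error = 0 };
    struct request_model rq = { .end_io = sync_end_io, .private_data = &c };

    device_submit(&rq, device_status);
    /* Real code would wait here; the model's device already completed. */
    assert(c.done);
    return c.error;
}
```

Because the device only ever calls `rq->end_io`, it does not need to know whether the submitter is synchronous or asynchronous, which is the abstraction the commit message describes.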
  9. 09 Jan, 2016 2 commits
  10. 06 Jan, 2016 2 commits
  11. 05 Jan, 2016 18 commits