1. 20 6月, 2019 9 次提交
  2. 17 6月, 2019 1 次提交
  3. 16 6月, 2019 5 次提交
    • T
      blkcg, writeback: dead memcgs shouldn't contribute to writeback ownership arbitration · 66311422
      Tejun Heo 提交于
      wbc_account_io() collects information on cgroup ownership of writeback
      pages to determine which cgroup should own the inode.  Pages can stay
      associated with dead memcgs but we want to avoid attributing IOs to
      dead blkcgs as much as possible as the association is likely to be
      stale.  However, currently, pages associated with dead memcgs
      contribute to the accounting delaying and/or confusing the
      arbitration.
      
      Fix it by ignoring pages associated with dead memcgs.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      66311422
    • T
      blkcg: blkcg_activate_policy() should initialize ancestors first · 71c81407
      Tejun Heo 提交于
      When blkcg_activate_policy() is creating blkg_policy_data for existing
      blkgs, it did in the wrong order - descendants first.  Fix it.  None
      of the existing controllers seem affected by this.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      71c81407
    • T
      blkcg: perpcu_ref init/exit should be done from blkg_alloc/free() · ef069b97
      Tejun Heo 提交于
      blkg alloc is performed as a separate step from the rest of blkg
      creation so that GFP_KERNEL allocations can be used when creating
      blkgs from configuration file writes because otherwise user actions
      may fail due to failures of opportunistic GFP_NOWAIT allocations.
      
      While making blkgs use percpu_ref, 7fcf2b03 ("blkcg: change blkg
      reference counting to use percpu_ref") incorrectly added unconditional
      opportunistic percpu_ref_init() to blkg_create() breaking this
      guarantee.
      
      This patch moves percpu_ref_init() to blkg_alloc() so makes it use
      @gfp_mask that blkg_alloc() is called with.  Also, percpu_ref_exit()
      is moved to blkg_free() for consistency.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: 7fcf2b03 ("blkcg: change blkg reference counting to use percpu_ref")
      Cc: Dennis Zhou <dennis@kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ef069b97
    • T
      blkcg: update blkcg_print_stat() to handle larger outputs · f539da82
      Tejun Heo 提交于
      Depending on the number of devices, blkcg stats can go over the
      default seqfile buf size.  seqfile normally retries with a larger
      buffer but since the ->pd_stat() addition, blkcg_print_stat() doesn't
      tell seqfile that overflow has happened and the output gets printed
      truncated.  Fix it by calling seq_commit() w/ -1 on possible
      overflows.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Fixes: 903d23f0 ("blk-cgroup: allow controllers to output their own stats")
      Cc: stable@vger.kernel.org # v4.19+
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f539da82
    • T
      blk-iolatency: clear use_delay when io.latency is set to zero · 5de0073f
      Tejun Heo 提交于
      If use_delay was non-zero when the latency target of a cgroup was set
      to zero, it will stay stuck until io.latency is enabled on the cgroup
      again.  This keeps readahead disabled for the cgroup impacting
      performance negatively.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Josef Bacik <jbacik@fb.com>
      Fixes: d7067512 ("block: introduce blk-iolatency io controller")
      Cc: stable@vger.kernel.org # v4.19+
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      5de0073f
  4. 15 6月, 2019 17 次提交
    • G
      block: bio: Use struct_size() in kmalloc() · f1f8f292
      Gustavo A. R. Silva 提交于
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct bio_map_data {
      	...
              struct iovec iov[];
      };
      
      instance = kmalloc(sizeof(sizeof(struct bio_map_data) + sizeof(struct iovec) *
                                count, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, iov, count), GFP_KERNEL);
      
      This code was detected with the help of Coccinelle.
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      f1f8f292
    • G
      block: genhd: Use struct_size() helper · 78b90a2c
      Gustavo A. R. Silva 提交于
      Make use of the struct_size() helper instead of an open-coded version
      in order to avoid any potential type mistakes, in particular in the
      context in which this code is being used.
      
      So, replace the following form:
      
      sizeof(*new_ptbl) + target * sizeof(new_ptbl->part[0])
      
      with:
      
      struct_size(new_ptbl, part, target)
      
      Also, notice that variable size is unnecessary, hence it is removed.
      
      This code was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      78b90a2c
    • B
      block: null_blk: fix race condition for null_del_dev · 7602843f
      Bob Liu 提交于
      Dulicate call of null_del_dev() will trigger null pointer error like below.
      The reason is a race condition between nullb_device_power_store() and
      nullb_group_drop_item().
      
        CPU#0                         CPU#1
        ----------------              -----------------
        do_rmdir()
         >configfs_rmdir()
          >client_drop_item()
           >nullb_group_drop_item()
                                      nullb_device_power_store()
      				>null_del_dev()
      
            >test_and_clear_bit(NULLB_DEV_FL_UP
             >null_del_dev()
             ^^^^^
             Duplicated null_dev_dev() triger null pointer error
      
      				>clear_bit(NULLB_DEV_FL_UP
      
      The fix could be keep the sequnce of clear NULLB_DEV_FL_UP and null_del_dev().
      
      [  698.613600] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
      [  698.613608] #PF error: [normal kernel read fault]
      [  698.613611] PGD 0 P4D 0
      [  698.613619] Oops: 0000 [#1] SMP PTI
      [  698.613627] CPU: 3 PID: 6382 Comm: rmdir Not tainted 5.0.0+ #35
      [  698.613631] Hardware name: LENOVO 20LJS2EV08/20LJS2EV08, BIOS R0SET33W (1.17 ) 07/18/2018
      [  698.613644] RIP: 0010:null_del_dev+0xc/0x110 [null_blk]
      [  698.613649] Code: 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 0b eb 97 e8 47 bb 2a e8 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 53 <8b> 77 18 48 89 fb 4c 8b 27 48 c7 c7 40 57 1e c1 e8 bf c7 cb e8 48
      [  698.613654] RSP: 0018:ffffb887888bfde0 EFLAGS: 00010286
      [  698.613659] RAX: 0000000000000000 RBX: ffff9d436d92bc00 RCX: ffff9d43a9184681
      [  698.613663] RDX: ffffffffc11e5c30 RSI: 0000000068be6540 RDI: 0000000000000000
      [  698.613667] RBP: ffffb887888bfdf0 R08: 0000000000000001 R09: 0000000000000000
      [  698.613671] R10: ffffb887888bfdd8 R11: 0000000000000f16 R12: ffff9d436d92bc08
      [  698.613675] R13: ffff9d436d94e630 R14: ffffffffc11e5088 R15: ffffffffc11e5000
      [  698.613680] FS:  00007faa68be6540(0000) GS:ffff9d43d14c0000(0000) knlGS:0000000000000000
      [  698.613685] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  698.613689] CR2: 0000000000000018 CR3: 000000042f70c002 CR4: 00000000003606e0
      [  698.613693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  698.613697] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  698.613700] Call Trace:
      [  698.613712]  nullb_group_drop_item+0x50/0x70 [null_blk]
      [  698.613722]  client_drop_item+0x29/0x40
      [  698.613728]  configfs_rmdir+0x1ed/0x300
      [  698.613738]  vfs_rmdir+0xb2/0x130
      [  698.613743]  do_rmdir+0x1c7/0x1e0
      [  698.613750]  __x64_sys_rmdir+0x17/0x20
      [  698.613759]  do_syscall_64+0x5a/0x110
      [  698.613768]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7602843f
    • P
      blk-mq/debugfs: Fix improper print qualifier · 315eb656
      Pavel Begunkov 提交于
      struct blk_rq_stat::mean is a u64 value, so use %llu
      Signed-off-by: NPavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      315eb656
    • G
      md/raid10: read balance chooses idlest disk for SSD · e9eeba28
      Guoqing Jiang 提交于
      Andy reported that raid10 array with SSD disks has poor
      read performance. Compared with raid1, RAID-1 can be 3x
      faster than RAID-10 sometimes [1].
      
      The thing is that raid10 chooses the low distance disk
      for read request, however, the approach doesn't work
      well for SSD device since it doesn't have spindle like
      HDD, we should just read from the SSD which has less
      pending IO like commit 9dedf603 ("md/raid1: read
      balance chooses idlest disk for SSD").
      
      So this commit selects the idlest SSD disk for read if
      array has none rotational disk, otherwise, read_balance
      uses the previous distance priority algorithm. With the
      change, the performance of raid10 gets increased largely
      per Andy's test [2].
      
      [1]. https://marc.info/?l=linux-raid&m=155915890004761&w=2
      [2]. https://marc.info/?l=linux-raid&m=155990654223786&w=2Tested-by: NAndy Smith <andy@strugglers.net>
      Signed-off-by: NGuoqing Jiang <gqjiang@suse.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e9eeba28
    • M
      md: raid1-10: Unify r{1,10}bio_pool_free · c7afa803
      Marcos Paulo de Souza 提交于
      Avoiding duplicated code, since they just execute a kfree.
      Signed-off-by: NMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c7afa803
    • G
      md: raid10: Use struct_size() in kmalloc() · 8cf05a78
      Gustavo A. R. Silva 提交于
      One of the more common cases of allocation size calculations is finding
      the size of a structure that has a zero-sized array at the end, along
      with memory for some number of elements for that array. For example:
      
      struct foo {
         int stuff;
         struct boo entry[];
      };
      
      instance = kmalloc(size, GFP_KERNEL);
      
      Instead of leaving these open-coded and prone to type mistakes, we can
      now use the new struct_size() helper:
      
      instance = kmalloc(struct_size(instance, entry, count), GFP_KERNEL);
      
      This code was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8cf05a78
    • Y
      md/raid1: get rid of extra blank line and space · ebfeb444
      Yufen Yu 提交于
      This patch get rid of extra blank line and space, and
      add necessary space for code.
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ebfeb444
    • Y
      md: fix spelling typo and add necessary space · e5b521ee
      Yufen Yu 提交于
      This patch fix a spelling typo and add necessary space for code.
      In addition, the patch get rid of the unnecessary 'if'.
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e5b521ee
    • M
      md: md.c: Return -ENODEV when mddev is NULL in rdev_attr_show · 168b305b
      Marcos Paulo de Souza 提交于
      Commit c42d3240
      ("md: return -ENODEV if rdev has no mddev assigned") changed
      rdev_attr_store to return -ENODEV when rdev->mddev is NULL, now do the
      same to rdev_attr_show.
      Signed-off-by: NMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      168b305b
    • X
      raid5-cache: Need to do start() part job after adding journal device · d9771f5e
      Xiao Ni 提交于
      commit d5d885fd ("md: introduce new personality funciton start()")
      splits the init job to two parts. The first part run() does the jobs that
      do not require the md threads. The second part start() does the jobs that
      require the md threads.
      
      Now it just does run() in adding new journal device. It needs to do the
      second part start() too.
      
      Fixes: d5d885fd ("md: introduce new personality funciton start()")
      Cc: stable@vger.kernel.org #v4.9+
      Reported-by: NMichal Soltys <soltys@ziu.info>
      Signed-off-by: NXiao Ni <xni@redhat.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d9771f5e
    • M
      drivers: md: Unify common definitions of raid1 and raid10 · 3f677f9c
      Marcos Paulo de Souza 提交于
      These definitions are being moved to raid1-10.c.
      Signed-off-by: NMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3f677f9c
    • L
      Merge tag 'for-linus-20190614' of git://git.kernel.dk/linux-block · 7b103151
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
      
       - Remove references to old schedulers for the scheduler switching and
         blkio controller documentation (Andreas)
      
       - Kill duplicate check for report zone for null_blk (Chaitanya)
      
       - Two bcache fixes (Coly)
      
       - Ensure that mq-deadline is selected if zoned block device is enabled,
         as we need that to support them (Damien)
      
       - Fix io_uring memory leak (Eric)
      
       - ps3vram fallout from LBDAF removal (Geert)
      
       - Redundant blk-mq debugfs debugfs_create return check cleanup (Greg)
      
       - Extend NOPLM quirk for ST1000LM024 drives (Hans)
      
       - Remove error path warning that can now trigger after the queue
         removal/addition fixes (Ming)
      
      * tag 'for-linus-20190614' of git://git.kernel.dk/linux-block:
        block/ps3vram: Use %llu to format sector_t after LBDAF removal
        libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
        bcache: only set BCACHE_DEV_WB_RUNNING when cached device attached
        bcache: fix stack corruption by PRECEDING_KEY()
        blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests
        blkio-controller.txt: Remove references to CFQ
        block/switching-sched.txt: Update to blk-mq schedulers
        null_blk: remove duplicate check for report zone
        blk-mq: no need to check return value of debugfs_create functions
        io_uring: fix memory leak of UNIX domain socket inode
        block: force select mq-deadline for zoned block devices
      7b103151
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 5dcedf46
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "I2C has two simple but wanted driver fixes for you"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: pca-platform: Fix GPIO lookup code
        i2c: acorn: fix i2c warning
      5dcedf46
    • C
      Smack: Restore the smackfsdef mount option and add missing prefixes · 6e7739fc
      Casey Schaufler 提交于
      The 5.1 mount system rework changed the smackfsdef mount option to
      smackfsdefault.  This fixes the regression by making smackfsdef treated
      the same way as smackfsdefault.
      
      Also fix the smack_param_specs[] to have "smack" prefixes on all the
      names.  This isn't visible to a user unless they either:
      
       (a) Try to mount a filesystem that's converted to the internal mount API
           and that implements the ->parse_monolithic() context operation - and
           only then if they call security_fs_context_parse_param() rather than
           security_sb_eat_lsm_opts().
      
           There are no examples of this upstream yet, but nfs will probably want
           to do this for nfs2 or nfs3.
      
       (b) Use fsconfig() to configure the filesystem - in which case
           security_fs_context_parse_param() will be called.
      
      This issue is that smack_sb_eat_lsm_opts() checks for the "smack" prefix
      on the options, but smack_fs_context_parse_param() does not.
      
      Fixes: c3300aaf ("smack: get rid of match_token()")
      Fixes: 2febd254 ("smack: Implement filesystem context security hooks")
      Cc: stable@vger.kernel.org
      Reported-by: NJose Bollo <jose.bollo@iot.bzh>
      Signed-off-by: NCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e7739fc
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · 72a20cee
      Linus Torvalds 提交于
      Pull arm64 fixes from Will Deacon:
       "Here are some arm64 fixes for -rc5.
      
        The only non-trivial change (in terms of the diffstat) is fixing our
        SVE ptrace API for big-endian machines, but the majority of this is
        actually the addition of much-needed comments and updates to the
        documentation to try to avoid this mess biting us again in future.
      
        There are still a couple of small things on the horizon, but nothing
        major at this point.
      
        Summary:
      
         - Fix broken SVE ptrace API when running in a big-endian configuration
      
         - Fix performance regression due to off-by-one in TLBI range checking
      
         - Fix build regression when using Clang"
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64/sve: Fix missing SVE/FPSIMD endianness conversions
        arm64: tlbflush: Ensure start/end of address range are aligned to stride
        arm64: Don't unconditionally add -Wno-psabi to KBUILD_CFLAGS
      72a20cee
    • L
      Merge branch 'akpm' (patches from Andrew) · fd6b99fa
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "16 fixes"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/devm_memremap_pages: fix final page put race
        PCI/P2PDMA: track pgmap references per resource, not globally
        lib/genalloc: introduce chunk owners
        PCI/P2PDMA: fix the gen_pool_add_virt() failure path
        mm/devm_memremap_pages: introduce devm_memunmap_pages
        drivers/base/devres: introduce devm_release_action()
        mm/vmscan.c: fix trying to reclaim unevictable LRU page
        coredump: fix race condition between collapse_huge_page() and core dumping
        mm/mlock.c: change count_mm_mlocked_page_nr return type
        mm: mmu_gather: remove __tlb_reset_range() for force flush
        fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
        mm/vmscan.c: fix recent_rotated history
        mm/mlock.c: mlockall error for flag MCL_ONFAULT
        scripts/decode_stacktrace.sh: prefix addr2line with $CROSS_COMPILE
        mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
        mm: memcontrol: don't batch updates of local VM stats and events
      fd6b99fa
  5. 14 6月, 2019 8 次提交