1. 04 October 2016, 1 commit
  2. 27 September 2016, 1 commit
  3. 22 September 2016, 18 commits
    • raid5: handle register_shrinker failure · 30c89465
      Committed by Shaohua Li
      register_shrinker() can now fail. When it does, shrinker.nr_deferred is
      NULL, and we use that to determine whether unregister_shrinker() is
      required.
      Signed-off-by: Shaohua Li <shli@fb.com>
      30c89465
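The guard described above can be sketched in plain C. This is a userspace model only: the struct layout, the `simulate_enomem` knob, and the teardown helper are illustrative stand-ins for the kernel API, not the actual md code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sketch: after commit 1d3d4437, register_shrinker() allocates
 * nr_deferred and can fail; on failure nr_deferred stays NULL, which
 * the teardown path uses to decide whether unregister is needed. */

struct shrinker {
    long *nr_deferred;   /* NULL until registration succeeds */
};

static int register_shrinker(struct shrinker *s, int simulate_enomem)
{
    if (simulate_enomem)
        return -12;      /* -ENOMEM: nr_deferred left NULL */
    s->nr_deferred = calloc(1, sizeof(long));
    return s->nr_deferred ? 0 : -12;
}

static void unregister_shrinker(struct shrinker *s)
{
    free(s->nr_deferred);
    s->nr_deferred = NULL;
}

/* Teardown: only unregister if registration actually succeeded. */
static void shrink_teardown(struct shrinker *s)
{
    if (s->nr_deferred)          /* the check this commit relies on */
        unregister_shrinker(s);
}
```

The point of the pattern is that teardown stays safe whether or not registration succeeded, so error paths need no extra bookkeeping.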
    • raid5: fix to detect failure of register_shrinker · 6a0f53ff
      Committed by Chao Yu
      register_shrinker() can fail since commit 1d3d4437 ("vmscan: per-node
      deferred work"). We should detect that failure; otherwise the shrinker
      may silently fail to register even though the raid5 configuration was
      set up successfully.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      6a0f53ff
    • md: fix a potential deadlock · 90bcf133
      Committed by Shaohua Li
      lockdep reports a potential deadlock. Fix this by dropping the mutex
      before md_import_device:
      
      [ 1137.126601] ======================================================
      [ 1137.127013] [ INFO: possible circular locking dependency detected ]
      [ 1137.127013] 4.8.0-rc4+ #538 Not tainted
      [ 1137.127013] -------------------------------------------------------
      [ 1137.127013] mdadm/16675 is trying to acquire lock:
      [ 1137.127013]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
      [ 1137.127013]
      but task is already holding lock:
      [ 1137.127013]  (detected_devices_mutex){+.+.+.}, at: [<ffffffff81a5138c>] md_ioctl+0x2ac/0x1f50
      [ 1137.127013]
      which lock already depends on the new lock.
      
      [ 1137.127013]
      the existing dependency chain (in reverse order) is:
      [ 1137.127013]
      -> #1 (detected_devices_mutex){+.+.+.}:
      [ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
      [ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
      [ 1137.127013]        [<ffffffff81a4eeaf>] md_autodetect_dev+0x3f/0x90
      [ 1137.127013]        [<ffffffff81595be8>] rescan_partitions+0x1a8/0x2c0
      [ 1137.127013]        [<ffffffff81590081>] __blkdev_reread_part+0x71/0xb0
      [ 1137.127013]        [<ffffffff815900e5>] blkdev_reread_part+0x25/0x40
      [ 1137.127013]        [<ffffffff81590c4b>] blkdev_ioctl+0x51b/0xa30
      [ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
      [ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
      [ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
      [ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [ 1137.127013]
      -> #0 (&bdev->bd_mutex){+.+.+.}:
      [ 1137.127013]        [<ffffffff810b6af2>] __lock_acquire+0x1662/0x1690
      [ 1137.127013]        [<ffffffff810b6f19>] lock_acquire+0xb9/0x220
      [ 1137.127013]        [<ffffffff81c51647>] mutex_lock_nested+0x67/0x3d0
      [ 1137.127013]        [<ffffffff81243cf3>] __blkdev_get+0x63/0x450
      [ 1137.127013]        [<ffffffff81244307>] blkdev_get+0x227/0x350
      [ 1137.127013]        [<ffffffff812444f6>] blkdev_get_by_dev+0x36/0x50
      [ 1137.127013]        [<ffffffff81a46d65>] lock_rdev+0x35/0x80
      [ 1137.127013]        [<ffffffff81a49bb4>] md_import_device+0xb4/0x1b0
      [ 1137.127013]        [<ffffffff81a513d6>] md_ioctl+0x2f6/0x1f50
      [ 1137.127013]        [<ffffffff815909b3>] blkdev_ioctl+0x283/0xa30
      [ 1137.127013]        [<ffffffff81242bf1>] block_ioctl+0x41/0x50
      [ 1137.127013]        [<ffffffff81214c96>] do_vfs_ioctl+0x96/0x6e0
      [ 1137.127013]        [<ffffffff81215321>] SyS_ioctl+0x41/0x70
      [ 1137.127013]        [<ffffffff81c56825>] entry_SYSCALL_64_fastpath+0x18/0xa8
      [ 1137.127013]
      other info that might help us debug this:
      
      [ 1137.127013]  Possible unsafe locking scenario:
      
      [ 1137.127013]        CPU0                    CPU1
      [ 1137.127013]        ----                    ----
      [ 1137.127013]   lock(detected_devices_mutex);
      [ 1137.127013]                                lock(&bdev->bd_mutex);
      [ 1137.127013]                                lock(detected_devices_mutex);
      [ 1137.127013]   lock(&bdev->bd_mutex);
      [ 1137.127013]
       *** DEADLOCK ***
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      90bcf133
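The fix pattern in the commit above (release the outer mutex around the call that takes the inner one) can be modeled in userspace with POSIX mutexes. Names mirror the kernel locks but everything here is an illustrative sketch, not the md code.

```c
#include <assert.h>
#include <pthread.h>

/* Sketch of the lock-ordering fix: md_ioctl held detected_devices_mutex
 * while md_import_device took bd_mutex, inverting the order used by the
 * partition-rescan path. The fix drops the outer mutex around the call. */

static pthread_mutex_t detected_devices_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t bd_mutex = PTHREAD_MUTEX_INITIALIZER;
static int imported;

static void md_import_device(void)
{
    /* takes bd_mutex internally (lock_rdev -> blkdev_get in the kernel) */
    pthread_mutex_lock(&bd_mutex);
    imported++;
    pthread_mutex_unlock(&bd_mutex);
}

static void autostart_arrays(void)
{
    pthread_mutex_lock(&detected_devices_mutex);
    /* ... dequeue the next detected device ... */
    pthread_mutex_unlock(&detected_devices_mutex); /* the fix: drop the
                                                      outer mutex first */
    md_import_device();                            /* bd_mutex taken with
                                                      no outer lock held */
    pthread_mutex_lock(&detected_devices_mutex);   /* re-acquire to walk
                                                      the rest of the list */
    pthread_mutex_unlock(&detected_devices_mutex);
}
```

With the outer mutex dropped, neither lock is ever requested while the other is held in this path, so the circular dependency in the lockdep report cannot form.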
    • md/bitmap: fix wrong cleanup · f71f1cf9
      Committed by Shaohua Li
      If bitmap_create() fails, the bitmap has already been cleaned up and the
      returned value is an error number, so we must not do the cleanup again.
      Reported-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Signed-off-by: Shaohua Li <shli@fb.com>
      f71f1cf9
    • raid5: allow arbitrary max_hw_sectors · 1dffdddd
      Committed by Shaohua Li
      raid5 splits bios to the proper size internally, so there is no point in
      inheriting the underlying disk's max_hw_sectors. In my qemu system,
      without this change, raid5 only receives 128k-sized bios, which reduces
      the chance of bios merging before being sent to the underlying disks.
      Signed-off-by: Shaohua Li <shli@fb.com>
      1dffdddd
    • lib/raid6: Add AVX512 optimized xor_syndrome functions · 694dda62
      Committed by Gayatri Kammela
      Optimize the RAID6 xor_syndrome functions to take advantage of the
      512-bit ZMM integer instructions introduced in AVX512.

      The AVX512-optimized xor_syndrome functions are based on sse2.c,
      written by hpa.

      The patch was tested and benchmarked before submission on hardware
      with the AVX512 flags needed to support such instructions.

      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Megha Dey <megha.dey@linux.intel.com>
      Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
      Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      694dda62
    • lib/raid6/test/Makefile: Add avx512 gen_syndrome and recovery functions · 161db5d1
      Committed by Gayatri Kammela
      Add the avx512 gen_syndrome and recovery functions so that the code can
      be compiled and tested successfully in userspace.

      This patch was tested in userspace and an improvement in performance
      was observed.

      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
      Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
      Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      161db5d1
    • lib/raid6: Add AVX512 optimized recovery functions · 13c520b2
      Committed by Gayatri Kammela
      Optimize the RAID6 recovery functions to take advantage of the 512-bit
      ZMM integer instructions introduced in AVX512.

      The AVX512-optimized recovery functions are based on recov_avx2.c,
      written by Jim Kukunas.

      This patch was tested and benchmarked before submission on hardware
      with the AVX512 flags needed to support such instructions.

      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
      Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
      Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      13c520b2
    • lib/raid6: Add AVX512 optimized gen_syndrome functions · e0a491c1
      Committed by Gayatri Kammela
      Optimize the RAID6 gen_syndrome functions to take advantage of the
      512-bit ZMM integer instructions introduced in AVX512.

      The AVX512-optimized gen_syndrome functions are based on avx2.c,
      written by Yuanhan Liu, and sse2.c, written by hpa.

      The patch was tested and benchmarked before submission on hardware
      with the AVX512 flags needed to support such instructions.

      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Megha Dey <megha.dey@linux.intel.com>
      Signed-off-by: Gayatri Kammela <gayatri.kammela@intel.com>
      Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      e0a491c1
    • md-cluster: make the resync lock interruptible · d6385db9
      Committed by Guoqing Jiang
      When one node is performing resync or recovery, other nodes can't get
      the resync lock and may block for a while before acquiring it, so we
      can't stop the array immediately in this scenario.

      To allow the array to be stopped quickly, check MD_CLOSING in
      dlm_lock_sync_interruptible so that the lock request can be
      interrupted.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      d6385db9
    • md-cluster: introduce dlm_lock_sync_interruptible to fix task hangs · 7bcda714
      Committed by Guoqing Jiang
      When a node leaves the cluster, its bitmap needs to be synced by
      another node, so the "md*_recover" thread is triggered for that
      purpose. However, with the steps below, we can see tasks hang in
      either B or C:

      1. Node A creates a resyncing cluster raid1 and assembles it on the
         other two nodes (B and C).
      2. Stop the array on B and C.
      3. Stop the array on A.
      
      linux44:~ # ps aux|grep md|grep D
      root	5938	0.0  0.1  19852  1964 pts/0    D+   14:52   0:00 mdadm -S md0
      root	5939	0.0  0.0      0     0 ?        D    14:52   0:00 [md0_recover]
      
      linux44:~ # cat /proc/5939/stack
      [<ffffffffa04cf321>] dlm_lock_sync+0x71/0x90 [md_cluster]
      [<ffffffffa04d0705>] recover_bitmaps+0x125/0x220 [md_cluster]
      [<ffffffffa052105d>] md_thread+0x16d/0x180 [md_mod]
      [<ffffffff8107ad94>] kthread+0xb4/0xc0
      [<ffffffff8152a518>] ret_from_fork+0x58/0x90
      
      linux44:~ # cat /proc/5938/stack
      [<ffffffff8107afde>] kthread_stop+0x6e/0x120
      [<ffffffffa0519da0>] md_unregister_thread+0x40/0x80 [md_mod]
      [<ffffffffa04cfd20>] leave+0x70/0x120 [md_cluster]
      [<ffffffffa0525e24>] md_cluster_stop+0x14/0x30 [md_mod]
      [<ffffffffa05269ab>] bitmap_free+0x14b/0x150 [md_mod]
      [<ffffffffa0523f3b>] do_md_stop+0x35b/0x5a0 [md_mod]
      [<ffffffffa0524e83>] md_ioctl+0x873/0x1590 [md_mod]
      [<ffffffff81288464>] blkdev_ioctl+0x214/0x7d0
      [<ffffffff811dd3dd>] block_ioctl+0x3d/0x40
      [<ffffffff811b92d4>] do_vfs_ioctl+0x2d4/0x4b0
      [<ffffffff811b9538>] SyS_ioctl+0x88/0xa0
      [<ffffffff8152a5c9>] system_call_fastpath+0x16/0x1b
      
      The problem is that recover_bitmaps can't reliably abort when its
      thread is unregistered, so dlm_lock_sync_interruptible is introduced
      to detect the thread's state and fix the problem.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      7bcda714
    • md-cluster: convert the completion to a wait queue · fccb60a4
      Committed by Guoqing Jiang
      Previously we used a completion to synchronize between requesting the
      dlm lock and sync_ast. However, we would have to expose
      completion.wait and completion.done in dlm_lock_sync_interruptible
      (introduced later), which is not a common usage of completions, so
      convert the related code to a wait queue.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      fccb60a4
    • md-cluster: protect md_find_rdev_nr_rcu with rcu lock · 5f0aa21d
      Committed by Guoqing Jiang
      We need to use rcu_read_lock/unlock to avoid a potential race.
      Reported-by: Shaohua Li <shli@fb.com>
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      5f0aa21d
    • md-cluster: clean related info of cluster · c20c33f0
      Committed by Guoqing Jiang
      cluster_info and bitmap_info.nodes also need to be cleared when the
      array is stopped.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      c20c33f0
    • md: changes for MD_STILL_CLOSED flag · af8d8e6f
      Committed by Guoqing Jiang
      When stopping a clustered raid while it is pending on resync, the
      MD_STILL_CLOSED flag can be cleared, because a udev rule is triggered
      to open the mddev. So the array can't be stopped soon and returns
      EBUSY:

          mdadm -Ss                md-raid-arrays.rules
          set MD_STILL_CLOSED      md_open()
          ... ... ...              clear MD_STILL_CLOSED
          do_md_stop

      We make the following changes to resolve this issue:

      1. Rename MD_STILL_CLOSED to MD_CLOSING, since it is set when
         stopping the array and means we are stopping the array.
      2. Let md_open return early if MD_CLOSING is set, so no other thread
         will open the array while one thread is trying to close it.
      3. There is no need to clear the MD_CLOSING bit in md_open, because
         change 1 ensures the bit is handled; we then also don't need to
         test the bit in do_md_stop.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      af8d8e6f
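Change 2 above (md_open bailing out early while the array is closing) can be sketched as a small C model. The bit index, flag word, and bit helpers are simplified stand-ins for the kernel's atomic bitops; only the names MD_CLOSING and md_open come from the commit.

```c
#include <assert.h>

/* Sketch: md_open() refuses to open the device while MD_CLOSING is set,
 * so a udev-triggered open can no longer race do_md_stop. */

#define MD_CLOSING 7   /* bit index; the value here is illustrative */

struct mddev {
    unsigned long flags;
};

static int test_bit(int nr, const unsigned long *w) { return (*w >> nr) & 1; }
static void set_bit(int nr, unsigned long *w)       { *w |= 1UL << nr; }

static int md_open(struct mddev *mddev)
{
    if (test_bit(MD_CLOSING, &mddev->flags))
        return -16;    /* -EBUSY: another thread is stopping the array */
    /* ... normal open path ... */
    return 0;
}
```

Because the stop path sets the bit before the udev rule can run, every open attempted during the stop fails fast instead of re-opening the mddev and defeating the stop.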
    • md-cluster: remove some unnecessary dlm_unlock_sync calls · e3f924d3
      Committed by Guoqing Jiang
      Since DLM_LKF_FORCEUNLOCK is used in lockres_free, we don't need to
      call dlm_unlock_sync before freeing the lock resource.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      e3f924d3
    • md-cluster: use FORCEUNLOCK in lockres_free · 400cb454
      Committed by Guoqing Jiang
      For dlm_unlock, we need to pass the flag to dlm_unlock as its third
      parameter instead of setting res->flags.

      Also, DLM_LKF_FORCEUNLOCK is more suitable for dlm_unlock since it
      works even when the lock is on the waiting or convert queue.
      Acked-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      400cb454
    • md-cluster: call md_kick_rdev_from_array once ack failed · e566aef1
      Committed by Guoqing Jiang
      new_disk_ack can return failure if WAITING_FOR_NEWDISK is not set, so
      we need to kick the device from the array in case that failure
      happens.

      We also missed checking err before calling new_disk_ack; otherwise we
      could kick an rdev that isn't in the array. Thanks to Shaohua for the
      reminder.
      Reviewed-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      e566aef1
  4. 21 September 2016, 5 commits
    • Merge tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux · 7d1e0423
      Committed by Linus Torvalds
      Pull usercopy hardening fix from Kees Cook:
       "Expand the arm64 vmalloc check to include skipping the module space
        too"
      
      * tag 'usercopy-v4.8-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
        mm: usercopy: Check for module addresses
      7d1e0423
    • fix fault_in_multipages_...() on architectures with no-op access_ok() · e23d4159
      Committed by Al Viro
      Switching iov_iter fault-in to the multipages variants has exposed an
      old bug in the underlying fault_in_multipages_...() helpers: they
      break if the range passed to them wraps around. Normally the
      access_ok() done by callers prevents that (and it's a guaranteed
      EFAULT; ERR_PTR() values fall into such a range and they should not
      point to any valid objects).
      
      However, on architectures where userland and kernel live in different
      MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
      with a wraparound can reach fault_in_multipages_...().
      
      Since any wraparound means EFAULT there, the fix is trivial - turn
      those
      
          while (uaddr <= end)
              ...

      into

          if (unlikely(uaddr > end))
              return -EFAULT;
          do
              ...
          while (uaddr <= end);
      Reported-by: Jan Stancek <jstancek@redhat.com>
      Tested-by: Jan Stancek <jstancek@redhat.com>
      Cc: stable@vger.kernel.org # v3.5+
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e23d4159
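The wraparound hazard the commit above fixes can be demonstrated with a standalone sketch: computing `uaddr + size - 1` near the top of the address space wraps, making `end < uaddr`, so the loop must be guarded up front. The function name, page-probe stand-in, and error value are illustrative, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Sketch of the fixed shape: reject a wrapped range before looping.
 * 'pages' counts probes and stands in for the __get_user() page touches. */
static int fault_in_range(uintptr_t uaddr, size_t size, unsigned *pages)
{
    uintptr_t end = uaddr + size - 1;

    *pages = 0;
    if (end < uaddr)        /* the new up-front check: wrapped => fault */
        return -14;         /* -EFAULT */
    do {
        (*pages)++;         /* probe the page at uaddr */
        if (end - uaddr < PAGE_SIZE)
            break;          /* advancing would step past 'end' */
        uaddr += PAGE_SIZE;
    } while (uaddr <= end);
    return 0;
}
```

On architectures where access_ok() really checks the range, the wrapped case never reaches this code; the guard matters exactly where access_ok() is a no-op, as on s390.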
    • mm: usercopy: Check for module addresses · aa4f0601
      Committed by Laura Abbott
      While running a compile on arm64, I hit a memory exposure:
      
      usercopy: kernel memory exposure attempt detected from fffffc0000f3b1a8 (buffer_head) (1 bytes)
      ------------[ cut here ]------------
      kernel BUG at mm/usercopy.c:75!
      Internal error: Oops - BUG: 0 [#1] SMP
      Modules linked in: ip6t_rpfilter ip6t_REJECT
      nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_broute bridge stp
      llc ebtable_nat ip6table_security ip6table_raw ip6table_nat
      nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
      iptable_security iptable_raw iptable_nat nf_conntrack_ipv4
      nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle
      ebtable_filter ebtables ip6table_filter ip6_tables vfat fat xgene_edac
      xgene_enet edac_core i2c_xgene_slimpro i2c_core at803x realtek xgene_dma
      mdio_xgene gpio_dwapb gpio_xgene_sb xgene_rng mailbox_xgene_slimpro nfsd
      auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c sdhci_of_arasan
      sdhci_pltfm sdhci mmc_core xhci_plat_hcd gpio_keys
      CPU: 0 PID: 19744 Comm: updatedb Tainted: G        W 4.8.0-rc3-threadinfo+ #1
      Hardware name: AppliedMicro X-Gene Mustang Board/X-Gene Mustang Board, BIOS 3.06.12 Aug 12 2016
      task: fffffe03df944c00 task.stack: fffffe00d128c000
      PC is at __check_object_size+0x70/0x3f0
      LR is at __check_object_size+0x70/0x3f0
      ...
      [<fffffc00082b4280>] __check_object_size+0x70/0x3f0
      [<fffffc00082cdc30>] filldir64+0x158/0x1a0
      [<fffffc0000f327e8>] __fat_readdir+0x4a0/0x558 [fat]
      [<fffffc0000f328d4>] fat_readdir+0x34/0x40 [fat]
      [<fffffc00082cd8f8>] iterate_dir+0x190/0x1e0
      [<fffffc00082cde58>] SyS_getdents64+0x88/0x120
      [<fffffc0008082c70>] el0_svc_naked+0x24/0x28
      
      fffffc0000f3b1a8 is a module address. Modules may have compiled-in
      strings which could get copied to userspace. In this instance, it
      looks like ".", which matches the size of 1 byte. Extend the
      is_vmalloc_addr check to is_vmalloc_or_module_addr to cover all
      possible cases.
      Signed-off-by: Laura Abbott <labbott@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      aa4f0601
    • fs/proc/kcore.c: Add bounce buffer for ktext data · df04abfd
      Committed by Jiri Olsa
      We hit the hardened usercopy check when accessing kernel text by
      reading the kcore file:
      
        usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
        kernel BUG at mm/usercopy.c:75!
      
      Bypass this check for kcore by adding a bounce buffer for the ktext
      data.
      Reported-by: Steve Best <sbest@redhat.com>
      Fixes: f5509cc1 ("mm: Hardened usercopy")
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      df04abfd
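The bounce-buffer idea above can be sketched in userspace C: instead of handing a kernel-text address straight to copy_to_user() (which trips the hardened usercopy BUG), the bytes are first copied into an ordinary heap buffer and the user copy is done from that. All names and the use of plain memcpy in place of copy_to_user() are illustrative; the actual kcore code differs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: route a ktext read through a heap bounce buffer so the
 * user-facing copy never sees a kernel-text source address. */
static size_t read_ktext(char *user_dst, const char *ktext_src, size_t len)
{
    char *bounce = malloc(len);   /* stand-in for the kcore bounce buffer */

    if (!bounce)
        return 0;
    memcpy(bounce, ktext_src, len);   /* kernel-internal copy: unchecked */
    memcpy(user_dst, bounce, len);    /* copy_to_user() from heap memory,
                                         which passes the usercopy check */
    free(bounce);
    return len;
}
```

The cost is one extra copy per read, which is acceptable for a debug interface like /proc/kcore.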
    • fs/proc/kcore.c: Make bounce buffer global for read · f5beeb18
      Committed by Jiri Olsa
      The next patch adds a bounce buffer for the ktext area, so it's
      convenient to have a single bounce buffer for both the vmalloc/module
      and ktext cases.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f5beeb18
  5. 20 September 2016, 15 commits