1. 08 Oct, 2016 6 commits
  2. 06 Oct, 2016 1 commit
  3. 01 Oct, 2016 1 commit
    • ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock() · c33f0785
      Committed by Eric Ren
      The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.
      
      In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it;
      two processes then repeatedly perform the following operations,
      respectively: one does memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a',
      1), while the other calls ftruncate(fd, 2*CLUSTER_SIZE) and then
      ftruncate(fd, CLUSTER_SIZE) again and again.
      
      This is the backtrace when the deadlock happens:
      
         __wait_on_bit_lock+0x50/0xa0
         __lock_page+0xb7/0xc0
         ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
         ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
         do_page_mkwrite+0x66/0xc0
         handle_mm_fault+0x685/0x1350
         __do_page_fault+0x1d8/0x4d0
         trace_do_page_fault+0x37/0xf0
         do_async_page_fault+0x19/0x70
         async_page_fault+0x28/0x30
      
      In ocfs2_write_begin_nolock(), we first grab the pages and then allocate
      disk space for this write; ocfs2_try_to_free_truncate_log() will be
      called if -ENOSPC is returned; if we're lucky to get enough clusters,
      which is usually the case, we start over again.
      
      But in ocfs2_free_write_ctxt() the target page isn't unlocked, so we
      will deadlock when trying to grab the target page again.
      
      Also, -ENOMEM might be returned in ocfs2_grab_pages_for_write().
      Another deadlock will happen in __do_page_mkwrite() if
      ocfs2_page_mkwrite() returns a value without VM_FAULT_LOCKED while
      the target page is still locked.
      
      These two errors fail on the same path, so fix them by unlocking the
      target page manually before ocfs2_free_write_ctxt().
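
The retry path described above can be sketched with a toy model. This is not kernel code: `Page`, `write_begin` and the `unlock_before_retry` switch are hypothetical names, and a non-blocking lock attempt stands in for the point where the real code would block forever in lock_page().

```python
import threading


class Page:
    """Toy stand-in for a page with a non-reentrant page lock."""

    def __init__(self):
        self._lock = threading.Lock()

    def trylock(self):
        # Non-blocking attempt, so the model can report a deadlock
        # instead of actually hanging.
        return self._lock.acquire(blocking=False)

    def unlock(self):
        self._lock.release()


def write_begin(page, enospc_failures, unlock_before_retry):
    """Return True if the write path completes, False if it would deadlock.

    enospc_failures: number of allocation attempts that fail with ENOSPC
    before one succeeds (hypothetical parameter for this sketch).
    """
    while True:
        if not page.trylock():
            # Grabbing the target page again while it is still locked.
            return False
        if enospc_failures > 0:
            enospc_failures -= 1
            if unlock_before_retry:
                # The fix: unlock the target page before freeing the
                # write context and starting over.
                page.unlock()
            continue
        page.unlock()
        return True
```

With `unlock_before_retry=False`, a single ENOSPC retry is enough for the second grab to fail in this model; with `True`, the retry proceeds.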
      
      Jan Kara helped me clear up the JBD2 part and suggested the hint for
      the root cause.
      
      Changes since v1:
      1. Also take the ENOMEM error case into consideration.
      
      Link: http://lkml.kernel.org/r/1474173902-32075-1-git-send-email-zren@suse.com
      Signed-off-by: Eric Ren <zren@suse.com>
      Reviewed-by: He Gang <ghe@suse.com>
      Acked-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 28 Sep, 2016 1 commit
  5. 27 Sep, 2016 2 commits
    • fs: rename "rename2" i_op to "rename" · 2773bf00
      Committed by Miklos Szeredi
      Generated patch:
      
      sed -i "s/\.rename2\t/\.rename\t\t/" `git grep -wl rename2`
      sed -i "s/\brename2\b/rename/g" `git grep -wl rename2`
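
As a rough illustration of what the two sed expressions do to a single line, here is an equivalent in Python. The helper name `apply_rename2_seds` is made up for this sketch; the first substitution swaps `.rename2<TAB>` for `.rename<TAB><TAB>` to keep initializer columns aligned, and the second replaces any remaining whole-word `rename2`.

```python
import re


def apply_rename2_seds(line: str) -> str:
    """Apply both substitutions from the commit message to one line."""
    # s/\.rename2\t/\.rename\t\t/  (no g flag: first match per line only)
    line = re.sub(r"\.rename2\t", ".rename\t\t", line, count=1)
    # s/\brename2\b/rename/g  (whole-word matches, all occurrences)
    return re.sub(r"\brename2\b", "rename", line)
```

Identifiers that merely contain `rename2`, such as a hypothetical `my_rename2x`, are left alone because `\b` requires a word boundary on both sides.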
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
    • fs: make remaining filesystems use .rename2 · 1cd66c93
      Committed by Miklos Szeredi
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not as trivial as it is for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
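
The three conversion steps can be modelled in a few lines. This is a sketch only: `foo_rename` mirrors the placeholder name used in the message, directories are plain dicts, and returning -EINVAL for any non-zero flags is an assumption about how the "check if flags is zero" step is handled. `RENAME_NOREPLACE` uses the value from the Linux UAPI headers.

```python
import errno

RENAME_NOREPLACE = 1 << 0  # value from include/uapi/linux/fs.h


def foo_rename(old_dir, old_name, new_dir, new_name, flags=0):
    """Hypothetical handler: takes a flags argument, supports none of them."""
    if flags:
        # Assumption for this sketch: unsupported flags are rejected
        # before anything is modified.
        return -errno.EINVAL
    new_dir[new_name] = old_dir.pop(old_name)
    return 0
```

A call with `flags=0` behaves like the old handler, while a call with `RENAME_NOREPLACE` fails up front and leaves both directories untouched.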
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
  6. 22 Sep, 2016 2 commits
  7. 20 Sep, 2016 6 commits
    • Revert "ocfs2: bump up o2cb network protocol version" · 63b52c49
      Committed by Junxiao Bi
      This reverts commit 38b52efd ("ocfs2: bump up o2cb network protocol
      version").
      
      That commit made rolling upgrades fail.  When one node is upgraded to a
      new version containing it, the remaining nodes fail to establish
      connections to that node, so applications such as VMs on the remaining
      nodes can't be live migrated to the upgraded one.  This causes an outage.
      Since the negotiate hb timeout behavior doesn't change without that
      commit, revert it.
      
      Fixes: 38b52efd ("ocfs2: bump up o2cb network protocol version")
      Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix start offset to ocfs2_zero_range_for_truncate() · d21c353d
      Committed by Ashish Samant
      If we punch a hole on a reflink such that the following conditions are met:
      
      1. start offset is on a cluster boundary
      2. end offset is not on a cluster boundary
      3. (end offset is somewhere in another extent) or
         (hole range > MAX_CONTIG_BYTES(1MB)),
      
      we don't COW the first cluster starting at the start offset.  But in this
      case, we were wrongly passing this cluster to
      ocfs2_zero_range_for_truncate() to zero out.  This will modify the
      cluster in place and zero it in the source too.
      
      Fix this by skipping this cluster in such a scenario.
      
      To reproduce:
      
      1. Create a random file of, say, 10 MB
           xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
      2. Reflink it
           reflink -f 10MBfile reflnktest
      3. Punch a hole starting at a cluster boundary with a range greater than
      1MB. You can also use a range that will put the end offset in another
      extent.
           fallocate -p -o 0 -l 1048615 reflnktest
      4. sync
      5. Check the first cluster in the source file. (It will be zeroed out.)
          dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
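
To see why the fallocate arguments in step 3 hit the problematic case, the offset arithmetic can be checked with a small sketch. The function name and the 4 KB cluster size used in the usage note are assumptions; only the "range > 1MB" alternative of condition 3 is modelled, since extent layout is not known here.

```python
MAX_CONTIG_BYTES = 1 << 20  # 1MB, as stated in the commit message


def hits_uncowed_first_cluster(start, length, cluster_size):
    """Check conditions 1, 2 and the size alternative of condition 3."""
    end = start + length
    start_on_boundary = start % cluster_size == 0
    end_off_boundary = end % cluster_size != 0
    range_too_large = length > MAX_CONTIG_BYTES
    return start_on_boundary and end_off_boundary and range_too_large
```

For `start=0`, `length=1048615` and an assumed cluster size of 4096, the start is aligned, the end lands 39 bytes past a cluster boundary, and the range exceeds 1MB, so all three modelled conditions hold.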
      
      Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com
      Signed-off-by: Ashish Samant <ashish.samant@oracle.com>
      Reported-by: Saar Maoz <saar.maoz@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Eric Ren <zren@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix double unlock in case retry after free truncate log · 3bb8b653
      Committed by Joseph Qi
      If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
      free the truncate log and then retry.  Since
      ocfs2_try_to_free_truncate_log() locks and unlocks the global bitmap
      inode, we have to unlock it before calling this function.  But when the
      retried reservation fails without the global bitmap inode lock taken,
      the error handling branch unlocks it again and hits BUG.
      
      This issue also exists if no retry is needed and ocfs2_inode_lock() then
      fails.  So fix it.
      
      Fixes: 2070ad1a ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
      Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Jiufei Xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix trans extend while free cached blocks · d5bf1418
      Committed by Junxiao Bi
      The root cause of this issue is the same as the one fixed by the previous
      patch, but this time the credits for the allocator inode and group
      descriptor may not be consumed before the transaction is extended.
      
      The following error was caught:
      
        WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
        Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 2037 Comm: rm Tainted: G        W       4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
        Call Trace:
          dump_stack+0x48/0x5c
          warn_slowpath_common+0x95/0xe0
          warn_slowpath_null+0x1a/0x20
          start_this_handle+0x4c3/0x510 [jbd2]
          jbd2__journal_restart+0x161/0x1b0 [jbd2]
          jbd2_journal_restart+0x13/0x20 [jbd2]
          ocfs2_extend_trans+0x74/0x220 [ocfs2]
          ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2]
          ocfs2_run_deallocs+0x70/0x270 [ocfs2]
          ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
          ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
          ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
          ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
          ocfs2_evict_inode+0x28/0x60 [ocfs2]
          evict+0xab/0x1a0
          iput_final+0xf6/0x190
          iput+0xc8/0xe0
          do_unlinkat+0x1b7/0x310
          SyS_unlinkat+0x22/0x40
          system_call_fastpath+0x12/0x71
        ---[ end trace a62437cb060baa71 ]---
        JBD2: rm wants too many credits (149 > 128)
      
      Link: http://lkml.kernel.org/r/1473674623-11810-2-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix trans extend while flush truncate log · 2b0ad008
      Committed by Junxiao Bi
      Every time, ocfs2_extend_trans() included a credit for the truncate log
      inode, but since that inode was already managed by the jbd2 running
      transaction after the first time, that credit is not consumed until
      jbd2_journal_restart().
      
      Since the total credits to extend always included the unconsumed ones,
      the unconsumed credits keep accumulating, and eventually
      jbd2_journal_restart() fails because the credit count exceeds half of
      the maximum transaction credits.
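
The accumulation can be illustrated with a deliberately simplified model. It does not reproduce jbd2's real accounting: the function name, the per-extend request of 3 credits and the single unconsumed credit per extend are illustrative assumptions, and the limit of 128 is taken from the "too many credits" message in the log.

```python
def extends_until_overflow(per_extend, unconsumed_per_extend, limit):
    """Count successful extends before the requested credits exceed limit.

    Returns (extends, requested) at the failing request, or None if
    nothing is left unconsumed and the request therefore never grows.
    """
    if unconsumed_per_extend <= 0:
        return None
    leftover = 0
    extends = 0
    while True:
        # Each request carries over whatever earlier extends left unused.
        requested = leftover + per_extend
        if requested > limit:
            return extends, requested
        extends += 1
        leftover = requested - (per_extend - unconsumed_per_extend)
```

Under these assumptions the request grows by one credit per extend, so a long enough operation, such as unlinking a file with many extents, eventually crosses the limit no matter how small each individual request is.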
      
      The following error was caught when unlinking a large file with many
      extents:
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
        Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
        CPU: 0 PID: 13626 Comm: unlink Tainted: G        W       4.1.12-37.6.3.el6uek.x86_64 #2
        Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
        Call Trace:
          dump_stack+0x48/0x5c
          warn_slowpath_common+0x95/0xe0
          warn_slowpath_null+0x1a/0x20
          start_this_handle+0x4c3/0x510 [jbd2]
          jbd2__journal_restart+0x161/0x1b0 [jbd2]
          jbd2_journal_restart+0x13/0x20 [jbd2]
          ocfs2_extend_trans+0x74/0x220 [ocfs2]
          ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
          __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
          ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
          ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
          ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
          ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
          ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
          ocfs2_evict_inode+0x28/0x60 [ocfs2]
          evict+0xab/0x1a0
          iput_final+0xf6/0x190
          iput+0xc8/0xe0
          do_unlinkat+0x1b7/0x310
          SyS_unlink+0x16/0x20
          system_call_fastpath+0x12/0x71
        ---[ end trace 28aa7410e69369cf ]---
        JBD2: unlink wants too many credits (251 > 128)
      
      Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2/dlm: fix race between convert and migration · e6f0c6e6
      Committed by Joseph Qi
      Commit ac7cf246 ("ocfs2/dlm: fix race between convert and recovery")
      checks if the lockres master has changed to identify whether the new
      master has finished recovery or not.  This introduces a race when a new
      convert request arrives right after the old master unmounts (which
      means the master will change).
      
      In this case, it will reset lockres state to DLM_RECOVERING and then
      retry convert, and then fail with lockres->l_action being set to
      OCFS2_AST_INVALID, which will cause inconsistent lock level between
      ocfs2 and dlm, and then finally BUG.
      
      Since dlm recovery will clear lock->convert_pending in
      dlm_move_lockres_to_recovery_list, we can use it to correctly identify
      the race case between convert and recovery.  So fix it.
      
      Fixes: ac7cf246 ("ocfs2/dlm: fix race between convert and recovery")
      Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Signed-off-by: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 03 Aug, 2016 5 commits
  9. 27 Jul, 2016 7 commits
  10. 21 Jul, 2016 1 commit
  11. 25 Jun, 2016 1 commit
  12. 21 Jun, 2016 1 commit
  13. 20 Jun, 2016 1 commit
    • quota: use time64_t internally · e008bb61
      Committed by Arnd Bergmann
      The quota subsystem has two formats, the old v1 format using architecture
      specific time_t values on the on-disk format, while the v2 format
      (introduced in Linux 2.5.16 and 2.4.22) uses fixed 64-bit little-endian.
      
      While there is no future for the v1 format beyond y2038, the v2 format
      is almost there on 32-bit architectures, as both the user interface
      and the on-disk format use 64-bit timestamps, just not the time_t
      in between.
      
      This changes the internal representation to use time64_t, which will
      end up doing the right thing everywhere for v2 format.
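
For context on the y2038 limit mentioned above, a short sketch of what a signed 32-bit seconds counter can and cannot hold. The helper names are made up for illustration; the 64-bit little-endian packing only mirrors the v2 description in the message and is not the quota file layout itself.

```python
import struct
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1

# Last second representable in a signed 32-bit time_t.
last_32bit_second = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)


def wrap_int32(seconds):
    """Reinterpret the low 32 bits of seconds as a signed 32-bit value."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


def roundtrip_int64_le(seconds):
    """Store and reload seconds as a signed 64-bit little-endian value."""
    return struct.unpack("<q", struct.pack("<q", seconds))[0]
```

One second past `INT32_MAX` wraps to a negative value in the 32-bit case but survives the 64-bit round trip unchanged, which is the gap an internal 64-bit representation closes on 32-bit architectures.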
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 11 Jun, 2016 1 commit
    • vfs: make the string hashes salt the hash · 8387ff25
      Committed by Linus Torvalds
      We always mixed in the parent pointer into the dentry name hash, but we
      did it late at lookup time.  It turns out that we can simplify that
      lookup-time action by salting the hash with the parent pointer early
      instead of late.
      
      A few other users of our string hashes also wanted to mix in their own
      pointers into the hash, and those are updated to use the same mechanism.
      
      Hash users that don't have any particular initial salt can just use the
      NULL pointer as a no-salt.
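
The idea of salting early can be shown with a toy hash. The multiply-and-add mixing below is a stand-in chosen for brevity and is not the kernel's hash function; `salted_name_hash` and the example salt values are hypothetical, with 0 playing the role of the NULL no-salt.

```python
MASK = 0xFFFFFFFF


def salted_name_hash(salt, name):
    """Toy 32-bit string hash whose initial state is the salt."""
    h = salt & MASK  # the salt goes in first instead of being mixed in late
    for byte in name:
        h = (h * 31 + byte) & MASK
    return h
```

With the salt folded into the initial state, the same name hashed under two different salts (for example two different parent pointers) already differs when hashing finishes, so no extra mixing step is needed at lookup time.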
      
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 08 Jun, 2016 4 commits