1. 08 6月, 2016 1 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 26 3月, 2016 3 次提交
    • J
      ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local · 35ddf78e
      jiangyiwen 提交于
      This patch fixes a deadlock, as follows:
      
        Node 1                Node 2                  Node 3
      1)volume a and b are    only mount vol a        only mount vol b
        mounted
      
      2)                      start to mount b        start to mount a
      
      3)                      check hb of Node 3      check hb of Node 2
                              in vol a, qs_holds++    in vol b, qs_holds++
      
      4) -------------------- all nodes' network down --------------------
      
      5)                      progress of mount b     the same situation as
                              failed, and then call   Node 2
                              ocfs2_dismount_volume.
                              but the process is hung,
                              since there is a work
                              in ocfs2_wq cannot beo
                              completed. This work is
                              about vol a, because
                              ocfs2_wq is global wq.
                              BTW, this work which is
                              scheduled in ocfs2_wq is
                              ocfs2_orphan_scan_work,
                              and the context in this work
                              needs to take inode lock
                              of orphan_dir, because
                              lockres owner are Node 1 and
                              all nodes' nework has been down
                              at the same time, so it can't
                              get the inode lock.
      
      6)                      Why can't this node be fenced
                              when network disconnected?
                              Because the process of
                              mount is hung what caused qs_holds
                              is not equal 0.
      
      Because all works in the ocfs2_wq are relative to the super block.
      
      The solution is to change the ocfs2_wq from global to local.  In other
      words, move it into struct ocfs2_super.
      Signed-off-by: NYiwen Jiang <jiangyiwen@huawei.com>
      Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Xue jiufei <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      35ddf78e
    • R
      ocfs2: fix ip_unaligned_aio deadlock with dio work queue · e63890f3
      Ryan Ding 提交于
      In the current implementation of unaligned aio+dio, lock order behave as
      follow:
      
      in user process context:
        -> call io_submit()
          -> get i_mutex
      		<== window1
            -> get ip_unaligned_aio
              -> submit direct io to block device
          -> release i_mutex
        -> io_submit() return
      
      in dio work queue context(the work queue is created in __blockdev_direct_IO):
        -> release ip_unaligned_aio
      		<== window2
          -> get i_mutex
            -> clear unwritten flag & change i_size
          -> release i_mutex
      
      There is a limitation to the thread number of dio work queue.  256 at
      default.  If all 256 thread are in the above 'window2' stage, and there
      is a user process in the 'window1' stage, the system will became
      deadlock.  Since the user process hold i_mutex to wait ip_unaligned_aio
      lock, while there is a direct bio hold ip_unaligned_aio mutex who is
      waiting for a dio work queue thread to be schedule.  But all the dio
      work queue thread is waiting for i_mutex lock in 'window2'.
      
      This case only happened in a test which send a large number(more than
      256) of aio at one io_submit() call.
      
      My design is to remove ip_unaligned_aio lock.  Change it to a sync io
      instead.  Just like ip_unaligned_aio lock, serialize the unaligned aio
      dio.
      
      [akpm@linux-foundation.org: remove OCFS2_IOCB_UNALIGNED_IO, per Junxiao Bi]
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e63890f3
    • R
      ocfs2: record UNWRITTEN extents when populate write desc · 4506cfb6
      Ryan Ding 提交于
      To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.
      
      There is still one issue in the direct write procedure.
      
      phase 1: alloc extent with UNWRITTEN flag
      phase 2: submit direct data to disk, add zero page to page cache
      phase 3: clear UNWRITTEN flag when data has been written to disk
      
      When there are 2 direct write A(0~3KB),B(4~7KB) writing to the same
      cluster 0~7KB (cluster size 8KB).  Write request A arrive phase 2 first,
      it will zero the region (4~7KB).  Before request A enter to phase 3,
      request B arrive phase 2, it will zero region (0~3KB).  This is just like
      request B steps request A.
      
      To resolve this issue, we should let request B knows this cluster is already
      under zero, to prevent it from steps the previous write request.
      
      This patch will add function ocfs2_unwritten_check() to do this job.  It
      will record all clusters that are under direct write(it will be recorded
      in the 'ip_unwritten_list' member of inode info), and prevent the later
      direct write writing to the same cluster to do the zero work again.
      Signed-off-by: NRyan Ding <ryan.ding@oracle.com>
      Reviewed-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4506cfb6
  4. 23 3月, 2016 1 次提交
  5. 16 3月, 2016 1 次提交
  6. 15 1月, 2016 3 次提交
  7. 05 9月, 2015 4 次提交
    • K
      fs: create and use seq_show_option for escaping · a068acf2
      Kees Cook 提交于
      Many file systems that implement the show_options hook fail to correctly
      escape their output which could lead to unescaped characters (e.g.  new
      lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
      could lead to confusion, spoofed entries (resulting in things like
      systemd issuing false d-bus "mount" notifications), and who knows what
      else.  This looks like it would only be the root user stepping on
      themselves, but it's possible weird things could happen in containers or
      in other situations with delegated mount privileges.
      
      Here's an example using overlay with setuid fusermount trusting the
      contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
      of "sudo" is something more sneaky:
      
        $ BASE="ovl"
        $ MNT="$BASE/mnt"
        $ LOW="$BASE/lower"
        $ UP="$BASE/upper"
        $ WORK="$BASE/work/ 0 0
        none /proc fuse.pwn user_id=1000"
        $ mkdir -p "$LOW" "$UP" "$WORK"
        $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
        $ cat /proc/mounts
        none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
        none /proc fuse.pwn user_id=1000 0 0
        $ fusermount -u /proc
        $ cat /proc/mounts
        cat: /proc/mounts: No such file or directory
      
      This fixes the problem by adding new seq_show_option and
      seq_show_option_n helpers, and updating the vulnerable show_option
      handlers to use them as needed.  Some, like SELinux, need to be open
      coded due to unusual existing escape mechanisms.
      
      [akpm@linux-foundation.org: add lost chunk, per Kees]
      [keescook@chromium.org: seq_show_option should be using const parameters]
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NJan Kara <jack@suse.com>
      Acked-by: NPaul Moore <paul@paul-moore.com>
      Cc: J. R. Okajima <hooanon05g@gmail.com>
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a068acf2
    • J
      ocfs2: neaten do_error, ocfs2_error and ocfs2_abort · 7ecef14a
      Joe Perches 提交于
      These uses sometimes do and sometimes don't have '\n' terminations.  Make
      the uses consistently use '\n' terminations and remove the newline from
      the functions.
      
      Miscellanea:
      
      o Coalesce formats
      o Realign arguments
      Signed-off-by: NJoe Perches <joe@perches.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ecef14a
    • G
      ocfs2: add errors=continue · 7d0fb914
      Goldwyn Rodrigues 提交于
      OCFS2 is often used in high-availaibility systems.  However, ocfs2
      converts the filesystem to read-only at the drop of the hat.  This may
      not be necessary, since turning the filesystem read-only would affect
      other running processes as well, decreasing availability.
      
      This attempt is to add errors=continue, which would return the EIO to
      the calling process and terminate furhter processing so that the
      filesystem is not corrupted further.  However, the filesystem is not
      converted to read-only.
      
      As a future plan, I intend to create a small utility or extend
      fsck.ocfs2 to fix small errors such as in the inode.  The input to the
      utility such as the inode can come from the kernel logs so we don't have
      to schedule a downtime for fixing small-enough errors.
      
      The patch changes the ocfs2_error to return an error.  The error
      returned depends on the mount option set.  If none is set, the default
      is to turn the filesystem read-only.
      
      Perhaps errors=continue is not the best option name.  Historically it is
      used for making an attempt to progress in the current process itself.
      Should we call it errors=eio? or errors=killproc? Suggestions/Comments
      welcome.
      
      Sources are available at:
        https://github.com/goldwynr/linux/tree/error-contSigned-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d0fb914
    • J
      ocfs2: fix race between dio and recover orphan · 512f62ac
      Joseph Qi 提交于
      During direct io the inode will be added to orphan first and then
      deleted from orphan.  There is a race window that the orphan entry will
      be deleted twice and thus trigger the BUG when validating
      OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan.
      
      ocfs2_direct_IO_write
          ...
          ocfs2_add_inode_to_orphan
          >>>>>>>> race window.
                   1) another node may rm the file and then down, this node
                   take care of orphan recovery and clear flag
                   OCFS2_DIO_ORPHANED_FL.
                   2) since rw lock is unlocked, it may race with another
                   orphan recovery and append dio.
          ocfs2_del_inode_from_orphan
      
      So take inode mutex lock when recovering orphans and make rw unlock at the
      end of aio write in case of append dio.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Reported-by: NYiwen Jiang <jiangyiwen@huawei.com>
      Cc: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      512f62ac
  8. 22 4月, 2015 1 次提交
    • L
      Revert "ocfs2: incorrect check for debugfs returns" · 8f443e23
      Linus Torvalds 提交于
      This reverts commit e2ac55b6.
      
      Huang Ying reports that this causes a hang at boot with debugfs disabled.
      
      It is true that the debugfs error checks are kind of confusing, and this
      code certainly merits more cleanup and thinking about it, but there's
      something wrong with the trivial "check not just for NULL, but for error
      pointers too" patch.
      
      Yes, with debugfs disabled, we will end up setting the o2hb_debug_dir
      pointer variable to an error pointer (-ENODEV), and then continue as if
      everything was fine.  But since debugfs is disabled, all the _users_ of
      that pointer end up being compiled away, so even though the pointer can
      not be dereferenced, that's still fine.
      
      So it's confusing and somewhat questionable, but the "more correct"
      error checks end up causing more trouble than they fix.
      Reported-by: NHuang Ying <ying.huang@intel.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NChengyu Song <csong84@gatech.edu>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8f443e23
  9. 15 4月, 2015 4 次提交
    • V
      cleancache: zap uuid arg of cleancache_init_shared_fs · 9de16262
      Vladimir Davydov 提交于
      Use super_block->s_uuid instead.  Every shared filesystem using cleancache
      must now initialize super_block->s_uuid before calling
      cleancache_init_shared_fs.  The only one on the tree, ocfs2, already meets
      this requirement.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Stefan Hengelein <ilendir@googlemail.com>
      Cc: Florian Schmaus <fschmaus@gmail.com>
      Cc: Andor Daam <andor.daam@googlemail.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9de16262
    • V
      ocfs2: copy fs uuid to superblock · 58be19dc
      Vladimir Davydov 提交于
      Currently, maximal number of cleancache enabled filesystems equals 32,
      which is insufficient nowadays, because a Linux host can have hundreds
      of containers on board, each of which might want its own filesystem.
      This patch set targets at removing this limitation - see patch 4 for
      more details.  Patches 1-3 prepare the code for this change.
      
      This patch (of 4):
      
      This will allow us to remove the uuid argument from
      cleancache_init_shared_fs.
      Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Stefan Hengelein <ilendir@googlemail.com>
      Cc: Florian Schmaus <fschmaus@gmail.com>
      Cc: Andor Daam <andor.daam@googlemail.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Bob Liu <lliubbo@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58be19dc
    • J
      ocfs2: logging: remove static buffer, use vsprintf extension %pV · 1543306e
      Joe Perches 提交于
      Use the vsprintf %pV extension to avoid using a static buffer and remove
      the now unnecessary buffer.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1543306e
    • C
      ocfs2: incorrect check for debugfs returns · e2ac55b6
      Chengyu Song 提交于
      debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs
      is not configured, so the return value should be checked against
      ERROR_VALUE as well, otherwise the later dereference of the dentry pointer
      would crash the kernel.
      
      This patch tries to solve this problem by fixing certain checks. However,
      I have that found other call sites are protected by #ifdef CONFIG_DEBUG_FS.
      In current implementation, if CONFIG_DEBUG_FS is defined, then the above
      two functions will never return any ERROR_VALUE. So another possibility
      to fix this is to surround all the buggy checks/functions with the same
      #ifdef CONFIG_DEBUG_FS. But I'm not sure if this would break any functionality,
      as only OCFS2_FS_STATS declares dependency on DEBUG_FS.
      Signed-off-by: NChengyu Song <csong84@gatech.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2ac55b6
  10. 17 2月, 2015 1 次提交
  11. 11 2月, 2015 1 次提交
  12. 30 1月, 2015 1 次提交
    • J
      ocfs2: Use generic helpers for quotaon and quotaoff · 664dbd5f
      Jan Kara 提交于
      Ocfs2 can just use the generic helpers provided by quota code for
      turning quotas on and off when quota files are stored as system inodes.
      The only difference is the feature test in ocfs2_quota_on() and that is
      covered by dquot_quota_enable() checking whether usage tracking is
      enabled (which can happen only if the filesystem has the quota feature
      set).
      Signed-off-by: NJan Kara <jack@suse.cz>
      664dbd5f
  13. 11 12月, 2014 1 次提交
  14. 10 11月, 2014 1 次提交
  15. 26 9月, 2014 1 次提交
  16. 17 9月, 2014 1 次提交
    • J
      ocfs2: Don't use MAXQUOTAS value · 52362810
      Jan Kara 提交于
      MAXQUOTAS value defines maximum number of quota types VFS supports.
      This isn't necessarily the number of types ocfs2 supports and with
      addition of project quotas these two numbers stop matching. So make
      ocfs2 use its private definition.
      
      CC: Mark Fasheh <mfasheh@suse.com>
      CC: Joel Becker <jlbec@evilplan.org>
      CC: ocfs2-devel@oss.oracle.com
      Signed-off-by: NJan Kara <jack@suse.cz>
      52362810
  17. 24 6月, 2014 1 次提交
  18. 05 6月, 2014 2 次提交
  19. 04 4月, 2014 6 次提交
    • J
      ocfs2: avoid system inode ref confusion by adding mutex lock · 43b10a20
      jiangyiwen 提交于
      The following case may lead to the same system inode ref in confusion.
      
      A thread                            B thread
      ocfs2_get_system_file_inode
      ->get_local_system_inode
      ->_ocfs2_get_system_file_inode
                                          because of *arr == NULL,
                                          ocfs2_get_system_file_inode
                                          ->get_local_system_inode
                                          ->_ocfs2_get_system_file_inode
      gets first ref thru
      _ocfs2_get_system_file_inode,
      gets second ref thru igrab and
      set *arr = inode
                                          at the moment, B thread also gets
                                          two refs, so lead to one more
                                          inode ref.
      
      So add mutex lock to avoid multi thread set two inode ref once at the
      same time.
      Signed-off-by: Njiangyiwen <jiangyiwen@huawei.com>
      Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43b10a20
    • G
      ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock · 8ed6b237
      Goldwyn Rodrigues 提交于
      The following patches are reverted in this patch because these patches
      caused performance regression in the remote unlink() calls.
      
        ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq
        f7b1aa69 - ocfs2: Fix deadlock on umount
        5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
      
      Previous patches in this series removed the possible deadlocks from
      downconvert thread so the above patches shouldn't be needed anymore.
      
      The regression is caused because these patches delay the iput() in case
      of dentry unlocks.  This also delays the unlocking of the open lockres.
      The open lockresource is required to test if the inode can be wiped from
      disk or not.  When the deleting node does not get the open lock, it
      marks it as orphan (even though it is not in use by another
      node/process) and causes a journal checkpoint.  This delays operations
      following the inode eviction.  This also moves the inode to the orphaned
      inode which further causes more I/O and a lot of unneccessary orphans.
      
      The following script can be used to generate the load causing issues:
      
        declare -a create
        declare -a remove
        declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
        unique="`mktemp -u XXXXX`"
        script="/tmp/idontknow-${unique}.sh"
        cat <<EOF > "${script}"
        for n in {1..8}; do mkdir -p test/dir\${n}
          eval touch test/dir\${n}/foo{1.."\$1"}
        done
        EOF
        chmod 700 "${script}"
      
        function fcreate ()
        {
          exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
        }
      
        function fremove ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
        }
      
        function fcp ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
        }
      
        echo -------------------------------------------------
        echo "| # files | create #s | copy #s | remove #s |"
        echo -------------------------------------------------
        for ((x=0; x < ${#iterations[*]} ; x++)) do
          create[$x]="`fcreate ${iterations[$x]}`"
          copy[$x]="`fcp ${iterations[$x]}`"
          remove[$x]="`fremove`"
          printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
        done
        rm "${script}"
        echo "------------------------"
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ed6b237
    • J
      ocfs2: implement delayed dropping of last dquot reference · e3a767b6
      Jan Kara 提交于
      We cannot drop last dquot reference from downconvert thread as that
      creates the following deadlock:
      
      NODE 1                                  NODE2
      holds dentry lock for 'foo'
      holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                              dquot_initialize(bar)
                                                ocfs2_dquot_acquire()
                                                  ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                                  ...
      downconvert thread (triggered from another
      node or a different process from NODE2)
        ocfs2_dentry_post_unlock()
          ...
          iput(foo)
            ocfs2_evict_inode(foo)
              ocfs2_clear_inode(foo)
                dquot_drop(inode)
                  ...
      	    ocfs2_dquot_release()
                    ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                     - blocks
                                                  finds we need more space in
                                                  quota file
                                                  ...
                                                  ocfs2_extend_no_holes()
                                                    ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                      - deadlocks waiting for
                                                        downconvert thread
      
      We solve the problem by postponing dropping of the last dquot reference to
      a workqueue if it happens from the downconvert thread.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e3a767b6
    • D
      ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb
      Darrick J. Wong 提交于
      Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
      transaction to complete.  This isn't terribly efficient, since sync_file
      really only needs to wait for the last transaction involving that inode
      to complete, and this doesn't require i_mutex.
      
      Therefore, implement the necessary bits to track the newest tid
      associated with an inode, and teach sync_file to wait for that instead
      of waiting for everything in the journal to commit.  Furthermore, only
      issue the flush request to the drive if jbd2 hasn't already done so.
      
      This also eliminates the deadlock between ocfs2_file_aio_write() and
      ocfs2_sync_file().  aio_write takes i_mutex then calls
      ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
      However, if that dio completion involves calling fsync, then we can get
      into trouble when some ocfs2_sync_file tries to take i_mutex.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2931cdcb
    • J
      ocfs2: remove unused variable uuid_net_key in ocfs2_initialize_super · a75fe48c
      joyce.xue 提交于
      Variable uuid_net_key in ocfs2_initialize_super() is not used.  Clean it
      up.
      Signed-off-by: Njoyce.xue <xuejiufei@huawei.com>
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Acked-by: NMark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a75fe48c
    • W
      ocfs2: change ip_unaligned_aio to of type mutex from atomit_t · c18ceab0
      Wengang Wang 提交于
      There is a problem that waitqueue_active() may check stale data thus miss
      a wakeup of threads waiting on ip_unaligned_aio.
      
      The valid value of ip_unaligned_aio is only 0 and 1 so we can change it to
      be of type mutex thus the above prolem is avoid.  Another benifit is that
      mutex which works as FIFO is fairer than wake_up_all().
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c18ceab0
  20. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  21. 22 1月, 2014 3 次提交
  22. 13 11月, 2013 1 次提交