1. 04 Apr 2014, 17 commits
    • ocfs2: remove OCFS2_INODE_SKIP_DELETE flag · 7bf619c1
      Committed by Jan Kara
      The flag was never set, delete it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7bf619c1
    • ocfs2: add dlm_recover_callback_support in sysfs · 765aabbb
      Committed by Goldwyn Rodrigues
      This is a part of the nocontrold feature which was incorporated sometime
      back.
      
      This is required for backward compatibility of the tools, specifically
      the scenario where the tools with recovery callback is used with a
      kernel not using the recovery callbacks (older kernel + newer tools).
      The tools look for this file to understand if the kernel supports DLM
      recovery callbacks.
      
      For kernels which support recovery callbacks but will miss this patch,
      ocfs2 will continue to use the older API and would still be able to
      mount the filesystem.
      
      [akpm@linux-foundation.org: simplify]
      [sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up]
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      765aabbb
    • ocfs2: dlm: fix recovery hang · ded2cf71
      Committed by Junxiao Bi
      There is a race window in dlm_do_recovery() between dlm_remaster_locks()
      and dlm_reset_recovery() when the recovery master has nearly finished the
      recovery process for a dead node.  After the master sends the
      FINALIZE_RECO message in dlm_remaster_locks(), another node may become
      the recovery master for another dead node and send a BEGIN_RECO message
      to all nodes, including the old master.  In the old master's handler for
      that message, dlm_begin_reco_handler(), dlm->reco.dead_node and
      dlm->reco.new_master are set to the second dead node and the new master;
      then dlm_reset_recovery() resets these two variables to their default
      values.  As a result the new recovery master cannot finish the recovery
      process and hangs, and eventually the whole cluster hangs waiting for
      recovery.
      
      old recovery master:                                 new recovery master:
      dlm_remaster_locks()
                                                        become recovery master for
                                                        another dead node.
                                                        dlm_send_begin_reco_message()
      dlm_begin_reco_handler()
      {
       if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
        return -EAGAIN;
       }
       dlm_set_reco_master(dlm, br->node_idx);
       dlm_set_reco_dead_node(dlm, br->dead_node);
      }
      dlm_reset_recovery()
      {
       dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
       dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
      }
                                                         will hang in dlm_remaster_locks() waiting
                                                         for the requested dlm lock info
      
      Before sending the FINALIZE_RECO message, the recovery master should set
      DLM_RECO_STATE_FINALIZE for itself and clear it once recovery is done.
      This closes the race window, because BEGIN_RECO messages are not handled
      until the DLM_RECO_STATE_FINALIZE flag is cleared, as sketched below.
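      A minimal sketch of the ordering, using the flag and lock names that
      appear in fs/ocfs2/dlm (an illustration of the idea, not the exact diff):

        /* in dlm_remaster_locks(), before sending FINALIZE_RECO (sketch) */
        spin_lock(&dlm->spinlock);
        dlm->reco.state |= DLM_RECO_STATE_FINALIZE;
        spin_unlock(&dlm->spinlock);

        /* ... send FINALIZE_RECO to all nodes and finish recovery ... */

        /* only now may a BEGIN_RECO for another dead node be accepted */
        spin_lock(&dlm->spinlock);
        dlm->reco.state &= ~DLM_RECO_STATE_FINALIZE;
        spin_unlock(&dlm->spinlock);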
      
      A similar race can happen between the new recovery master and a normal
      node running dlm_finalize_reco_handler(); fix that as well.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ded2cf71
    • ocfs2: dlm: fix lock migration crash · 34aa8dac
      Committed by Junxiao Bi
      This issue was introduced by commit 800deef3 ("ocfs2: use
      list_for_each_entry where benefical") in 2007, which replaced
      list_for_each with list_for_each_entry.  The variable "lock" will point
      to invalid data if the "tmpq" list is empty, and a panic is triggered
      because of this.  Sunil advised reverting it, but the old version was not
      right either: at the end of the outer for loop, list_for_each_entry also
      leaves "lock" pointing at invalid data, so in the next iteration, if the
      "tmpq" list is empty, "lock" is a stale, invalid pointer and causes the
      panic.  So revert to list_for_each and reset "lock" to NULL to fix this
      issue, as sketched below.
      
      One might think this cannot happen because the "tmpq" list should never
      be empty.  The trace below shows how it can.
      
      old lock resource owner (node 1):                                 migration target (node 2):
      imagine a lockres with an EX lock from node 2 in the
      granted list, and an NR lock from node x with convert_type
      EX in the converting list.
      dlm_empty_lockres() {
       dlm_pick_migration_target() {
         pick node 2 as target as its lock is the first one
         in granted list.
       }
       dlm_migrate_lockres() {
         dlm_mark_lockres_migrating() {
           res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
           wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
      	 //after the above code, we can not dirty lockres any more,
           // so dlm_thread shuffle list will not run
                                                                         downconvert lock from EX to NR
                                                                         upconvert lock from NR to EX
      <<< migration may get scheduled out here; node 2 then sends a
      <<< downconvert request to change the type from EX to NR, and then an
      <<< upconvert request to change the type from NR to EX.  At this point
      <<< the lockres granted list is empty and there are two locks on the
      <<< converting list: node x's upconverting lock followed by node 2's
      <<< upconverting lock.
      
      	 // will set lockres RES_MIGRATING flag, the following
      	 // lock/unlock can not run
           dlm_lockres_release_ast(dlm, res);
         }
      
         dlm_send_one_lockres()
                                                                       dlm_process_recovery_data()
                                                                         for (i=0; i<mres->num_locks; i++)
                                                                           if (ml->node == dlm->node_num)
                                                                             for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                              list_for_each_entry(lock, tmpq, list)
                                                                              if (lock) break; <<< lock is invalid as grant list is empty.
                                                                             }
                                                                             if (lock->ml.node != ml->node)
                                                                               BUG() >>> crash here
       }
      
      I observed the above lock state in a vmcore from one of our internal bug
      reports.
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      34aa8dac
    • ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb
      Committed by Darrick J. Wong
      Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
      transaction to complete.  This isn't terribly efficient, since sync_file
      really only needs to wait for the last transaction involving that inode
      to complete, and this doesn't require i_mutex.
      
      Therefore, implement the necessary bits to track the newest tid
      associated with an inode, and teach sync_file to wait for that instead
      of waiting for everything in the journal to commit.  Furthermore, only
      issue the flush request to the drive if jbd2 hasn't already done so.
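      A hedged sketch of the mechanism; the per-inode tid field names are
      illustrative, but the jbd2 helpers are the ones intended for exactly this
      pattern:

        /* ocfs2_sync_file() (sketch): wait only for the tid that last touched the inode */
        tid_t tid = datasync ? oi->i_datasync_tid : oi->i_sync_tid;   /* illustrative fields */

        err = jbd2_complete_transaction(journal, tid);
        /* issue the cache flush ourselves only if jbd2 will not send one for this tid */
        if (!err && !jbd2_trans_will_send_data_barrier(journal, tid))
                err = blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL);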
      
      This also eliminates the deadlock between ocfs2_file_aio_write() and
      ocfs2_sync_file().  aio_write takes i_mutex then calls
      ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
      However, if that dio completion involves calling fsync, then we can get
      into trouble when some ocfs2_sync_file tries to take i_mutex.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2931cdcb
    • ocfs2: remove unused variable uuid_net_key in ocfs2_initialize_super · a75fe48c
      Committed by joyce.xue
      Variable uuid_net_key in ocfs2_initialize_super() is not used.  Clean it
      up.
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a75fe48c
    • ocfs2: change ip_unaligned_aio from atomic_t to a mutex · c18ceab0
      Committed by Wengang Wang
      There is a problem where waitqueue_active() may check stale data and thus
      miss a wakeup of threads waiting on ip_unaligned_aio.

      The only valid values of ip_unaligned_aio are 0 and 1, so we can change
      it to a mutex and the above problem is avoided.  Another benefit is that
      a mutex, which works as FIFO, is fairer than wake_up_all().
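      A minimal sketch of the conversion, assuming the lock/unlock points sit
      around the unaligned dio window (the call sites are illustrative):

        /* struct ocfs2_inode_info: was "atomic_t ip_unaligned_aio" plus a waitqueue */
        struct mutex            ip_unaligned_aio;

        /* before starting an unaligned dio write (sketch) */
        mutex_lock(&oi->ip_unaligned_aio);

        /* when the unaligned dio completes (sketch) */
        mutex_unlock(&oi->ip_unaligned_aio);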
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c18ceab0
    • ocfs2: fix null pointer dereference when accessing dlm_state before launching the dlm thread · 181a9a04
      Committed by Zongxun Wang
      When mounting an ocfs2 volume, the file
      /sys/kernel/debug/o2dlm/<uuid>/dlm_state is created first, and only then
      is the dlm thread launched.  So the following sequence causes a null
      pointer dereference:
      dlm_debug_init -> access of the dlm_state file (which calls
      dlm_state_print) -> dlm_launch_thread

      Moving dlm_debug_init() after dlm_launch_thread() and
      dlm_launch_recovery_thread() fixes this issue, as sketched below.
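      A sketch of the reordering in the domain-join path (error handling
      trimmed; the three functions are the ones named above):

        status = dlm_launch_thread(dlm);
        if (status < 0)
                goto bail;

        status = dlm_launch_recovery_thread(dlm);
        if (status < 0)
                goto bail;

        /* only now create /sys/kernel/debug/o2dlm/<uuid>/dlm_state */
        status = dlm_debug_init(dlm);
        if (status < 0)
                goto bail;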
      Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      181a9a04
    • fanotify: move unrelated handling from copy_event_to_user() · d507816b
      Committed by Jan Kara
      Move code moving event structure to access_list from copy_event_to_user()
      to fanotify_read() where it is more logical (so that we can immediately
      see in the main loop that we either move the event to a different list
      or free it).  Also move special error handling for permission events
      from copy_event_to_user() to the main loop to have it in one place with
      error handling for normal events.  This makes copy_event_to_user()
      really only copy the event to user without any side effects.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d507816b
    • fanotify: reorganize loop in fanotify_read() · d8aaab4f
      Committed by Jan Kara
      Swap the error / "read ok" branches in the main loop of fanotify_read().
      We will grow the "read ok" part in the next patch and this makes the
      indentation easier.  Also it is more common to have error conditions
      inside an 'if' instead of the fast path.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d8aaab4f
    • fanotify: convert access_mutex to spinlock · 9573f793
      Committed by Jan Kara
      access_mutex is used only to guard operations on access_list.  There's
      no need for sleeping within this lock so just make a spinlock out of it.
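      A small sketch of the conversion (the list-manipulation site and field
      names are illustrative):

        /* fanotify group data: was "struct mutex access_mutex" */
        spinlock_t access_lock;

        spin_lock(&group->fanotify_data.access_lock);
        list_add_tail(&event->list, &group->fanotify_data.access_list);
        spin_unlock(&group->fanotify_data.access_lock);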
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9573f793
    • fanotify: use fanotify event structure for permission response processing · f083441b
      Committed by Jan Kara
      Currently, fanotify creates new structure to track the fact that
      permission event has been reported to userspace and someone is waiting
      for a response to it.  As event structures are now completely in the
      hands of each notification framework, we can use the event structure for
      this tracking instead of allocating a new structure.
      
      Since this makes the event structures for normal events and permission
      events even more different and the structures have different lifetime
      rules, we split them into two separate structures (where permission
      event structure contains the structure for a normal event).  This makes
      normal events 8 bytes smaller and the code a tad bit cleaner.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f083441b
    • fanotify: remove useless bypass_perm check · 3298cf37
      Committed by Jan Kara
      The prepare_for_access_response() function checks whether
      group->fanotify_data.bypass_perm is set.  However this test can never be
      true because prepare_for_access_response() is called only from
      fanotify_read() which means fanotify group is alive with an active fd
      while bypass_perm is set from fanotify_release() when all file
      descriptors pointing to the group are closed and the group is going
      away.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3298cf37
    • fs/freevxfs/vxfs_lookup.c: update function comment · ddae82d8
      Committed by Fabian Frederick
      nameidata was replaced by flags in commit 00cd8dd3 ("stop passing
      nameidata to ->lookup()").
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ddae82d8
    • fs/cifs/cifsfs.c: add __init to cifs_init_inodecache() · 9ee108b2
      Committed by Fabian Frederick
      cifs_init_inodecache is only called by __init init_cifs.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9ee108b2
    • bdi: avoid oops on device removal · 5acda9d1
      Committed by Jan Kara
      After commit 839a8e86 ("writeback: replace custom worker pool
      implementation with unbound workqueue"), when a device is removed while
      we are writing to it, we crash in bdi_writeback_workfn() ->
      set_worker_desc() because bdi->dev is NULL.
      
      This can happen because even though bdi_unregister() cancels all pending
      flushing work, nothing really prevents new ones from being queued from
      balance_dirty_pages() or other places.
      
      Fix the problem by clearing the BDI_registered bit in bdi_unregister()
      and checking it before scheduling any flushing work, as sketched below.
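      A sketch of the guard, assuming the BDI_registered bit lives in
      bdi->state and wb_lock protects it (as in that era's backing-dev code):

        /* bdi_unregister() (sketch) */
        spin_lock_bh(&bdi->wb_lock);
        clear_bit(BDI_registered, &bdi->state);
        spin_unlock_bh(&bdi->wb_lock);

        /* anywhere flushing work is scheduled (sketch) */
        spin_lock_bh(&bdi->wb_lock);
        if (test_bit(BDI_registered, &bdi->state))
                mod_delayed_work(bdi_wq, &bdi->wb.dwork, 0);
        spin_unlock_bh(&bdi->wb_lock);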
      
      Fixes: 839a8e86 ("writeback: replace custom worker pool implementation with unbound workqueue")
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Derek Basehore <dbasehore@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5acda9d1
    • backing_dev: fix hung task on sync · 6ca738d6
      Committed by Derek Basehore
      bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
      schedule work to writeback dirty inodes.  The problem with this is that
      it can delay work that is scheduled for immediate execution, such as the
      work from sync_inodes_sb().  This can happen since mod_delayed_work()
      can now steal work from a workqueue.  This fixes the problem by using
      queue_delayed_work() instead.  This is a regression caused by commit
      839a8e86 ("writeback: replace custom worker pool implementation with
      unbound workqueue").
      
      The reason that this causes a problem is that laptop-mode will change
      the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
      In the case that bdi_wakeup_thread_delayed() races with
      sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
      task.  Even if dirty_writeback_centisecs is not long enough to cause a
      hung task, we still don't want to delay sync for that long.
      
      We fix the problem by using queue_delayed_work() when we want to
      schedule writeback sometime in future.  This function doesn't change the
      timer if it is already armed.
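      A sketch of how the wakeup helper might end up looking (timeout
      computation as in the existing code; the BDI_registered check is the one
      added by the patch above):

        static void bdi_wakeup_thread_delayed(struct backing_dev_info *bdi)
        {
                unsigned long timeout;

                timeout = msecs_to_jiffies(dirty_writeback_interval * 10);
                spin_lock_bh(&bdi->wb_lock);
                if (test_bit(BDI_registered, &bdi->state))
                        /* unlike mod_delayed_work(), this leaves an already-armed timer alone */
                        queue_delayed_work(bdi_wq, &bdi->wb.dwork, timeout);
                spin_unlock_bh(&bdi->wb_lock);
        }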
      
      For the same reason, we also change bdi_writeback_workfn() to
      immediately queue the work again in the case that the work_list is not
      empty.  The same problem can happen if the sync work is run on the
      rescue worker.
      
      [jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()]
      Signed-off-by: Derek Basehore <dbasehore@chromium.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: Derek Basehore <dbasehore@chromium.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Benson Leung <bleung@chromium.org>
      Cc: Sonny Rao <sonnyrao@chromium.org>
      Cc: Luigi Semenzato <semenzato@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6ca738d6
  2. 31 Mar 2014, 5 commits
  3. 29 Mar 2014, 1 commit
    • ocfs2: check if cluster name exists before deref · d9060742
      Committed by Sasha Levin
      Commit c74a3bdd ("ocfs2: add clustername to cluster connection") is
      trying to strlcpy a string which was explicitly passed as NULL in the
      very same patch, triggering a NULL ptr deref.
      
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: strlcpy (lib/string.c:388 lib/string.c:151)
        CPU: 19 PID: 19426 Comm: trinity-c19 Tainted: G        W     3.14.0-rc7-next-20140325-sasha-00014-g9476368-dirty #274
        RIP:  strlcpy (lib/string.c:388 lib/string.c:151)
        Call Trace:
         ocfs2_cluster_connect (fs/ocfs2/stackglue.c:350)
         ocfs2_cluster_connect_agnostic (fs/ocfs2/stackglue.c:396)
         user_dlm_register (fs/ocfs2/dlmfs/userdlm.c:679)
         dlmfs_mkdir (fs/ocfs2/dlmfs/dlmfs.c:503)
         vfs_mkdir (fs/namei.c:3467)
         SyS_mkdirat (fs/namei.c:3488 fs/namei.c:3472)
         tracesys (arch/x86/kernel/entry_64.S:749)
      
      akpm: this patch probably disables the feature.  A temporary measure to
      avoid trivial oopses.
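      A hedged sketch of the kind of guard involved (names as in
      fs/ocfs2/stackglue.c; treat as illustrative):

        /* ocfs2_cluster_connect() (sketch): only copy a name the caller actually passed */
        if (cluster_name && cluster_name_len)
                strlcpy(new_conn->cc_cluster_name, cluster_name,
                        CLUSTER_NAME_MAX + 1);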
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Cc: Mark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d9060742
  4. 28 Mar 2014, 1 commit
  5. 26 Mar 2014, 3 commits
  6. 24 Mar 2014, 1 commit
  7. 23 Mar 2014, 4 commits
    • rcuwalk: recheck mount_lock after mountpoint crossing attempts · b37199e6
      Committed by Al Viro
      We can get false negative from __lookup_mnt() if an unrelated vfsmount
      gets moved.  In that case legitimize_mnt() is guaranteed to fail,
      and we will fall back to non-RCU walk... unless we end up running
      into a hard error on a filesystem object we wouldn't have reached
      if not for that false negative.  IOW, delaying that check until
      the end of pathname resolution is wrong - we should recheck right
      after we attempt to cross the mountpoint.  We don't need to recheck
      unless we see d_mountpoint() being true - in that case even if
      we have just raced with mount/umount, we can simply go on as if
      we'd come at the moment when the sucker wasn't a mountpoint; if we
      run into a hard error as the result, it was a legitimate outcome.
      __lookup_mnt() returning NULL is different in that respect, since
      it might've happened due to operation on completely unrelated
      mountpoint.
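      A minimal sketch of the recheck, assuming the seqlock-protected
      mount_lock and nd->m_seq of that era (not the exact hunk):

        /* in the RCU-walk mountpoint-crossing loop (sketch) */
        mounted = __lookup_mnt(path->mnt, path->dentry);
        if (!mounted) {
                /* NULL may be a false negative caused by an unrelated mount moving;
                 * validate mount_lock now instead of at the end of the walk */
                if (unlikely(read_seqretry(&mount_lock, nd->m_seq)))
                        return false;           /* drop back to ref-walk */
                break;
        }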
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b37199e6
    • make prepend_name() work correctly when called with negative *buflen · e825196d
      Committed by Al Viro
      In all callchains leading to prepend_name(), the value left in *buflen
      is eventually discarded unused if prepend_name() has returned a negative.
      So we are free to do what prepend() does, and subtract from *buflen
      *before* checking for underflow (which turns into checking the sign
      of subtraction result, of course).
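      A sketch of the resulting function, mirroring what prepend() already does
      (simplified from fs/dcache.c of that release):

        static int prepend_name(char **buffer, int *buflen, struct qstr *name)
        {
                const char *dname = ACCESS_ONCE(name->name);
                u32 dlen = ACCESS_ONCE(name->len);
                char *p;

                *buflen -= dlen + 1;            /* subtract first ... */
                if (*buflen < 0)                /* ... then check the sign of the result */
                        return -ENAMETOOLONG;
                p = *buffer -= dlen + 1;
                *p++ = '/';
                while (dlen--) {
                        char c = *dname++;
                        if (!c)
                                break;
                        *p++ = c;
                }
                return 0;
        }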
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e825196d
    • vfs: Don't let __fdget_pos() get FMODE_PATH files · 99aea681
      Committed by Eric Biggers
      Commit bd2a31d5 ("get rid of fget_light()") introduced the
      __fdget_pos() function, which returns the resulting file pointer and
      fdput flags combined in an 'unsigned long'.  However, it also changed the
      behavior to return files with FMODE_PATH set, which shouldn't happen
      because read(), write(), lseek(), etc. aren't allowed on such files.
      This commit restores the old behavior.
      
      This regression actually had no effect on read() and write() since
      FMODE_READ and FMODE_WRITE are not set on file descriptors opened with
      O_PATH, but it did cause lseek() on a file descriptor opened with O_PATH
      to fail with ESPIPE rather than EBADF.
      Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      99aea681
    • vfs: atomic f_pos access in llseek() · d7a15f8d
      Committed by Eric Biggers
      Commit 9c225f26 ("vfs: atomic f_pos accesses as per POSIX") changed
      several system calls to use fdget_pos() instead of fdget(), but missed
      sys_llseek().  Fix it.
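      A sketch of the change; only the fd acquisition and release in
      sys_llseek() differ:

        struct fd f = fdget_pos(fd);    /* was: fdget(fd); serializes f_pos for shared files */

        if (!f.file)
                return -EBADF;
        /* ... existing llseek body unchanged ... */
        fdput_pos(f);                   /* was: fdput(f) */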
      Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d7a15f8d
  8. 13 Mar 2014, 1 commit
    • cputime: Default implementation of nsecs -> cputime conversion · bfc3f028
      Committed by Frederic Weisbecker
      The architectures that override cputime_t (s390, ppc) don't provide
      any version of nsecs_to_cputime().  Indeed, this per-backend cputime_t
      implementation only exists when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y,
      under which the core code doesn't make any use of nsecs_to_cputime().

      At least for now.

      We are going to make broader use of it, so let's provide a default
      version with microsecond granularity.  It should be good enough for most
      use cases.
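      The default most likely boils down to a one-liner along these lines
      (microsecond granularity, as described):

        #ifndef nsecs_to_cputime
        # define nsecs_to_cputime(__nsecs)      \
                usecs_to_cputime((__nsecs) / NSEC_PER_USEC)
        #endif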
      
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      bfc3f028
  9. 12 Mar 2014, 1 commit
    • of: remove /proc/device-tree · 8357041a
      Committed by Grant Likely
      The same data is now available in sysfs, so we can remove the code
      that exports it in /proc and replace it with a symlink to the sysfs
      version.
      
      Tested on versatile qemu model and mpc5200 eval board. More testing
      would be appreciated.
      
      v5: Fixed up conflicts with mainline changes
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
      8357041a
  10. 11 Mar 2014, 2 commits
  11. 10 Mar 2014, 3 commits
    • get rid of fget_light() · bd2a31d5
      Committed by Al Viro
      Instead of returning the flags by reference, we can just have the
      low-level primitive return them in the lower bits of an unsigned long,
      with the struct file * derived from the rest.
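      A sketch of the packing convention (struct fd and __to_fd() as in
      include/linux/file.h of that time; the flag value is illustrative):

        struct fd {
                struct file *file;
                unsigned int flags;
        };
        #define FDPUT_FPUT      1       /* caller must fput() the file */

        static inline struct fd __to_fd(unsigned long v)
        {
                /* low bits carry the fdput flags, the rest is the struct file pointer */
                return (struct fd){ (struct file *)(v & ~3), v & 3 };
        }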
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      bd2a31d5
    • vfs: atomic f_pos accesses as per POSIX · 9c225f26
      Committed by Linus Torvalds
      Our write() system call has always been atomic in the sense that you get
      the expected thread-safe contiguous write, but we haven't actually
      guaranteed that concurrent writes are serialized wrt f_pos accesses, so
      threads (or processes) that share a file descriptor and use "write()"
      concurrently would quite likely overwrite each others data.
      
      This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says:
      
       "2.9.7 Thread Interactions with Regular File Operations
      
        All of the following functions shall be atomic with respect to each
        other in the effects specified in POSIX.1-2008 when they operate on
        regular files or symbolic links: [...]"
      
      and one of the effects is the file position update.
      
      This unprotected file position behavior is not new behavior, and nobody
      has ever cared.  Until now.  Yongzhi Pan reported unexpected behavior to
      Michael Kerrisk that was due to this.
      
      This resolves the issue with a f_pos-specific lock that is taken by
      read/write/lseek on file descriptors that may be shared across threads
      or processes.
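      A hedged sketch of the mechanism (helper and flag names as in fs/file.c
      of that release; simplified):

        /* struct file grows a mutex dedicated to f_pos */
        struct mutex    f_pos_lock;

        unsigned long __fdget_pos(unsigned int fd)
        {
                unsigned long v = __fdget(fd);
                struct file *file = (struct file *)(v & ~3);

                /* only files marked FMODE_ATOMIC_POS at open time need this,
                 * and only descriptors that can actually be shared take the lock */
                if (file && (file->f_mode & FMODE_ATOMIC_POS)) {
                        if (file_count(file) > 1) {
                                v |= FDPUT_POS_UNLOCK;
                                mutex_lock(&file->f_pos_lock);
                        }
                }
                return v;
        }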
      Reported-by: Yongzhi Pan <panyongzhi@gmail.com>
      Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9c225f26
    • ocfs2 syncs the wrong range... · 1b56e989
      Committed by Al Viro
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      1b56e989
  12. 09 Mar 2014, 1 commit
    • kernfs: cache atomic_write_len in kernfs_open_file · b7ce40cf
      Committed by Tejun Heo
      While implementing atomic_write_len, 4d3773c4 ("kernfs: implement
      kernfs_ops->atomic_write_len") moved data copy from userland inside
      kernfs_get_active() and kernfs_open_file->mutex so that
      kernfs_ops->atomic_write_len can be accessed before copying buffer
      from userland; unfortunately, this could lead to locking order
      inversion involving mmap_sem if copy_from_user() takes a page fault.
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
        -------------------------------------------------------
        trinity-c236/10658 is trying to acquire lock:
         (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
       -> #1 (&mm->mmap_sem){++++++}:
      	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
      	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
      	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
      	 [<mm/memory.c:4188>] might_fault+0x7e/0xb0
      	 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
      	 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
      	 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
      	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
       -> #0 (&of->mutex#2){+.+.+.}:
      	 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
      	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
      	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
      	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
      	 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
      	 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
      	 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
      	 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
      	 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
      	 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
      	 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
      	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(&of->mutex#2);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex#2);
      
         *** DEADLOCK ***
      
        1 lock held by trinity-c236/10658:
         #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
      
        stack backtrace:
        CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
         0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
         0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
         ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
        Call Trace:
         [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
         [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
         [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
         [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
         [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
         [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
         [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
         [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
         [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
         [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
         [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
         [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
         [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
         [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
         [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
         [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
         [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
      Fix it by caching atomic_write_len in kernfs_open_file during open so
      that it can be determined without accessing kernfs_ops in
      kernfs_fop_write().  This restores the structure of kernfs_fop_write()
      before 4d3773c4 with updated @len determination logic.
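      A sketch of the caching (kernfs_open_file field name per the description;
      write path simplified):

        /* kernfs_fop_open() (sketch): the ops are pinned here, so record the limit */
        of->atomic_write_len = ops->atomic_write_len;

        /* kernfs_fop_write() (sketch): size and copy the buffer before kernfs_get_active() */
        if (of->atomic_write_len) {
                len = count;
                if (len > of->atomic_write_len)
                        return -E2BIG;
        } else {
                len = min_t(size_t, count, PAGE_SIZE);
        }

        buf = kmalloc(len + 1, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;
        if (copy_from_user(buf, user_buf, len)) {
                kfree(buf);
                return -EFAULT;
        }
        buf[len] = '\0';        /* terminate before taking the active reference */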
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      References: http://lkml.kernel.org/g/53113485.2090407@oracle.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7ce40cf