Commits · 466e68c4306317e7d239e3100f612d403e3e2c3c · openeuler / raspberrypi-kernel

04 Apr, 2014 40 commits

ocfs2: __ocfs2_mknod_locked should return error when ocfs2_create_new_inode_locks() failed · 466e68c4

Authored by Xue jiufei on Apr 03, 2014

When ocfs2_create_new_inode_locks() returns an error, the inode open lock
may not have been obtained for this inode.  Other nodes can then remove
the file and free the dinode while the inode still remains in memory on
this node, which is incorrect and may trigger a BUG.  So
__ocfs2_mknod_locked() should return an error when
ocfs2_create_new_inode_locks() fails.

              Node_1                              Node_2
create fileA, call ocfs2_mknod()
  -> ocfs2_get_init_inode(), allocate inodeA
  -> ocfs2_claim_new_inode(), claim dinode(dinodeA)
  -> call ocfs2_create_new_inode_locks(),
     create open lock failed, return error
  -> __ocfs2_mknod_locked return success

                                                unlink fileA
                                                try open lock succeed,
                                                and free dinodeA

create another file, call ocfs2_mknod()
  -> ocfs2_get_init_inode(), allocate inodeB
  -> ocfs2_claim_new_inode(), as Node_2 had freed dinodeA,
     so claim dinodeA and update generation for dinodeA

call __ocfs2_drop_dl_inodes()->ocfs2_delete_inode()
to free inodeA, and finally triggers BUG
on(inode->i_generation != le32_to_cpu(fe->i_generation))
in function ocfs2_inode_lock_update().
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

466e68c4

ocfs2: allow for more than one data extent when creating xattr · 3ed2be71

Authored by Tariq Saeed on Apr 03, 2014

Orabug: 18108070

ocfs2_xattr_extend_allocation() hits a panic when creating an xattr,
during the data extent allocation phase.  The problem occurs when, due to
local alloc fragmentation, clusters are spread over multiple extents.  In
this case ocfs2_add_clusters_in_btree() finds no space to store more than
one extent record and therefore fails, returning RESTART_META.  The
situation is anticipated for the xattr update case but not for the xattr
create case.  This fix simply ports that code to the create case.
Signed-off-by: Tariq Saeed <tariq.x.saeed@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3ed2be71

ocfs2: fix deadlock risk when kmalloc failed in dlm_query_region_handler · a35ad97c

Authored by Zhonghua Guo on Apr 03, 2014

In dlm_query_region_handler(), if kmalloc() fails, the error path unlocks
dlm_domain_lock without having locked it first, which leads to a deadlock.
Signed-off-by: Zhonghua Guo <guozhonghua@h3c.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Tested-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a35ad97c
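
Editorial sketch (not part of the commit): one way to keep the lock and
unlock calls paired is to allocate first and return directly when the
allocation fails, so the unlock is only reachable once the lock is held.
This is an illustration of that discipline, not the actual kernel change.
The names demo_lock, demo_lock_acquire(), demo_lock_release() and
demo_query_handler() are hypothetical, and a counter stands in for the
real spinlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a spinlock: depth counts current holders. */
struct demo_lock {
    int depth;
};

static void demo_lock_acquire(struct demo_lock *l)
{
    l->depth++;
}

static void demo_lock_release(struct demo_lock *l)
{
    /* Releasing a lock that was never taken is the bug being avoided. */
    assert(l->depth > 0);
    l->depth--;
}

/*
 * Allocate before taking the lock and return directly on failure, so
 * the release below can only run after a matching acquire.
 * simulate_oom forces the allocation-failure path for demonstration.
 */
static int demo_query_handler(struct demo_lock *l, size_t len, int simulate_oom)
{
    char *buf = simulate_oom ? NULL : malloc(len);

    if (!buf)
        return -ENOMEM;   /* lock not held yet: nothing to release */

    demo_lock_acquire(l);
    buf[0] = 0;           /* placeholder for the work done under the lock */
    demo_lock_release(l);

    free(buf);
    return 0;
}
```

With this shape, the failure path leaves the lock untouched, and every
release has a matching acquire.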

ocfs2: llseek requires ocfs2 inode lock for the file in SEEK_END · c8d888d9

Authored by Jensen on Apr 03, 2014

llseek requires the ocfs2 inode lock to refresh the file size for SEEK_END,
because the file size may have been updated on another node.

This bug can be reproduced with the following scenario: first, we dd a
test fileA whose size is 10k.

on NodeA:
---------
 1) open the test fileA, lseek to the end of the file, and print the position.
 2) close the test fileA

on NodeB:
---------
 1) open the test fileA, append 5k of data to the test fileA.
 2) lseek to the end of the file, and print the position.
 3) close the file.

First we run test program1 on NodeA; the result is 10k.  Then we run test
program2 on NodeB; the result is 15k.  Finally, we run test program1 on
NodeA again; the result is still 10k.

After applying this patch, the result of the third step is 15k.

Test result: total time for 1000000 lseek calls:
index        lseek with inode lock (unit:us)                lseek without inode lock (unit:us)
  1                   1168162                                    555383
  2                   1168011                                    549504
  3                   1170538                                    549396
  4                   1170375                                    551685
  5                   1170444                                    556719
  6                   1174364                                    555307
  7                   1163294                                    551552
  8                   1170080                                    549350
  9                   1162464                                    553700
 10                   1165441                                    552594
 avg                  1168317                                    552519

avg with lock - avg without lock = 615798 us
(avg with lock - avg without lock)/1000000 = 0.615798 us per lseek call
Signed-off-by: Jensen <shencanquan@huawei.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c8d888d9
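
Editorial sketch (not part of the commit): the two test programs from the
scenario above could look roughly like this in user space, assuming a
POSIX environment.  The helper names demo_end_offset() and
demo_append_and_seek() are illustrative and do not come from the patch.
On a single local filesystem the two helpers always agree; the stale 10k
result described above only shows up when they run on different nodes of
an ocfs2 cluster without the fix.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * "Program 1": open the file, seek to its end, and return the offset
 * (the file size as seen by this node), or -1 on error.
 */
static off_t demo_end_offset(const char *path)
{
    off_t pos;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    pos = lseek(fd, 0, SEEK_END);
    close(fd);
    return pos;
}

/*
 * "Program 2": append len bytes of filler, then seek to the end and
 * return the offset, or -1 on error.
 */
static off_t demo_append_and_seek(const char *path, size_t len)
{
    char chunk[1024] = { 0 };
    off_t pos;
    int fd = open(path, O_WRONLY | O_APPEND);

    if (fd < 0)
        return -1;
    while (len > 0) {
        size_t n = len < sizeof(chunk) ? len : sizeof(chunk);
        ssize_t w = write(fd, chunk, n);

        if (w <= 0) {
            close(fd);
            return -1;
        }
        len -= (size_t)w;
    }
    pos = lseek(fd, 0, SEEK_END);
    close(fd);
    return pos;
}
```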

ocfs2: fix type conversion risk when get cluster attributes · 41b63efb

Authored by Joseph Qi on Apr 03, 2014

In o2nm_cluster, cl_idle_timeout_ms, cl_keepalive_delay_ms, and
cl_reconnect_delay_ms are defined as unsigned int.  So we should also
use unsigned int in the helper functions.
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

41b63efb

ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock · 8ed6b237

Authored by Goldwyn Rodrigues on Apr 03, 2014

The following patches are reverted in this patch because these patches
caused performance regression in the remote unlink() calls.

  ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq
  f7b1aa69 - ocfs2: Fix deadlock on umount
  5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount

Previous patches in this series removed the possible deadlocks from
downconvert thread so the above patches shouldn't be needed anymore.

The regression is caused because these patches delay the iput() in case
of dentry unlocks.  This also delays the unlocking of the open lockres.
The open lock resource is required to test if the inode can be wiped from
disk or not.  When the deleting node does not get the open lock, it
marks the inode as orphan (even though it is not in use by another
node/process) and causes a journal checkpoint.  This delays operations
following the inode eviction.  This also moves the inode to the orphaned
inode which further causes more I/O and a lot of unnecessary orphans.

The following script can be used to generate the load causing issues:

  declare -a create
  declare -a remove
  declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
  unique="`mktemp -u XXXXX`"
  script="/tmp/idontknow-${unique}.sh"
  cat <<EOF > "${script}"
  for n in {1..8}; do mkdir -p test/dir\${n}
    eval touch test/dir\${n}/foo{1.."\$1"}
  done
  EOF
  chmod 700 "${script}"

  function fcreate ()
  {
    exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
  }

  function fremove ()
  {
    exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
  }

  function fcp ()
  {
    exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
  }

  echo -------------------------------------------------
  echo "| # files | create #s | copy #s | remove #s |"
  echo -------------------------------------------------
  for ((x=0; x < ${#iterations[*]} ; x++)) do
    create[$x]="`fcreate ${iterations[$x]}`"
    copy[$x]="`fcp ${iterations[$x]}`"
    remove[$x]="`fremove`"
    printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
  done
  rm "${script}"
  echo "------------------------"
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8ed6b237

ocfs2: avoid blocking in ocfs2_mark_lockres_freeing() in downconvert thread · 84d86f83

Authored by Jan Kara on Apr 03, 2014

If we are dropping the last inode reference from the downconvert thread,
we will end up calling ocfs2_mark_lockres_freeing(), which can block if
the lock we are freeing is queued, thus creating an A-A deadlock.
Luckily, since we are the downconvert thread, we can immediately dequeue
the lock and thus avoid waiting in this case.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

84d86f83

ocfs2: implement delayed dropping of last dquot reference · e3a767b6

Authored by Jan Kara on Apr 03, 2014

We cannot drop the last dquot reference from the downconvert thread, as
that creates the following deadlock:

NODE 1                                  NODE2
holds dentry lock for 'foo'
holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                        dquot_initialize(bar)
                                          ocfs2_dquot_acquire()
                                            ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                            ...
downconvert thread (triggered from another
node or a different process from NODE2)
  ocfs2_dentry_post_unlock()
    ...
    iput(foo)
      ocfs2_evict_inode(foo)
        ocfs2_clear_inode(foo)
          dquot_drop(inode)
            ...
            ocfs2_dquot_release()
              ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
               - blocks
                                            finds we need more space in
                                            quota file
                                            ...
                                            ocfs2_extend_no_holes()
                                              ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                - deadlocks waiting for
                                                  downconvert thread

We solve the problem by postponing dropping of the last dquot reference to
a workqueue if it happens from the downconvert thread.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e3a767b6

quota: provide function to grab quota structure reference · 9f985cb6

Authored by Jan Kara on Apr 03, 2014

Provide a dqgrab() function to get a quota structure reference when we
are sure it already has at least one active reference.  Make use of this
function inside the quota code.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9f985cb6

ocfs2: move dquot_initialize() in ocfs2_delete_inode() somewhat later · bd62ad7a

Authored by Jan Kara on Apr 03, 2014

Move the dquot_initialize() call in ocfs2_delete_inode() after the moment
we verify the inode is actually a sane one to delete.  We certainly don't
want to initialize quota for system inodes etc.  This also avoids calling
into quota code from the downconvert thread.

Add more details to the comment explaining why bailing out from
ocfs2_delete_inode() when we are in the downconvert thread is OK.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

bd62ad7a

ocfs2: remove OCFS2_INODE_SKIP_DELETE flag · 7bf619c1

Authored by Jan Kara on Apr 03, 2014

The flag was never set, delete it.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7bf619c1

ocfs2: add dlm_recover_callback_support in sysfs · 765aabbb

Authored by Goldwyn Rodrigues on Apr 03, 2014

This is a part of the nocontrold feature which was incorporated some time
back.

This is required for backward compatibility of the tools, specifically
the scenario where tools with recovery callbacks are used with a
kernel not using the recovery callbacks (older kernel + newer tools).
The tools look for this file to understand if the kernel supports DLM
recovery callbacks.

For kernels which support recovery callbacks but lack this patch,
ocfs2 will continue to use the older API and will still be able to
mount the filesystem.

[akpm@linux-foundation.org: simplify]
[sfr@canb.auug.org.au: VERIFY_OCTAL_PERMISSIONS fix up]
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

765aabbb

ocfs2: dlm: fix recovery hung · ded2cf71

Authored by Junxiao Bi on Apr 03, 2014

There is a race window in dlm_do_recovery() between dlm_remaster_locks()
and dlm_reset_recovery() when the recovery master has nearly finished the
recovery process for a dead node.  After the master sends the
FINALIZE_RECO message in dlm_remaster_locks(), another node may become
the recovery master for another dead node and then send the BEGIN_RECO
message to all nodes, including the old master.  In the old master's
handler for this message, dlm_begin_reco_handler(), dlm->reco.dead_node
and dlm->reco.new_master are set to the second dead node and the new
master; then, in dlm_reset_recovery(), these two variables are reset to
their default values.  As a result the new recovery master cannot finish
the recovery process and hangs, and eventually the whole cluster hangs
in recovery.

old recovery master:                                 new recovery master:
dlm_remaster_locks()
                                                  become recovery master for
                                                  another dead node.
                                                  dlm_send_begin_reco_message()
dlm_begin_reco_handler()
{
 if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) {
  return -EAGAIN;
 }
 dlm_set_reco_master(dlm, br->node_idx);
 dlm_set_reco_dead_node(dlm, br->dead_node);
}
dlm_reset_recovery()
{
 dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM);
 dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM);
}
                                                  will hang in dlm_remaster_locks() for
                                                  request dlm locks info

Before sending the FINALIZE_RECO message, the recovery master should set
DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery is
done.  This closes the race window, as BEGIN_RECO messages will not be
handled until the DLM_RECO_STATE_FINALIZE flag is cleared.

A similar race may happen between the new recovery master and a normal
node which is in dlm_finalize_reco_handler(); fix it as well.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ded2cf71
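
Editorial sketch (not part of the commit): the gating idea described
above can be modelled in a few lines of plain C.  While the finalize flag
is set, a BEGIN_RECO request is bounced with -EAGAIN, so it cannot have
its master and dead-node values wiped by the reset that follows.  All
demo_* names and the DEMO_* constants are hypothetical; only the flag
check and the -EAGAIN return mirror the pseudo-code in the commit
message.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_INVALID_NODE   255   /* stand-in for O2NM_INVALID_NODE_NUM */
#define DEMO_RECO_FINALIZE  0x1u  /* stand-in for DLM_RECO_STATE_FINALIZE */

struct demo_reco {
    unsigned int state;
    int new_master;
    int dead_node;
};

/* Old master: mark "finalize in progress" before FINALIZE_RECO goes out. */
static void demo_begin_finalize(struct demo_reco *r)
{
    r->state |= DEMO_RECO_FINALIZE;
}

/* Handler for an incoming BEGIN_RECO: refuse while finalize is pending. */
static int demo_begin_reco_handler(struct demo_reco *r, int master, int dead)
{
    if (r->state & DEMO_RECO_FINALIZE)
        return -EAGAIN;   /* the sender is expected to retry later */
    r->new_master = master;
    r->dead_node = dead;
    return 0;
}

/* Old master: reset the recovery state, then reopen the gate. */
static void demo_finish_finalize(struct demo_reco *r)
{
    assert(r->state & DEMO_RECO_FINALIZE);
    r->dead_node = DEMO_INVALID_NODE;
    r->new_master = DEMO_INVALID_NODE;
    r->state &= ~DEMO_RECO_FINALIZE;
}
```

A BEGIN_RECO that arrives between demo_begin_finalize() and
demo_finish_finalize() is rejected; once the flag is cleared, the retried
request is recorded and is no longer overwritten by the reset.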

ocfs2: dlm: fix lock migration crash · 34aa8dac

Authored by Junxiao Bi on Apr 03, 2014

This issue was introduced by commit 800deef3 ("ocfs2: use
list_for_each_entry where benefical") in 2007, which replaced
list_for_each with list_for_each_entry.  The variable "lock" will point
to invalid data if the "tmpq" list is empty, and a panic will be triggered
because of this.  Sunil advised reverting it, but the old version was
also not right.  At the end of the outer for loop, that
list_for_each_entry will also set "lock" to invalid data; then, in the
next iteration, if the "tmpq" list is empty, "lock" will hold stale,
invalid data and cause the panic.  So revert to list_for_each and
reset "lock" to NULL to fix this issue.

Another concern is that this seemingly cannot happen, because the "tmpq"
list should not be empty.  Let me describe how it can.

old lock resource owner(node 1):                                  migration target(node 2):
imagine there's a lockres with an EX lock from node 2 in
the granted list, and an NR lock from node x with convert_type
EX in the converting list.
dlm_empty_lockres() {
 dlm_pick_migration_target() {
   pick node 2 as target as its lock is the first one
   in granted list.
 }
 dlm_migrate_lockres() {
   dlm_mark_lockres_migrating() {
     res->state |= DLM_LOCK_RES_BLOCK_DIRTY;
     wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
     // after the above code, we can not dirty lockres any more,
     // so dlm_thread shuffle list will not run
                                                                   downconvert lock from EX to NR
                                                                   upconvert lock from NR to EX
<<< migration may schedule out here, then
<<< node 2 send down convert request to convert type from EX to
<<< NR, then send up convert request to convert type from NR to
<<< EX, at this time, lockres granted list is empty, and two locks
<<< in the converting list, node x up convert lock followed by
<<< node 2 up convert lock.

     // will set lockres RES_MIGRATING flag, the following
     // lock/unlock can not run
     dlm_lockres_release_ast(dlm, res);
   }

   dlm_send_one_lockres()
                                                                 dlm_process_recovery_data()
                                                                   for (i=0; i<mres->num_locks; i++)
                                                                     if (ml->node == dlm->node_num)
                                                                       for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) {
                                                                        list_for_each_entry(lock, tmpq, list)
                                                                        if (lock) break; <<< lock is invalid as grant list is empty.
                                                                       }
                                                                       if (lock->ml.node != ml->node)
                                                                         BUG() >>> crash here
 }

I saw the above lock status in a vmcore from one of our internal bugs.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

34aa8dac
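
Editorial sketch (not part of the commit): the pitfall described above
can be reproduced outside the kernel with a simplified list.  After a
list_for_each_entry-style loop over an empty list, the cursor is not
NULL; it is a pointer computed from the list head itself.  The macros
below are cut-down imitations of the kernel ones (the entry type is
passed explicitly), and every demo_* name is hypothetical.  The list head
is placed behind some padding inside demo_res only so that the computed
pointer stays inside a real object in this demonstration.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified list_for_each_entry(); the entry type is given explicitly. */
#define demo_for_each_entry(pos, head, type, member)              \
    for (pos = demo_container_of((head)->next, type, member);     \
         &pos->member != (head);                                  \
         pos = demo_container_of(pos->member.next, type, member))

struct demo_lock {
    int node;
    struct list_head list;
};

struct demo_res {
    long pad[4];              /* keeps the computed cursor in-bounds */
    struct list_head queue;
};

static void demo_res_init(struct demo_res *res)
{
    res->queue.next = &res->queue;
    res->queue.prev = &res->queue;
}

static void demo_res_add(struct demo_res *res, struct demo_lock *lock)
{
    struct list_head *head = &res->queue;

    assert(lock != NULL);
    lock->list.prev = head->prev;
    lock->list.next = head;
    head->prev->next = &lock->list;
    head->prev = &lock->list;
}

/* Buggy pattern: the cursor is returned as-is and is never NULL. */
static struct demo_lock *demo_find_buggy(struct demo_res *res, int node)
{
    struct demo_lock *lock;

    demo_for_each_entry(lock, &res->queue, struct demo_lock, list)
        if (lock->node == node)
            break;
    return lock;
}

/* Fixed pattern: plain iteration, and the result is reset to NULL. */
static struct demo_lock *demo_find_fixed(struct demo_res *res, int node)
{
    struct list_head *iter;
    struct demo_lock *lock = NULL;

    for (iter = res->queue.next; iter != &res->queue; iter = iter->next) {
        lock = demo_container_of(iter, struct demo_lock, list);
        if (lock->node == node)
            break;
        lock = NULL;
    }
    return lock;
}
```

A later check such as "if (lock)" therefore cannot detect the empty-list
case with the first helper, which is why the fix resets the variable to
NULL explicitly.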

ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb

Authored by Darrick J. Wong on Apr 03, 2014

Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
transaction to complete. This isn't terribly efficient, since sync_file
really only needs to wait for the last transaction involving that inode
to complete, and this doesn't require i_mutex.

Therefore, implement the necessary bits to track the newest tid
associated with an inode, and teach sync_file to wait for that instead
of waiting for everything in the journal to commit. Furthermore, only
issue the flush request to the drive if jbd2 hasn't already done so.

This also eliminates the deadlock between ocfs2_file_aio_write() and
ocfs2_sync_file(). aio_write takes i_mutex then calls
ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
However, if that dio completion involves calling fsync, then we can get
into trouble when some ocfs2_sync_file tries to take i_mutex.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2931cdcb

ocfs2: remove unused variable uuid_net_key in ocfs2_initialize_super · a75fe48c

Authored by joyce.xue on Apr 03, 2014

Variable uuid_net_key in ocfs2_initialize_super() is not used.  Clean it
up.
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a75fe48c

ocfs2: change ip_unaligned_aio to of type mutex from atomit_t · c18ceab0

Authored by Wengang Wang on Apr 03, 2014

There is a problem that waitqueue_active() may check stale data and thus
miss a wakeup of threads waiting on ip_unaligned_aio.

The only valid values of ip_unaligned_aio are 0 and 1, so we can change it
to be of type mutex, which avoids the above problem.  Another benefit is
that a mutex, which works as a FIFO, is fairer than wake_up_all().
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c18ceab0

ocfs2: fix null pointer dereference when access dlm_state before launching dlm thread · 181a9a04

Authored by Zongxun Wang on Apr 03, 2014

When mounting an ocfs2 volume, the kernel first creates the file
/sys/kernel/debug/o2dlm/<uuid>/dlm_state and only then launches the dlm
thread.  So the following sequence will cause a null pointer dereference:
dlm_debug_init -> access file dlm_state, which calls dlm_state_print ->
dlm_launch_thread

Moving dlm_debug_init after dlm_launch_thread and
dlm_launch_recovery_thread fixes this issue.
Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

181a9a04

arch/sh/drivers/pci/pcie-sh7786.h: remove duplicate SH4A_PCIEPHYCTLR · 0c3d1d62

Authored by Geert Uytterhoeven on Apr 03, 2014

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0c3d1d62

sh: sh7757: switch RSPI clock to dev ID match · ba6e8b8f

Authored by Geert Uytterhoeven on Apr 03, 2014

Switch the RSPI MSTP clock on SH7757 from a con ID match to a dev ID
match, so we can start looking it up using clk_get() with a NULL ID.
Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ba6e8b8f

arch/sh/boards/board-sh7757lcr.c: fixup SDHI register size · f0767e89

Authored by Kuninori Morimoto on Apr 03, 2014

The sh7757lcr SDHI register size is 0x100.
Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f0767e89

sh: don't pass saved userspace state to exception handlers · a3c19514

Authored by Bobby Bingham on Apr 03, 2014

The compiler is permitted to generate code which overwrites the
parameters to a function.  If those parameters include the only saved
copy we have of userspace's registers, we're in trouble.
Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a3c19514

sh: remove unused do_fpu_error · 7caf62de

Authored by Bobby Bingham on Apr 03, 2014

This does not appear to have been used since commit 74d99a5e ("sh:
SH-2A FPU support") in 2007.
Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7caf62de

sh: push extra copy of r0-r2 for syscall parameters · abafe5d9

Authored by Bobby Bingham on Apr 03, 2014

When invoking syscall handlers on sh32, the saved userspace registers
are at the top of the stack.  This seems to have been intentional, as it
is an easy way to pass r0, r1, ...  to the handler as parameters 5, 6,
...

It causes problems, however, because the compiler is allowed to generate
code for a function which clobbers that function's own parameters.  For
example, gcc generates the following code for clone:

    <SyS_clone>:
        mov.l   8c020714 <SyS_clone+0xc>,r1  ! 8c020540 <do_fork>
        mov.l   r7,@r15
        mov     r6,r7
        jmp     @r1
        mov     #0,r6
        nop
        .word 0x0540
        .word 0x8c02

The `mov.l r7,@r15` clobbers the saved value of r0 passed from
userspace.  For most system calls, this might not be a problem, because
we'll be overwriting r0 with the return value anyway.  But in the case
of clone, copy_thread will need the original value of r0 if the
CLONE_SETTLS flag was specified.

The first patch in this series fixes this issue for system calls by
pushing to the stack an extra copy of r0-r2 before invoking the
handler.  We discard this copy before restoring the userspace registers,
so it is not a problem if they are clobbered.

Exception handlers also receive the userspace register values in a
similar manner, and may hit the same problem.  The second patch removes
the do_fpu_error handler, which looks susceptible to this problem and
which, as far as I can tell, has not been used in some time.  The third
patch addresses other exception handlers.

This patch (of 3):

The userspace registers are stored at the top of the stack when the
syscall handler is invoked, which allows r0-r2 to act as parameters 5-7.
Parameters passed on the stack may be clobbered by the syscall handler.
The solution is to push an extra copy of the registers which might be
used as syscall parameters to the stack, so that the authoritative set
of saved register values does not get clobbered.

A few system call handlers are also updated to get the userspace
registers using current_pt_regs() instead of from the stack.
Signed-off-by: Bobby Bingham <koorogi@koorogi.info>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

abafe5d9

score: remove unused CPU_SCORE7 Kconfig parameter · d0df04f7

Authored by Michael Opdenacker on Apr 03, 2014

This removes the CPU_SCORE7 Kconfig parameter, which is no longer used
anywhere in the source code and Makefiles.
Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d0df04f7

genksyms: fix typeof() handling · dc533240

Authored by Jan Beulich on Apr 03, 2014

Recent increased use of typeof() throughout the tree resulted in a
number of symbols (25 in a typical distro config of ours) not getting a
proper CRC calculated for them anymore, due to the parser in genksyms
not coping with several of these uses (interestingly in the majority of
[if not all] cases the problem is due to the use of typeof() in code
preceding a certain export, not in the declaration/definition of the
exported function/object itself; I wasn't able to find a way to address
this more general parser shortcoming).

The use of parameter_declaration is a little more relaxed than would be
ideal (permitting not just a bare type specification, but also one with
identifier), but since the same code is being passed through an actual
compiler, there's no apparent risk of allowing through any broken code.

Otoh using parameter_declaration instead of the ad hoc
"decl_specifier_seq '*'" / "decl_specifier_seq" pair allows all types to
be handled rather than just plain ones and pointers to plain ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dc533240

fanotify: move unrelated handling from copy_event_to_user() · d507816b

Authored by Jan Kara on Apr 03, 2014

Move code moving event structure to access_list from copy_event_to_user()
to fanotify_read() where it is more logical (so that we can immediately
see in the main loop that we either move the event to a different list
or free it).  Also move special error handling for permission events
from copy_event_to_user() to the main loop to have it in one place with
error handling for normal events.  This makes copy_event_to_user()
really only copy the event to user without any side effects.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d507816b

fanotify: reorganize loop in fanotify_read() · d8aaab4f

Authored by Jan Kara on Apr 03, 2014

Swap the error / "read ok" branches in the main loop of fanotify_read().
We will grow the "read ok" part in the next patch and this makes the
indentation easier.  Also it is more common to have error conditions
inside an 'if' instead of the fast path.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d8aaab4f

fanotify: convert access_mutex to spinlock · 9573f793

Authored by Jan Kara on Apr 03, 2014

access_mutex is used only to guard operations on access_list.  There's
no need for sleeping within this lock so just make a spinlock out of it.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9573f793

fanotify: use fanotify event structure for permission response processing · f083441b

Authored by Jan Kara on Apr 03, 2014

Currently, fanotify creates a new structure to track the fact that a
permission event has been reported to userspace and someone is waiting
for a response to it.  As event structures are now completely in the
hands of each notification framework, we can use the event structure for
this tracking instead of allocating a new structure.

Since this makes the event structures for normal events and permission
events even more different and the structures have different lifetime
rules, we split them into two separate structures (where the permission
event structure contains the structure for a normal event).  This makes
normal events 8 bytes smaller and the code a bit cleaner.
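
The split described above, with the permission event embedding the
normal event, can be sketched in plain userspace C.  This is only an
illustration of the embedding pattern: the struct names, fields, and the
EVENT_IS_PERM flag are hypothetical and are not the kernel's actual
definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's fanotify structures. */
#define EVENT_IS_PERM 0x1u          /* assumed flag marking permission events */

struct normal_event {
	unsigned int mask;          /* event type bits */
	int pid;                    /* process that triggered the event */
};

struct perm_event {
	struct normal_event base;   /* embedded normal event */
	int response;               /* answer from userspace, 0 while pending */
	int fd;                     /* descriptor handed to the listener */
};

/* Recover the enclosing permission event from its embedded normal event. */
static struct perm_event *to_perm_event(struct normal_event *ev)
{
	return (struct perm_event *)((char *)ev -
				     offsetof(struct perm_event, base));
}

/* Store a response; only valid for events that really are permission events. */
static int record_response(struct normal_event *ev, int response)
{
	if (!(ev->mask & EVENT_IS_PERM))
		return -1;
	to_perm_event(ev)->response = response;
	return 0;
}
```

Code that only handles normal events keeps working on the smaller
structure, while the response-tracking fields exist only where they are
needed.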

[akpm@linux-foundation.org: fix build]
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

f083441b

fanotify: remove useless bypass_perm check · 3298cf37

Committed by Jan Kara on Apr 03, 2014

The prepare_for_access_response() function checks whether
group->fanotify_data.bypass_perm is set.  However, this test can never
be true: prepare_for_access_response() is called only from
fanotify_read(), which means the fanotify group is alive with an active
fd, while bypass_perm is set from fanotify_release(), when all file
descriptors pointing to the group are closed and the group is going
away.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3298cf37

fs/freevxfs/vxfs_lookup.c: update function comment · ddae82d8

Committed by Fabian Frederick on Apr 03, 2014

nameidata was replaced by flags in commit 00cd8dd3 ("stop passing
nameidata to ->lookup()").
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ddae82d8

fs/cifs/cifsfs.c: add __init to cifs_init_inodecache() · 9ee108b2

Committed by Fabian Frederick on Apr 03, 2014

cifs_init_inodecache is only called by __init init_cifs.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9ee108b2

kmemleak: change some global variables to int · 8910ae89

Committed by Li Zefan on Apr 03, 2014

They don't have to be atomic_t, because they are simple boolean toggles.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

8910ae89

kmemleak: remove redundant code · 5f3bf19a

Committed by Li Zefan on Apr 03, 2014

Remove kmemleak_padding() and kmemleak_release().
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5f3bf19a

kmemleak: allow freeing internal objects after kmemleak was disabled · c89da70c

Committed by Li Zefan on Apr 03, 2014

Currently if kmemleak is disabled, the kmemleak objects can never be
freed, no matter if it's disabled by a user or due to fatal errors.

Those objects can be a big waste of memory.

    OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
  1200264 1197433  99%    0.30K  46164       26    369312K kmemleak_object

With this patch, after kmemleak was disabled you can reclaim memory
with:

	# echo clear > /sys/kernel/debug/kmemleak

Also inform users about this with a printk.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c89da70c

kmemleak: free internal objects only if there're no leaks to be reported · dc9b3f42

Committed by Li Zefan on Apr 03, 2014

Currently, if you stop the kmemleak thread before disabling kmemleak,
kmemleak objects will be freed and so you won't be able to check
previously reported leaks.

With this patch, kmemleak objects won't be freed if there are leaks that
can be reported.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dc9b3f42

kthread: ensure locality of task_struct allocations · 81c98869

Committed by Nishanth Aravamudan on Apr 03, 2014

In the presence of memoryless nodes, numa_node_id() will return the
current CPU's NUMA node, but that may not be where we expect to allocate
memory from.  Instead, we should rely on the fallback code in the
memory allocator itself, by using NUMA_NO_NODE.  Also, when calling
kthread_create_on_node(), use the nearest node with memory to the cpu in
question, rather than the node it is running on.
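
The idea of "nearest node with memory" can be sketched as a small
userspace model.  This is only an illustration under stated assumptions:
the function name, MODEL_NO_NODE, MODEL_MAX_NODES, and the has_memory and
distance tables are all hypothetical, and this is not the helper the
kernel actually uses.

```c
#include <limits.h>

#define MODEL_NO_NODE   (-1)   /* hypothetical stand-in for "no preference" */
#define MODEL_MAX_NODES 4      /* assumed small fixed topology */

/*
 * Return the node closest to 'node' that has memory, where
 * has_memory[n] is nonzero if node n has memory and distance[a][b]
 * is the relative cost of node a accessing node b.  Returns
 * MODEL_NO_NODE if no node has memory.
 */
static int nearest_node_with_memory(int node,
		const int has_memory[MODEL_MAX_NODES],
		const int distance[MODEL_MAX_NODES][MODEL_MAX_NODES])
{
	int best = MODEL_NO_NODE;
	int best_dist = INT_MAX;

	for (int n = 0; n < MODEL_MAX_NODES; n++) {
		if (has_memory[n] && distance[node][n] < best_dist) {
			best = n;
			best_dist = distance[node][n];
		}
	}
	return best;
}
```

For a CPU on a memoryless node, the result differs from the CPU's own
node, which is the case the commit message is concerned with.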
Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Ben Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

81c98869

bdi: avoid oops on device removal · 5acda9d1

Committed by Jan Kara on Apr 03, 2014

After commit 839a8e86 ("writeback: replace custom worker pool
implementation with unbound workqueue"), when a device is removed while
we are writing to it, we crash in bdi_writeback_workfn() ->
set_worker_desc() because bdi->dev is NULL.

This can happen because even though bdi_unregister() cancels all pending
flushing work, nothing really prevents new ones from being queued from
balance_dirty_pages() or other places.

Fix the problem by clearing BDI_registered bit in bdi_unregister() and
checking it before scheduling of any flushing work.
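
The "clear the registered bit, then check it before queueing" pattern
can be sketched as a userspace model.  This is a sketch under stated
assumptions, not the kernel code: struct device_model and all function
names are hypothetical, and the model is single-threaded, so it does not
show the locking a real implementation needs to make the check and the
queueing atomic with respect to unregistration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; names are illustrative, not kernel API. */
struct device_model {
	atomic_bool registered;   /* plays the role of the BDI_registered bit */
	int queued_work;          /* number of flush work items queued */
};

static void device_register(struct device_model *d)
{
	d->queued_work = 0;
	atomic_init(&d->registered, true);
}

/* Clear the flag first so no new work is accepted, then drop pending work. */
static void device_unregister(struct device_model *d)
{
	atomic_store(&d->registered, false);
	d->queued_work = 0;
}

/* Refuse to queue work for a device that is no longer registered. */
static bool queue_flush_work(struct device_model *d)
{
	if (!atomic_load(&d->registered))
		return false;
	d->queued_work++;
	return true;
}
```

Once device_unregister() has run, later calls to queue_flush_work()
return false instead of queueing work against a device that is gone.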

Fixes: 839a8e86
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

5acda9d1

backing_dev: fix hung task on sync · 6ca738d6

Committed by Derek Basehore on Apr 03, 2014

bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
schedule work to write back dirty inodes.  The problem with this is that
it can delay work that is scheduled for immediate execution, such as the
work from sync_inodes_sb().  This can happen since mod_delayed_work()
can now steal work from a work_queue.  This fixes the problem by using
queue_delayed_work() instead.  This is a regression caused by commit
839a8e86 ("writeback: replace custom worker pool implementation with
unbound workqueue").

The reason that this causes a problem is that laptop-mode will change
the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
In the case that bdi_wakeup_thread_delayed() races with
sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
task.  Even if dirty_writeback_centisecs is not long enough to cause a
hung task, we still don't want to delay sync for that long.

We fix the problem by using queue_delayed_work() when we want to
schedule writeback sometime in future.  This function doesn't change the
timer if it is already armed.
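
The difference between the two calls can be sketched with a small
userspace model of a single delayed work item.  This is only a model of
the semantics described above: struct delayed_work_model and the model_*
functions are hypothetical names, time is counted in whole seconds, and
none of this is the kernel's workqueue implementation.

```c
#include <stdbool.h>

/* Hypothetical model of one delayed work item; not the kernel's struct. */
struct delayed_work_model {
	bool pending;            /* timer armed or work already queued */
	unsigned long expires;   /* time (in seconds) at which the work runs */
};

/*
 * Models queue_delayed_work(): an item that is already pending is left
 * untouched.  Returns true if the item was newly queued.
 */
static bool model_queue_delayed(struct delayed_work_model *w,
				unsigned long now, unsigned long delay)
{
	if (w->pending)
		return false;
	w->pending = true;
	w->expires = now + delay;
	return true;
}

/*
 * Models mod_delayed_work(): the expiry is always rewritten, even if the
 * item was pending.  Returns true if a pending item was modified.
 */
static bool model_mod_delayed(struct delayed_work_model *w,
			      unsigned long now, unsigned long delay)
{
	bool was_pending = w->pending;

	w->pending = true;
	w->expires = now + delay;
	return was_pending;
}
```

With a 600-second delay (60000 centiseconds), the mod variant pushes an
immediate sync item out by 10 minutes, while the queue variant leaves its
expiry alone.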

For the same reason, we also change bdi_writeback_workfn() to
immediately queue the work again in the case that the work_list is not
empty.  The same problem can happen if the sync work is run on the
rescue worker.

[jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()]
Signed-off-by: Derek Basehore <dbasehore@chromium.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Alexander Viro <viro@zento.linux.org.uk>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Derek Basehore <dbasehore@chromium.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Benson Leung <bleung@chromium.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Luigi Semenzato <semenzato@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6ca738d6