Commits · b29866aab8489487f11cc4506590ac31bdbae22a · openeuler / Kernel

29 Oct, 2010 15 commits

fsnotify: rename FS_IN_ISDIR to FS_ISDIR · b29866aa

Committed by Eric Paris on Oct 28, 2010

The _IN_ in the naming is reserved for flags only used by inotify.  Since I
am about to use this flag for fanotify, rename it to be generic like the
rest.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: limit number of listeners per user · 4afeff85

Committed by Eric Paris on Oct 28, 2010

fanotify currently has no limit on the number of listeners a given user can
have open. This patch limits the total number of listeners per user to
128. This is the same as the inotify default limit.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: allow userspace to override max marks · ac7e22dc

Committed by Eric Paris on Oct 28, 2010

Some fanotify groups, especially those like AV scanners, will need to place
lots of marks, particularly ignore marks. Since ignore marks do not pin
inodes in cache and are cleared if the inode is removed from core (usually
under memory pressure) we expose an interface for listeners, with
CAP_SYS_ADMIN, to override the maximum number of marks and be allowed to
set an 'unlimited' number of marks. Programs which make use of this
feature will be able to OOM a machine.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: limit the number of marks in a single fanotify group · e7099d8a

Committed by Eric Paris on Oct 28, 2010

There is currently no limit on the number of marks a given fanotify group
can have. Since fanotify is gated on CAP_SYS_ADMIN this was not seen as
a serious DoS threat. This patch implements a default of 8192, the same as
inotify, to work towards removing the CAP_SYS_ADMIN gating and eliminating
the default DoS'able status.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: allow userspace to override max queue depth · 5dd03f55

Committed by Eric Paris on Oct 28, 2010

fanotify has a default max queue depth. This patch allows processes which
explicitly request it to have an 'unlimited' queue depth. These processes
need to be very careful to make sure they cannot fall far enough behind
that they OOM the box. Thus this flag is gated on CAP_SYS_ADMIN.
Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: implement a default maximum queue depth · 2529a0df

Committed by Eric Paris on Oct 28, 2010

Currently fanotify has no maximum queue depth.  Since fanotify is
CAP_SYS_ADMIN only this does not pose a normal user DoS issue, but it
certainly is possible that an fanotify listener which can't keep up could
OOM the box.  This patch implements a default 16k depth.  This is the same
default depth used by inotify, but given fanotify's better queue merging in
many situations this queue will contain many additional useful events by
comparison.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: allow userspace to flush all marks · bbf2aba5

Committed by Eric Paris on Oct 28, 2010

fanotify is supposed to be able to flush all marks.  This is mostly useful
for the AV community to flush all cached decisions on a security policy
change.  This functionality has existed in the kernel but wasn't correctly
exposed to userspace.
Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: call fsnotify_parent in perm events · 52420392

Committed by Eric Paris on Oct 28, 2010

fsnotify perm events do not call fsnotify parent. That means you cannot
register a perm event on a directory and enforce permissions on all inodes in
that directory. This patch fixes that situation.
Signed-off-by: Eric Paris <eparis@redhat.com>

fsnotify: correctly handle return codes from listeners · ff8bcbd0

Committed by Eric Paris on Oct 28, 2010

When fsnotify groups return errors they are ignored.  For permissions
events these should be passed back up the stack, but for most events these
should continue to be ignored.
Signed-off-by: Eric Paris <eparis@redhat.com>

fanotify: use __aligned_u64 in fanotify userspace metadata · 28682019

Committed by Eric Paris on Oct 28, 2010

Currently the userspace struct exposed by fanotify uses
__attribute__((packed)) to make sure that alignment works on multiarch
platforms.  Since this causes a severe performance penalty on some
platforms we are going to switch to using explicit alignment notation on
the 64bit values so we don't have to use 'packed'.
Signed-off-by: Eric Paris <eparis@redhat.com>
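
The layout idea in the commit above can be sketched with two stand-in structs. The field names and struct names below are illustrative assumptions and are not claimed to match the real fanotify metadata struct; the sketch only shows that giving the 64-bit member explicit 8-byte alignment yields the same size and offsets as `packed`, while the struct as a whole keeps natural alignment.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event header using 'packed' (illustrative field names). */
struct event_packed {
	uint32_t event_len;
	uint32_t vers;
	uint64_t mask;
	int32_t  fd;
	int32_t  pid;
} __attribute__((packed));

/* Same fields, but the 64-bit member carries explicit 8-byte alignment. */
struct event_aligned {
	uint32_t event_len;
	uint32_t vers;
	uint64_t mask __attribute__((aligned(8)));
	int32_t  fd;
	int32_t  pid;
};

/* Compile-time checks: both layouts agree on size and on where 'mask' sits. */
_Static_assert(sizeof(struct event_packed) == sizeof(struct event_aligned),
	       "layouts have the same size");
_Static_assert(offsetof(struct event_packed, mask) ==
	       offsetof(struct event_aligned, mask),
	       "mask sits at the same offset");
```

The difference shows up in the alignment of the struct itself: the packed variant is byte-aligned, so a compiler may have to assume unaligned access to every member, whereas the explicitly aligned variant does not carry that cost.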

fanotify: implement fanotify listener ordering · 4231a235

Committed by Eric Paris on Oct 28, 2010

The fanotify listeners need to be able to specify what types of operations
they are going to perform so they can be ordered appropriately between other
listeners doing other types of operations.  They need this to be able to make
sure that things like hierarchical storage managers will get access to inodes
before processes which need the data.  This patch defines 3 possible uses
which groups must indicate in the fanotify_init() flags.

FAN_CLASS_PRE_CONTENT
FAN_CLASS_CONTENT
FAN_CLASS_NOTIF

Groups will receive notification in that order.  The order between 2 groups in
the same class is nondeterministic.

FAN_CLASS_PRE_CONTENT is intended to be used by listeners which need access to
the inode before they are certain that the inode contains its final data.  A
hierarchical storage manager should choose to use this class.

FAN_CLASS_CONTENT is intended to be used by listeners which need access to the
inode after it contains its intended contents.  This would be the appropriate
level for an AV solution or document control system.

FAN_CLASS_NOTIF is intended for normal async notification about access, much the
same as inotify and dnotify.  Synchronous permission events are not permitted
at this class.
Signed-off-by: Eric Paris <eparis@redhat.com>
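
As a rough userspace illustration of the three classes, the sketch below maps a listener's purpose to the class flag it would pass to fanotify_init(). The role enum and the helper names are hypothetical and exist only for this example; the FAN_CLASS_* constants and fanotify_init() come from <sys/fanotify.h>. Actually opening a listener normally requires CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/fanotify.h>

/* Hypothetical listener roles, named here only for illustration. */
enum listener_role { ROLE_HSM, ROLE_AV_SCANNER, ROLE_MONITOR };

/* Map a role to the fanotify_init() class flag described in the commit. */
unsigned int class_for_role(enum listener_role role)
{
	switch (role) {
	case ROLE_HSM:
		return FAN_CLASS_PRE_CONTENT; /* needs the inode before its final data */
	case ROLE_AV_SCANNER:
		return FAN_CLASS_CONTENT;     /* needs the inode with intended contents */
	default:
		return FAN_CLASS_NOTIF;       /* plain asynchronous notification */
	}
}

/* Per the commit text, permission events are not allowed at FAN_CLASS_NOTIF. */
int may_use_perm_events(unsigned int class_flag)
{
	return class_flag == FAN_CLASS_PRE_CONTENT ||
	       class_flag == FAN_CLASS_CONTENT;
}

/* Open a listener for the role; returns the fanotify fd, or -1 on error. */
int open_listener(enum listener_role role)
{
	return fanotify_init(class_for_role(role) | FAN_CLOEXEC, O_RDONLY);
}
```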

fsnotify: implement ordering between notifiers · 6ad2d4e3

Committed by Eric Paris on Oct 28, 2010

fanotify needs to be able to specify that some groups get events before
others. They use this idea to make sure that a hierarchical storage
manager gets access to files before programs which actually use them. This
is purely infrastructure. Everything will have a priority of 0, but the
infrastructure will exist for it to be non-zero.
Signed-off-by: Eric Paris <eparis@redhat.com>
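
One way to picture priority ordering is a list kept sorted at insertion time. The sketch below is a simplified model and not the kernel's data structure: the struct, the function name, and the assumption that larger priority values are served first are all illustrative.

```c
#include <stddef.h>

/* Simplified stand-in for a notification group (not the kernel struct). */
struct group_model {
	int priority;
	struct group_model *next;
};

/*
 * Insert g so that higher priorities come first.  Groups with equal
 * priority keep their arrival order, so an all-zero list behaves like
 * a plain append.
 */
void add_group(struct group_model **head, struct group_model *g)
{
	struct group_model **pos = head;

	while (*pos && (*pos)->priority >= g->priority)
		pos = &(*pos)->next;
	g->next = *pos;
	*pos = g;
}
```

Walking the list from the head then visits groups in delivery order, which is all the infrastructure has to guarantee while every priority is still 0.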

fanotify: allow fanotify to be built · 9343919c

Committed by Eric Paris on Oct 28, 2010

We disabled the ability to build fanotify in commit 7c534773.
This reverts that commit and allows people to build fanotify.
Signed-off-by: Eric Paris <eparis@redhat.com>

mmu_notifier.h: fix comment spelling · e732ff70

Committed by Figo.zhang on Oct 26, 2010

Signed-off-by: Figo.zhang <figo1802@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Fix compile breakage with !CONFIG_BLOCK · b31d42a5

Committed by Ingo Molnar on Oct 28, 2010

Today's git tree fails to build on !CONFIG_BLOCK, due to upstream commit
367a51a3 ("fs: Add FITRIM ioctl"):

include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’
include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’
include/linux/fs.h:36: error: expected specifier-qualifier-list before ‘uint64_t’

The commit adds uint64_t type usage to fs.h, but linux/types.h is not included
explicitly - it's only included implicitly via linux/blk_types.h, and there only if
CONFIG_BLOCK is enabled.

Add the explicit #include to fix this.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

28 Oct, 2010 25 commits

9p: Use V9FS_MAGIC in statfs · 368c09d2

Committed by M. Mohan Kumar on Sep 27, 2010

Use V9FS_MAGIC as the file system type while filling kernel statfs
structure instead of using host file system magic number. Also move
the definition of V9FS_MAGIC from v9fs.h to standard magic.h file.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
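
From userspace, the effect of this change would be visible through statfs(): a 9p mount reports V9FS_MAGIC in f_type rather than the host file system's magic. The sketch below shows one way to check for it. The helper names are hypothetical, and the magic value is defined locally as a fallback in case the installed headers do not provide it.

```c
#define _GNU_SOURCE
#include <sys/vfs.h>

/* Value published as V9FS_MAGIC in linux/magic.h; local fallback definition. */
#ifndef V9FS_MAGIC
#define V9FS_MAGIC 0x01021997
#endif

/* Returns 1 if the statfs result describes a 9p mount, 0 otherwise. */
int is_9p_fs(const struct statfs *st)
{
	return st->f_type == V9FS_MAGIC;
}

/* Convenience wrapper: 1 = 9p, 0 = something else, -1 = statfs() failed. */
int path_is_on_9p(const char *path)
{
	struct statfs st;

	if (statfs(path, &st) != 0)
		return -1;
	return is_9p_fs(&st);
}
```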

ext4: fix kernel oops if the journal superblock has a non-zero j_errno · 7f93cff9

Committed by Theodore Ts'o on Oct 27, 2010

Commit 84061e07 fixed an accounting bug only to introduce the
possibility of a kernel OOPS if the journal has a non-zero j_errno
field indicating that the file system had detected a fs inconsistency.
After the journal replay, if the journal superblock indicates that the
file system has an error, this indication is transferred to the file
system and then ext4_commit_super() is called to write this to the
disk.

But since the percpu counters are now initialized after the journal
replay, the call to ext4_commit_super() will cause a kernel oops since
it needs to use the percpu counters in the ext4 superblock structure.

The fix is to skip setting the ext4 free block and free inode fields
if the percpu counter has not been set.

Thanks to Ken Sumrall for reporting and analyzing the root causes of
this bug.

Addresses-Google-Bug: #3054080
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

ext4: implement writeback livelock avoidance using page tagging · 5b41d924

Committed by Eric Sandeen on Oct 27, 2010

This is analogous to Jan Kara's commit,
f446daae
mm: implement writeback livelock avoidance using page tagging

but since we forked write_cache_pages, we need to reimplement
it there (and in ext4_da_writepages, since range_cyclic handling
was moved to there)

If you start a large buffered IO to a file, and then set
fsync after it, you'll find that fsync does not complete
until the other IO stops.

If you continue re-dirtying the file (say, putting dd
with conv=notrunc in a loop), when fsync finally completes
(after all IO is done), it reports via tracing that
it has written many more pages than the file contains;
in other words it has synced and re-synced pages in
the file multiple times.

This then leads to problems with our writeback_index
update, since it advances it by pages written, and
essentially sets writeback_index off the end of the
file...

With the following patch, we only sync as much as was
dirty at the time of the sync.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

fs: Add FITRIM ioctl · 367a51a3

Committed by Lukas Czerner on Oct 27, 2010

Adds a filesystem-independent ioctl to allow implementation of file
system batched discard support. It takes an fstrim_range structure as an
argument. fstrim_range is defined in include/linux/fs.h and its
definition is as follows.

struct fstrim_range {
	uint64_t start;
	uint64_t len;
	uint64_t minlen;
};

start	- first Byte to trim
len	- number of Bytes to trim from start
minlen	- minimum extent length to trim, free extents shorter than this
	  number of Bytes will be ignored. This will be rounded up to fs
	  block size.

It is also possible to specify NULL as an argument. In this case the
arguments will be set as follows:

start = 0;
len = ULLONG_MAX;
minlen = 0;

So it will trim the whole file system at one run.

After the FITRIM is done, the number of actually discarded Bytes is stored
in fstrim_range.len to give the user better insight on how much storage
space has been really released for wear-leveling.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
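
A caller of this ioctl might look roughly like the sketch below. The helper names are hypothetical; struct fstrim_range and FITRIM come from <linux/fs.h>. The whole-file-system range simply mirrors the defaults listed in the commit message, and the ioctl itself usually needs privileges and a file system with discard support.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* struct fstrim_range, FITRIM */

/* Range matching the defaults the commit message lists for a NULL argument. */
struct fstrim_range whole_fs_range(void)
{
	struct fstrim_range r;

	r.start = 0;
	r.len = ULLONG_MAX;
	r.minlen = 0;
	return r;
}

/*
 * Issue FITRIM on an open fd that refers to a mounted file system.
 * Returns 0 and stores the discarded byte count in *trimmed, or -1 on error.
 */
int trim_range(int fd, struct fstrim_range r, unsigned long long *trimmed)
{
	if (ioctl(fd, FITRIM, &r) != 0)
		return -1;
	*trimmed = r.len;   /* len is written back with the bytes actually discarded */
	return 0;
}
```

A typical call would open the mount point read-only, pass `whole_fs_range()`, and report `*trimmed` to the user.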

Add helper function for blkdev_issue_zeroout (sb_issue_discard) · e6fa0be6

Committed by Lukas Czerner on Oct 27, 2010

This is done the same way as helper sb_issue_discard for
blkdev_issue_discard.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

jbd2: Fix I/O hang in jbd2_journal_release_jbd_inode · 39e3ac25

Committed by Brian King on Oct 27, 2010

This fixes a hang seen in jbd2_journal_release_jbd_inode
on a lot of Power 6 systems running with ext4. When we get
in the hung state, all I/O to the disk in question gets blocked
where we stay indefinitely. Looking at the task list, I can see
we are stuck in jbd2_journal_release_jbd_inode waiting on a
wake up. I added some debug code to detect this scenario and
dump additional data if we were stuck in jbd2_journal_release_jbd_inode
for longer than 30 minutes. When it hit, I was able to see that
i_flags was 0, suggesting we missed the wake up.

This patch changes i_flags to be an unsigned long, uses bit operators
to access it, and adds barriers around the accesses. Prior to applying
this patch, we were regularly hitting this hang on numerous systems
in our test environment. After applying the patch, the hangs no longer
occur.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
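
The fix relies on kernel bit operations and barriers, which cannot be reproduced verbatim in userspace. As a loose analogy only, the sketch below uses C11 atomics on a word-sized flags field so that setting, clearing, and testing a single bit are atomic and ordered. The struct, the flag name, and the helpers are all hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical flag bit, named after the idea rather than a kernel symbol. */
#define FLAG_COMMIT_RUNNING 0

struct demo_inode {
	atomic_ulong i_flags;   /* word-sized so single bits can be updated atomically */
};

/* Atomically set one bit (sequentially consistent by default). */
void flag_set(struct demo_inode *ji, unsigned int bit)
{
	atomic_fetch_or(&ji->i_flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
void flag_clear(struct demo_inode *ji, unsigned int bit)
{
	atomic_fetch_and(&ji->i_flags, ~(1UL << bit));
}

/* Returns 1 if the bit is currently set, 0 otherwise. */
int flag_test(struct demo_inode *ji, unsigned int bit)
{
	return (int)((atomic_load(&ji->i_flags) >> bit) & 1UL);
}
```

With a plain non-atomic field, a waiter could read a stale value or a concurrent read-modify-write could lose an update, which is the kind of missed wake-up the commit describes.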

epoll: make epoll_wait() use the hrtimer range feature · 95aac7b1

Committed by Shawn Bohrer on Oct 27, 2010

This makes epoll use hrtimers for the timeout value which prevents
epoll_wait() from timing out up to a millisecond early.

This mirrors the behavior of select() and poll().
Signed-off-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
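
The behaviour described above can be observed from userspace by timing an epoll_wait() call that is guaranteed to time out. The helper below is a hypothetical sketch: it waits on an empty epoll set and returns the elapsed monotonic time in milliseconds. On a kernel that includes this change, the result should not be smaller than the requested timeout.

```c
#define _GNU_SOURCE
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait on an empty epoll set for timeout_ms and return the elapsed
 * monotonic time in whole milliseconds, or -1 on error.
 */
long timed_empty_epoll_wait(int timeout_ms)
{
	struct epoll_event ev;
	struct timespec t0, t1;
	long long ns;
	int epfd = epoll_create1(0);
	int rc;

	if (epfd < 0)
		return -1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	rc = epoll_wait(epfd, &ev, 1, timeout_ms);   /* nothing registered: can only time out */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	close(epfd);

	if (rc != 0)
		return -1;

	ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
	     (t1.tv_nsec - t0.tv_nsec);
	return (long)(ns / 1000000LL);
}
```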

Remove duplicate includes from many files · 61d8e11e

Committed by Zimny Lech on Oct 27, 2010

Signed-off-by: Zimny Lech <napohybelskurwysynom2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ramoops: use the platform data structure instead of module params · c3b92ce9

Committed by Kyungmin Park on Oct 27, 2010

Each board and system has different memory for ramoops, so it is better to
define platform data instead of module params.

[akpm@linux-foundation.org: fix ramoops_remove() return type]
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kfifo: fix kfifo_alloc() to return a signed int value · 144ecf31

Committed by Stefani Seibold on Oct 27, 2010

Add a new __kfifo_int_must_check_helper() helper function, which is needed
for kfifo_alloc() to return the right signed integer value.

The original __kfifo_must_check_helper() helper was renamed to
__kfifo_uint_must_check_helper() to show the sign which is expected and
returned.

(And revert the temporary disabling of __kfifo_must_check_helper())
Signed-off-by: Stefani Seibold <stefani@seibold.net>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
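
Why the sign of the helper matters can be shown with two simplified stand-ins. These are not the kernel helpers: the names are shortened and the wrappers around them are hypothetical. A negative error code routed through the unsigned helper comes out as a large positive number, so a caller comparing the result against zero would never see the failure.

```c
/* Stand-in for the unsigned helper: the result must be used by the caller. */
__attribute__((warn_unused_result))
static inline unsigned int uint_must_check_helper(unsigned int val)
{
	return val;
}

/* Stand-in for the new signed helper. */
__attribute__((warn_unused_result))
static inline int int_must_check_helper(int val)
{
	return val;
}

/* Route a return code through the unsigned helper, then widen it. */
long long through_uint_helper(int ret)
{
	return uint_must_check_helper((unsigned int)ret);
}

/* Route a return code through the signed helper, then widen it. */
long long through_int_helper(int ret)
{
	return int_must_check_helper(ret);
}
```

With a 32-bit unsigned int, an error code of -12 becomes 4294967284 on the unsigned path and stays -12 on the signed path.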

rapidio: fix IDLE2 bits corruption · 388c45cc

Committed by Alexandre Bounine on Oct 27, 2010

RapidIO spec v.2.1 adds Idle Sequence 2 into LP-Serial Physical Layer.
The fix ensures that corresponding bits are not corrupted during error
handling.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add handling of redundant routes · af84ca38

Committed by Alexandre Bounine on Oct 27, 2010

Detects RIO link to the already enumerated device and properly sets links
between device objects.  Changes to the enumeration/discovery logic:

1. Use Master Enable bit to signal end of the enumeration - agents may
   start their discovery process as soon as they see this bit set
   (Component Tag register was used before for this purpose).

2. Enumerator sets Component Tag (!= 0) immediately during device
   setup.  This allows the device to be identified if a redundant route
   exists in a RIO system.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add support for IDT CPS Gen2 switches · a3725c45

Committed by Alexandre Bounine on Oct 27, 2010

Add the RIO switch driver and definitions for IDT CPS-1848 and CPS-1616
Gen2 devices.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: modify sysfs initialization for switches · ac38d723

Committed by Alexandre Bounine on Oct 27, 2010

1. Change to create attribute "routes" only for switches.

2. Add a switch-specific callback to create/remove proprietary attributes.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add default handler for error-stopped state · dd5648c9

Committed by Alexandre Bounine on Oct 27, 2010

The default error-stopped state handler provides recovery mechanism as
defined by RIO specification.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: add relation links between RIO device structures · 68fe4df5

Committed by Alexandre Bounine on Oct 27, 2010

Create back and forward links between RIO devices.  These links are
intended for use by error management and hot-plug extensions.  Links for
redundant RIO connections between switches are not set (will be fixed in a
separate patch).
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: use stored ingress port number instead of register read · ae05cbd5

Committed by Alexandre Bounine on Oct 27, 2010

The switch port information is obtained and stored during RIO device
setup.  Therefore repeated reads from Switch Port Information CAR may be
removed.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

rapidio: fix RapidIO sysfs hierarchy · 2c70f022

Committed by Alexandre Bounine on Oct 27, 2010

This set of RapidIO patches extends support for standard error recovery
mechanism and adds new IDT Gen2 sRIO switch devices - CPS-1848 and
CPS-1616.  Implementation of the standard error-stopped state recovery
mechanism (as defined by the RapidIO specification) is required for the
new switches.

Version 2 of this set of patches addresses received comments and fixes an
error notification setup issue found in the idt_gen2.c after the first
version was released.

This patch:

Make RapidIO devices appear in /sys/devices/rapidio directory instead of
top of /sys/devices directory.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/char/synclink_gt.c: add extended sync feature · 9807224f

Committed by Paul Fulghum on Oct 27, 2010

Add support for extended byte synchronous mode feature of hardware.
Signed-off-by: Paul Fulghum <paulkf@microgate.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

/proc/stat: fix scalability of irq sum of all cpu · 478735e3

Committed by KAMEZAWA Hiroyuki on Oct 27, 2010

In /proc/stat, the number of events per IRQ is shown by summing each
irq's events over all cpus.  But we can make use of kstat_irqs().

kstat_irqs() does the same calculation. If !CONFIG_GENERIC_HARDIRQ,
it's not a big cost. (Both the number of cpus and the number of irqs are small.)

If a system is very big and CONFIG_GENERIC_HARDIRQ, it does

	for_each_irq()
		for_each_cpu()
			- look up a radix tree
			- read desc->irq_stat[cpu]
This does not seem efficient. This patch adds kstat_irqs() for
CONFIG_GENERIC_HARDIRQ and changes the calculation to

	for_each_irq()
		look up radix tree
		for_each_cpu()
			- read desc->irq_stat[cpu]

This reduces cost.

A test on a (4096 cpus, 256 nodes, 4592 irqs) host (by Jack Steiner):

%time cat /proc/stat > /dev/null

Before Patch:	 2.459 sec
After Patch :	  .561 sec

[akpm@linux-foundation.org: unexport kstat_irqs, coding-style tweaks]
[akpm@linux-foundation.org: fix unused variable 'per_irq_sum']
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Jack Steiner <steiner@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
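
The loop reordering in this commit can be modelled in a few lines of userspace C. Everything below is a toy: a flat array stands in for the radix tree of irq descriptors, the sizes are tiny, and a counter records how many simulated lookups each variant performs.

```c
#define MODEL_NR_IRQS 4
#define MODEL_NR_CPUS 3

/* Toy descriptor standing in for an entry found via the radix tree. */
struct irq_desc_model {
	unsigned int irq_stat[MODEL_NR_CPUS];
};

struct irq_desc_model descs[MODEL_NR_IRQS];
unsigned long lookups;   /* counts simulated radix-tree lookups */

/* Simulated descriptor lookup; each call stands for one tree walk. */
struct irq_desc_model *lookup_desc(int irq)
{
	lookups++;
	return &descs[irq];
}

/* Old shape: one descriptor lookup per (irq, cpu) pair. */
unsigned long sum_lookup_per_cpu(void)
{
	unsigned long sum = 0;

	for (int irq = 0; irq < MODEL_NR_IRQS; irq++)
		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			sum += lookup_desc(irq)->irq_stat[cpu];
	return sum;
}

/* New shape: look the descriptor up once per irq, then walk the cpus. */
unsigned long sum_lookup_per_irq(void)
{
	unsigned long sum = 0;

	for (int irq = 0; irq < MODEL_NR_IRQS; irq++) {
		struct irq_desc_model *desc = lookup_desc(irq);

		for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
			sum += desc->irq_stat[cpu];
	}
	return sum;
}
```

Both functions return the same total; only the number of lookups changes, from irqs times cpus down to irqs.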

/proc/stat: scalability of irq num per cpu · f2c66cd8

Committed by KAMEZAWA Hiroyuki on Oct 27, 2010

/proc/stat shows the total number of all interrupts to each cpu.  But when
the number of IRQs is very large, it takes a very long time and 'cat
/proc/stat' takes more than 10 secs.  This is because the sum of all irq
events is computed when /proc/stat is read.  This patch adds a percpu "sum of
all irq" counter and reduces read costs.

The cost of reading /proc/stat is important because it's used by major
applications such as 'top', 'ps', 'w', etc.

A test on a machine (4096 cpus, 256 nodes, 4592 irqs) shows

 %time cat /proc/stat > /dev/null
 Before Patch:  12.627 sec
 After  Patch:  2.459 sec
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Jack Steiner <steiner@sgi.com>
Acked-by: Jack Steiner <steiner@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
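
The idea of keeping a running per-cpu total can be sketched as follows. This is a toy model with hypothetical names and fixed small sizes, not the kernel's accounting code: the write path bumps two counters, and the read path for a cpu becomes a single load instead of a walk over every irq.

```c
#define MODEL_IRQS 4
#define MODEL_CPUS 2

unsigned int per_irq[MODEL_IRQS][MODEL_CPUS]; /* per-irq, per-cpu event counts */
unsigned long irqs_sum[MODEL_CPUS];           /* running total per cpu */

/* Account one interrupt: bump the per-irq count and the cpu's running total. */
void account_irq(int irq, int cpu)
{
	per_irq[irq][cpu]++;
	irqs_sum[cpu]++;
}

/* Read path with the running counter: constant work per cpu. */
unsigned long cpu_total_fast(int cpu)
{
	return irqs_sum[cpu];
}

/* Read path without it: walk every irq for that cpu. */
unsigned long cpu_total_slow(int cpu)
{
	unsigned long sum = 0;

	for (int irq = 0; irq < MODEL_IRQS; irq++)
		sum += per_irq[irq][cpu];
	return sum;
}
```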

signals: move cred_guard_mutex from task_struct to signal_struct · 9b1bf12d

Committed by KOSAKI Motohiro on Oct 27, 2010

Oleg Nesterov pointed out we have to prevent multiple-threads-inside-exec
itself and we can reuse ->cred_guard_mutex for it.  Yes, concurrent
execve() is of no value.

Let's move ->cred_guard_mutex from task_struct to signal_struct.  It
naturally prevents multiple-threads-inside-exec.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

signals: annotate lock_task_sighand() · b8ed374e

Committed by Namhyung Kim on Oct 27, 2010

lock_task_sighand() grabs sighand->siglock in case of returning non-NULL
but unlock_task_sighand() releases it unconditionally.  This leads sparse
to complain about the lock context imbalance.  Rename and wrap
lock_task_sighand() using __cond_lock() macro to make sparse happy.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: change signature of arch_ptrace() · 9b05a69e

Committed by Namhyung Kim on Oct 27, 2010

Fix up the arguments to arch_ptrace() to take account of the fact that
@addr and @data are now unsigned long rather than long as of a preceding
patch in this series.
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Acked-by: Roland McGrath <roland@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ptrace: change signature of sys_ptrace() and friends · 4abf9869

Committed by Namhyung Kim on Oct 27, 2010

Since userspace API of ptrace syscall defines @addr and @data as void
pointers, it would be more appropriate to define them as unsigned long in
kernel.  Therefore related functions are changed also.

'unsigned long' is typically used in other places in kernel as an opaque
data type and using it helps clean up a lot of warnings from
sparse.
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

openeuler / Kernel: last synced successfully more than 1 year ago
