Commits · ecf081d1a73b077916f514f2ec744ded32b88ca1 · openeuler / Kernel

Jul 28, 2010 · 26 commits

vfs: introduce FMODE_NONOTIFY · ecf081d1

Committed by Eric Paris on Dec 17, 2009

This is a new f_mode which can only be set by the kernel.  It indicates
that the fd was opened by fanotify and should not cause future fanotify
events.  This is needed to prevent fanotify livelock.  An example of
obvious livelock is from fanotify close events.

Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
Notice a pattern?

The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify.  Thus when that file is used it will not generate future
events.

This patch simply defines the bit.
Signed-off-by: Eric Paris <eparis@redhat.com>

ecf081d1
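
The suppression check that this flag makes possible can be sketched in user-space C. This is only an illustration under stated assumptions: the `sketch_` names and the bit value are hypothetical, and the patch itself only defines the flag without adding any check.

```c
#include <stdbool.h>

/* Hypothetical bit value for this sketch; not taken from the patch. */
#define SKETCH_FMODE_NONOTIFY 0x1000000u

/* Toy stand-in for struct file: only the mode bits matter here. */
struct sketch_file {
    unsigned int f_mode;
};

/* Number of close events delivered to listeners in this toy model. */
static int sketch_events_sent;

/* Emit a close event unless the file was opened by the kernel for a listener. */
static bool sketch_notify_close(const struct sketch_file *f)
{
    if (f->f_mode & SKETCH_FMODE_NONOTIFY)
        return false;           /* suppressed, so the close/open cycle stops */
    sketch_events_sent++;
    return true;
}
```

In this model a close by Process A produces one event, while the listener's later close of its own fd produces none, which is what breaks the loop described above.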

fsnotify: rename mark_entry to just mark · 841bdc10

Committed by Eric Paris on Dec 17, 2009

Previously I used mark_entry when talking about marks on inodes.  The
_entry is pretty useless.  Just use "mark" instead.
Signed-off-by: Eric Paris <eparis@redhat.com>

841bdc10

fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark · d0775441

Committed by Eric Paris on Dec 17, 2009

The _entry portion of fsnotify functions is useless.  Drop it.
Signed-off-by: Eric Paris <eparis@redhat.com>

d0775441

fsnotify: rename fsnotify_mark_entry to just fsnotify_mark · e61ce867

Committed by Eric Paris on Dec 17, 2009

The name is long and it serves no real purpose.  So rename
fsnotify_mark_entry to just fsnotify_mark.
Signed-off-by: Eric Paris <eparis@redhat.com>

e61ce867

fsnotify: kill FSNOTIFY_EVENT_FILE · 72acc854

Committed by Andreas Gruenbacher on Dec 17, 2009

Some fsnotify operations send a struct file.  This is more information than
we technically need.  Send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

72acc854

fsnotify: add flags to fsnotify_mark_entries · 098cf2fc

Committed by Eric Paris on Dec 17, 2009

To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark).
Signed-off-by: Eric Paris <eparis@redhat.com>

098cf2fc

fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union · 4136510d

Committed by Eric Paris on Dec 17, 2009

vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Signed-off-by: Eric Paris <eparis@redhat.com>

4136510d
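
A mark that carries a type flag plus a union of inode specific and vfsmount specific data might look roughly like the following. This is a minimal sketch, not the kernel's definition: every `sketch_` name, the flag values, and the field choices are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical flag bits saying which union member is valid. */
#define SKETCH_MARK_FLAG_INODE    0x01u
#define SKETCH_MARK_FLAG_VFSMOUNT 0x02u

struct sketch_inode_mark {
    void *inode;                 /* object the mark is attached to */
    unsigned int watched_mask;   /* illustrative inode-only field */
};

struct sketch_vfsmount_mark {
    void *mnt;                   /* mount the mark is attached to */
};

struct sketch_mark {
    unsigned int mask;           /* events this mark cares about */
    unsigned int flags;          /* SKETCH_MARK_FLAG_* */
    union {
        struct sketch_inode_mark i;
        struct sketch_vfsmount_mark m;
    } u;
};

/* Return the inode for inode marks, NULL for any other type of mark. */
static void *sketch_mark_inode(const struct sketch_mark *mark)
{
    return (mark->flags & SKETCH_MARK_FLAG_INODE) ? mark->u.i.inode : NULL;
}
```

Keeping the two variants in separate named structs costs no space, since they share storage, and makes it obvious at each use site which kind of mark the code expects.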

fsnotify: put inode specific fields in an fsnotify_mark in a union · 2823e04d

Committed by Eric Paris on Dec 17, 2009

The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy.  This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>

2823e04d

fsnotify: include vfsmount in should_send_event when appropriate · 3a9fb89f

Committed by Eric Paris on Dec 17, 2009

To ensure that a group does not receive duplicate events when an event
matches it through both the vfsmount and the inode, the should_send_event
test should distinguish those two cases.  We pass a vfsmount to this
function so groups can make their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>

3a9fb89f

fsnotify: mount point listeners list and global mask · 7131485a

Committed by Eric Paris on Dec 17, 2009

Currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes).  This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notifications for any inode
accessed below a given mount point.  This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.

The reason we want a second group list and mask is that the inode based
should_send_event support makes each group look for a mark on the given
inode.  With one vfsmount listener on a shared list, every group would
have to take the inode->i_lock, look for its mark, not find one, and return
for every operation.  By separating vfsmount from inode listeners, the inode
groups have to look for their mark and take the inode lock only when there
is an inode listener.  vfsmount listeners will have to grab the lock and
look for a mark, but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Signed-off-by: Eric Paris <eparis@redhat.com>

7131485a
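
The effect of keeping two global masks can be sketched as follows. This is a toy model under assumptions: the `sketch_` names are hypothetical, and the counters merely stand in for walking a group list and taking inode->i_lock.

```c
/* Hypothetical stand-ins for the two global masks. */
static unsigned int sketch_inode_mask;     /* events wanted by inode groups */
static unsigned int sketch_vfsmount_mask;  /* events wanted by vfsmount groups */

/* Counters that make the cost of each list walk visible. */
static int sketch_inode_group_walks;
static int sketch_vfsmount_group_walks;

/* Dispatch one event: each list is walked only if its mask matches. */
static void sketch_fsnotify(unsigned int event_mask)
{
    if (event_mask & sketch_inode_mask)
        sketch_inode_group_walks++;     /* would take i_lock once per inode group */
    if (event_mask & sketch_vfsmount_mask)
        sketch_vfsmount_group_walks++;  /* touches vfsmount groups only */
}
```

With only a vfsmount listener registered, the inode list is never walked, so no inode group pays for a lock and a failed mark lookup.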

fsnotify: rename fsnotify_groups to fsnotify_inode_groups · 19c2a0e1

Committed by Eric Paris on Dec 17, 2009

Simple renaming patch.  fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>

19c2a0e1

fsnotify: drop mask argument from fsnotify_alloc_group · 0d2e2a1d

Committed by Eric Paris on Dec 17, 2009

Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
that argument.
Signed-off-by: Eric Paris <eparis@redhat.com>

0d2e2a1d

fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group · ffab8340

Committed by Eric Paris on Dec 17, 2009

fsnotify_obtain_group was intended to be able to find an already existing
group.  Nothing uses that functionality.  This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>

ffab8340

fsnotify: remove group_num altogether · 74be0cc8

Committed by Eric Paris on Dec 17, 2009

The original fsnotify interface has a group_num which was intended to be
able to find a group after it was added.  I no longer think this is a
necessary thing to do and so we remove the group_num.
Signed-off-by: Eric Paris <eparis@redhat.com>

74be0cc8

fsnotify: replace an event on a list · 1201a536

Committed by Eric Paris on Dec 17, 2009

fanotify would like to clone events already on its notification list, make
changes to the new event, and then replace the old event on the list with
the new event.  This patch implements the replace functionality of that
process.
Signed-off-by: Eric Paris <eparis@redhat.com>

1201a536

fsnotify: clone existing events · b4e4e140

Committed by Eric Paris on Dec 17, 2009

fsnotify_clone_event will take an event, clone it, and return the cloned
event to the caller.  Since events may be in use by multiple fsnotify
groups simultaneously, certain event entries (such as the mask) cannot be
changed after the event was created.  Since fanotify would like to merge
events happening on the same file, it needs a new clean event to work with
so it can change any fields it wishes.
Signed-off-by: Eric Paris <eparis@redhat.com>

b4e4e140

fsnotify: per group notification queue merge types · 74766bbf

Committed by Eric Paris on Dec 17, 2009

inotify only wishes to merge a new event with the last event on the
notification fifo.  fanotify is willing to merge any events, including by
means of bitwise OR of the masks of multiple events.  This patch moves
the inotify event merging logic out of the generic fsnotify notification.c
and into the inotify code.  This allows each user of fsnotify to provide
its own merge functionality.
Signed-off-by: Eric Paris <eparis@redhat.com>

74766bbf
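
The two merge policies can be contrasted with a small user-space sketch in which the queue takes a per-group merge callback. Everything here is an assumption made for illustration: the `sketch_` types, the fixed-size queue, and the callback signature are not the kernel's.

```c
#include <stddef.h>

#define SKETCH_QUEUE_MAX 8

struct sketch_event {
    unsigned int mask;   /* event bits */
    int object_id;       /* stands in for the file the event is about */
};

struct sketch_queue {
    struct sketch_event ev[SKETCH_QUEUE_MAX];
    size_t len;
};

/* Returns nonzero if the new event was absorbed by an event already queued. */
typedef int (*sketch_merge_fn)(struct sketch_queue *q, const struct sketch_event *e);

/* inotify-like policy: merge only with an identical last event. */
static int sketch_merge_last(struct sketch_queue *q, const struct sketch_event *e)
{
    if (q->len == 0)
        return 0;
    const struct sketch_event *last = &q->ev[q->len - 1];
    return last->object_id == e->object_id && last->mask == e->mask;
}

/* fanotify-like policy: OR the mask into any queued event on the same object. */
static int sketch_merge_any(struct sketch_queue *q, const struct sketch_event *e)
{
    for (size_t i = 0; i < q->len; i++) {
        if (q->ev[i].object_id == e->object_id) {
            q->ev[i].mask |= e->mask;
            return 1;
        }
    }
    return 0;
}

/* Returns 1 if queued, 0 if merged, -1 if the queue is full. */
static int sketch_queue_add(struct sketch_queue *q, const struct sketch_event *e,
                            sketch_merge_fn merge)
{
    if (merge && merge(q, e))
        return 0;
    if (q->len == SKETCH_QUEUE_MAX)
        return -1;
    q->ev[q->len++] = *e;
    return 1;
}
```

Passing the policy as a callback keeps the generic queue code free of any knowledge about which notification system is using it.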

fsnotify: send struct file when sending events to parents when possible · 28c60e37

Committed by Eric Paris on Dec 17, 2009

fanotify needs a path in order to open an fd to the object which changed.
Currently notifications to inode's parents are done using only the inode.
For some parental notifications we have the entire file; send that so
fanotify can use it.
Signed-off-by: Eric Paris <eparis@redhat.com>

28c60e37

fsnotify: pass a file instead of an inode to open, read, and write · 2a12a9d7

Committed by Eric Paris on Dec 17, 2009

fanotify, the upcoming notification system, actually needs a struct path so it can
do opens in the context of listeners, and it needs a file so it can get f_flags
from the original process.  Close was the only operation that was already passing
a struct file to the notification hook.  This patch passes a file for access,
modify, and open as well, as they are easily available to these hooks.
Signed-off-by: Eric Paris <eparis@redhat.com>

2a12a9d7

fsnotify: include data in should_send calls · 8112e2d6

Committed by Eric Paris on Dec 17, 2009

fanotify is going to need to look at file->private_data to know if an event
should be sent or not.  This passes the data (which might be a file,
dentry, inode, or none) to the should_send function calls so fanotify can
get that information when available.
Signed-off-by: Eric Paris <eparis@redhat.com>

8112e2d6

fsnotify: provide the data type to should_send_event · 7b0a04fb

Committed by Eric Paris on Dec 17, 2009

fanotify is only interested in event types which contain enough information
to open the original file in the context of the fanotify listener.  Since
fanotify may not want to send events if that data isn't present we pass
the data type to the should_send_event function call so fanotify can express
its lack of interest.
Signed-off-by: Eric Paris <eparis@redhat.com>

7b0a04fb
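
A backend that declines events lacking openable data could be sketched like this. The enum, its members, and the function are hypothetical names for illustration only; the real should_send_event callback takes different arguments.

```c
#include <stdbool.h>

/* Hypothetical data types an event may carry. */
enum sketch_data_type {
    SKETCH_DATA_NONE,
    SKETCH_DATA_INODE,
    SKETCH_DATA_PATH
};

/*
 * A fanotify-like backend: it wants the event only if the masks overlap
 * and the event carries a path it could open for the listener.
 */
static bool sketch_should_send_event(unsigned int group_mask,
                                     unsigned int event_mask,
                                     enum sketch_data_type data_type)
{
    if (data_type != SKETCH_DATA_PATH)
        return false;   /* not enough information to open the file */
    return (group_mask & event_mask) != 0;
}
```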

inotify: remove inotify in kernel interface · 2dfc1cae

Committed by Eric Paris on Dec 17, 2009

Nothing uses inotify in the kernel, drop it!
Signed-off-by: Eric Paris <eparis@redhat.com>

2dfc1cae

audit: reimplement audit_trees using fsnotify rather than inotify · 28a3a7eb

Committed by Eric Paris on Dec 17, 2009

Simply switch audit_trees from using inotify to using fsnotify for its
inode pinning and disappearing act information.
Signed-off-by: Eric Paris <eparis@redhat.com>

28a3a7eb

fsnotify: allow addition of duplicate fsnotify marks · 40554c3d

Committed by Eric Paris on Dec 17, 2009

This patch allows a task to add a second fsnotify mark to an inode for the
same group.  This mark will be added to the end of the inode's list and
will never be found by the standard fsnotify_find_mark() function.  This
is useful if a user wants to add a new mark before removing the old one.
Signed-off-by: Eric Paris <eparis@redhat.com>

40554c3d

fsnotify: duplicate fsnotify_mark_entry data between 2 marks · 9e1c7432

Committed by Eric Paris on Dec 17, 2009

Simply copy fsnotify information from one mark to another in preparation
for the second mark to replace the first.
Signed-off-by: Eric Paris <eparis@redhat.com>

9e1c7432

audit: convert audit watches to use fsnotify instead of inotify · e9fd702a

Committed by Eric Paris on Dec 17, 2009

Audit currently uses inotify to pin inodes in core and to detect when
watched inodes are deleted or unmounted.  This patch uses fsnotify instead
of inotify.
Signed-off-by: Eric Paris <eparis@redhat.com>

e9fd702a

Jul 25, 2010 · 1 commit

ACPI / Sleep: Allow the NVS saving to be skipped during suspend to RAM · 72ad5d77

Committed by Rafael J. Wysocki on Jul 23, 2010

Commit 2a6b6976 (ACPI: Store NVS state even when entering suspend to RAM)
caused the ACPI suspend code to save the NVS area during suspend and restore it
during resume unconditionally, although it is known that some systems
need to use acpi_sleep=s4_nonvs for hibernation to work.  To allow
the affected systems to avoid saving and restoring the NVS area
during suspend to RAM and resume, introduce kernel command line
option acpi_sleep=nonvs and make acpi_sleep=s4_nonvs work as its
alias temporarily (add acpi_sleep=s4_nonvs to the feature removal
file).

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16396 .
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reported-and-tested-by: tomas m <tmezzadra@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>

72ad5d77

Jul 23, 2010 · 1 commit

macvtap: Limit packet queue length · 8a35747a

Committed by Herbert Xu on Jul 21, 2010

Mark Wagner reported OOM symptoms when sending UDP traffic over
a macvtap link to a kvm receiver.

This appears to be caused by the fact that macvtap packet queues
are unlimited in length.  This means that if the receiver can't
keep up with the rate of flow, then we will hit OOM. Of course
it gets worse if the OOM killer then decides to kill the receiver.

This patch imposes a cap on the packet queue length, in the same
way as the tuntap driver, using the device TX queue length.

Please note that macvtap currently has no way of giving congestion
notification, which means the software device TX queue cannot be
used and packets will always be dropped once the macvtap driver
queue fills up.

This shouldn't be a great problem for the scenario where macvtap
is used to feed a kvm receiver, as the traffic is most likely
external in origin so congestion notification can't be applied
anyway.

Of course, if anybody decides to complain about guest-to-guest
UDP packet loss down the track, then we may have to revisit this.

Incidentally, this patch also fixes a real memory leak when
macvtap_get_queue fails.

Chris Wright noticed that for this patch to work, we need a
non-zero TX queue length.  This patch includes his work to change
the default macvtap TX queue length to 500.
Reported-by: Mark Wagner <mwagner@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a35747a
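
The capping behaviour can be illustrated with a counter-based sketch. The struct and function below are hypothetical and only model the accounting; the limit of 500 mirrors the default TX queue length mentioned in the message.

```c
#include <stddef.h>

/* Toy packet queue: tracks only how many packets are held and dropped. */
struct sketch_pktq {
    size_t len;             /* packets currently queued */
    size_t limit;           /* cap, e.g. the device TX queue length */
    unsigned long dropped;  /* packets refused because the queue was full */
};

/* Returns 0 if the packet was queued, -1 if it was dropped. */
static int sketch_enqueue(struct sketch_pktq *q)
{
    if (q->len >= q->limit) {
        q->dropped++;       /* no congestion notification: just drop */
        return -1;
    }
    q->len++;
    return 0;
}
```

A limit of zero would drop every packet in this model, which matches the observation that the patch needs a non-zero TX queue length to work.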

Jul 22, 2010 · 1 commit

sysrq,kdb: Use __handle_sysrq() for kdb's sysrq function · edd63cb6

Committed by Jason Wessel on Jul 21, 2010

The kdb code should not toggle the sysrq state in case an end user
wants to try and resume the normal kernel execution.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>

edd63cb6

Jul 21, 2010 · 2 commits

include/linux/vgaarb.h: add missing part of include guard · a6a1a095

Committed by Doug Goldstein on Jul 20, 2010

vgaarb.h was missing the #define that pairs with the #ifndef at the top,
so the guard did not prevent multiple #includes from causing re-define errors.
Signed-off-by: Doug Goldstein <cardoe@gentoo.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
a6a1a095
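
A complete include guard has three parts: the #ifndef, the matching #define, and the closing #endif. The sketch below pastes the same guarded text twice into one file to imitate a double #include; the macro and struct names are hypothetical and are not those used in vgaarb.h.

```c
/* First inclusion: the macro is not yet defined, so the body is compiled. */
#ifndef SKETCH_VGAARB_H
#define SKETCH_VGAARB_H   /* the kind of line the fix adds */

struct sketch_vga_state {
    int decodes;
};

#endif /* SKETCH_VGAARB_H */

/* Second inclusion of the same text: skipped because the macro is defined. */
#ifndef SKETCH_VGAARB_H
#define SKETCH_VGAARB_H

struct sketch_vga_state {
    int decodes;
};

#endif /* SKETCH_VGAARB_H */
```

If the #define lines were removed, the second copy would be compiled as well and the compiler would reject the repeated struct definition.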

vfs: fix RCU-lockdep false positive due to /proc · 844b9a87

Committed by Paul E. McKenney on Jul 20, 2010

If a single-threaded process does a file-descriptor operation, and some
other process accesses that same file descriptor via /proc, the current
rcu_dereference_check_fdtable() can give a false-positive RCU-lockdep
splat due to the reference count being increased by the /proc access after
the reference-count check in fget_light() but before the check in
rcu_dereference_check_fdtable().

This commit prevents this false positive by checking for a single-threaded
process.  To avoid #include hell, this commit uses the wrapper for
thread_group_empty(current) defined by rcu_my_thread_group_empty()
provided in a separate commit.
Located-by: Miles Lane <miles.lane@gmail.com>
Located-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

844b9a87

Jul 20, 2010 · 1 commit

fb: handle allocation failure in alloc_apertures() · 772a2f9b

Committed by Dan Carpenter on Jul 15, 2010

If the kzalloc() fails we should return NULL.  All the places that call
alloc_apertures() check for this already.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: James Simmons <jsimmons@infradead.org>
Acked-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

772a2f9b
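
The pattern being fixed, checking an allocation before writing through the pointer, can be sketched in user space with calloc() standing in for kzalloc(). The `sketch_` types and the field layout are assumptions for illustration and are not copied from the kernel header.

```c
#include <stdlib.h>

struct sketch_aperture {
    unsigned long base;
    unsigned long size;
};

struct sketch_apertures {
    unsigned int count;
    struct sketch_aperture ranges[];   /* flexible array of apertures */
};

/* Allocate room for max_num apertures; returns NULL on allocation failure. */
static struct sketch_apertures *sketch_alloc_apertures(unsigned int max_num)
{
    struct sketch_apertures *a =
        calloc(1, sizeof(*a) + max_num * sizeof(a->ranges[0]));

    if (!a)
        return NULL;    /* the check: never touch a->count after a failure */
    a->count = max_num;
    return a;
}
```

Callers then only need the NULL check that, according to the message, they already perform.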

Jul 19, 2010 · 1 commit

mm: add context argument to shrinker callback · 7f8275d0

Committed by Dave Chinner on Jul 19, 2010

The current shrinker implementation requires the registered callback
to have global state to work from. This makes it difficult to shrink
caches that are not global (e.g. per-filesystem caches). Pass the shrinker
structure to the callback so that users can embed the shrinker structure
in the context the shrinker needs to operate on and get back to it in the
callback via container_of().
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

7f8275d0
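
The embedding technique can be sketched in plain C with an offsetof-based macro. This is an illustration under assumptions: the callback signature is simplified, and all `sketch_` names are hypothetical rather than the kernel's shrinker API.

```c
#include <stddef.h>

/* User-space equivalent of the container_of() idea. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct sketch_shrinker {
    /* The callback receives the shrinker itself as its context. */
    int (*shrink)(struct sketch_shrinker *s, int nr_to_scan);
};

/* A per-filesystem cache that embeds its own shrinker. */
struct sketch_fs_cache {
    int nr_objects;
    struct sketch_shrinker shrinker;
};

/* Free up to nr_to_scan objects and return how many remain. */
static int sketch_fs_shrink(struct sketch_shrinker *s, int nr_to_scan)
{
    struct sketch_fs_cache *cache =
        sketch_container_of(s, struct sketch_fs_cache, shrinker);

    if (nr_to_scan > cache->nr_objects)
        nr_to_scan = cache->nr_objects;
    cache->nr_objects -= nr_to_scan;
    return cache->nr_objects;
}
```

Because the callback recovers its cache from the pointer it is handed, two caches can register the same function without sharing any global state.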

Jul 17, 2010 · 1 commit

PCI: fall back to original BIOS BAR addresses · 58c84eda

Committed by Bjorn Helgaas on Jul 15, 2010

If we fail to assign resources to a PCI BAR, this patch makes us try the
original address from BIOS rather than leaving it disabled.

Linux tries to make sure all PCI device BARs are inside the upstream
PCI host bridge or P2P bridge apertures, reassigning BARs if necessary.
Windows does similar reassignment.

Before this patch, if we could not move a BAR into an aperture, we left
the resource unassigned, i.e., at address zero. Windows leaves such BARs
at the original BIOS addresses, and this patch makes Linux do the same.

This is a bit ugly because we disable the resource long before we try to
reassign it, so we have to keep track of the BIOS BAR address somewhere.
For lack of a better place, I put it in the struct pci_dev.

I think it would be cleaner to attempt the assignment immediately when the
claim fails, so we could easily remember the original address. But we
currently claim motherboard resources in the middle, after attempting to
claim PCI resources and before assigning new PCI resources, and changing
that is a fairly big job.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16263
Reported-by: Andrew <nitr0@seti.kr.ua>
Tested-by: Andrew <nitr0@seti.kr.ua>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>

58c84eda

Jul 16, 2010 · 1 commit

jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09

Committed by Jan Kara on Jul 14, 2010

OCFS2 uses t_commit trigger to compute and store checksum of the just
committed blocks. When a buffer has b_frozen_data, checksum is computed
for it instead of b_data but this can result in an old checksum being
written to the filesystem in the following scenario:

1) transaction1 is opened
2) handle1 is opened
3) journal_access(handle1, bh)
    - This sets jh->b_transaction to transaction1
4) modify(bh)
5) journal_dirty(handle1, bh)
6) handle1 is closed
7) start committing transaction1, opening transaction2
8) handle2 is opened
9) journal_access(handle2, bh)
    - This copies off b_frozen_data to make it safe for transaction1 to commit.
      jh->b_next_transaction is set to transaction2.
10) jbd2_journal_write_metadata() checksums b_frozen_data
11) the journal correctly writes b_frozen_data to the disk journal
12) handle2 is closed
    - There was no dirty call for the bh on handle2, so it is never queued for
      any more journal operation
13) Checkpointing finally happens, and it just spools the bh via normal buffer
    writeback.  This will write b_data, which was never triggered on and thus
    contains a wrong (old) checksum.

This patch fixes the problem by calling the trigger at the moment data is
frozen for journal commit - i.e., either when b_frozen_data is created by
do_get_write_access or just before we write a buffer to the log if
b_frozen_data does not exist. We also rename the trigger to t_frozen as
that better describes when it is called.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

13ceef09

Jul 14, 2010 · 1 commit

lmb: rename to memblock · 95f72d1e

Committed by Yinghai Lu on Jul 12, 2010

Renamed via the following scripts:

      FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

      sed -i \
        -e 's/lmb/memblock/g' \
        -e 's/LMB/MEMBLOCK/g' \
        $FILES

      for N in $(find . -name lmb.[ch]); do
        M=$(echo $N | sed 's/lmb/memblock/g')
        mv $N $M
      done

and remove some wrong changes, such as those to lmbench and dlmb.

Also move memblock.c from lib/ to mm/.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

95f72d1e

Jul 10, 2010 · 1 commit

tracing: Add alignment to syscall metadata declarations · 44a54f78

Committed by Steven Rostedt on Jul 09, 2010

For some reason if we declare a static variable and then assign it
later, and the assignment contains a __attribute__((__aligned__(#))),
some versions of gcc will ignore it.

This caused the syscall meta data to not be compact in its section
and caused a kernel oops when the section was being read.

The fix for these versions of gcc seems to be to add the aligned
attribute to the declaration as well.

This fixes the BZ regression:

  https://bugzilla.kernel.org/show_bug.cgi?id=16353
Reported-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Tested-by: Zeev Tarantov <zeev.tarantov@gmail.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <AANLkTinkKVmB0fpVeqUkMeqe3ZYeXJdI8xDuzJEOjYwh@mail.gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

44a54f78
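
The shape of the workaround, repeating the aligned attribute on the earlier declaration as well as on the definition, can be sketched as follows. The struct, variable names, and the alignment value of 16 are assumptions chosen for illustration; the syntax is the GCC/Clang attribute extension, and the sketch does not reproduce the compiler bug itself.

```c
#include <stdint.h>

struct sketch_syscall_meta {
    int nr;
    const char *name;
};

/* Declaration: carries the attribute too, which is what the fix adds. */
static struct sketch_syscall_meta sketch_meta_open
    __attribute__((__aligned__(16)));

/* Definition with the initializer, repeating the same attribute. */
static struct sketch_syscall_meta sketch_meta_open
    __attribute__((__aligned__(16))) = { 5, "sys_open" };

/* Helper: report whether a pointer satisfies a given alignment. */
static int sketch_is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

Stating the attribute in both places means the compiler sees the same alignment whichever of the two it consults when laying out the section.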

Jul 07, 2010 · 1 commit

VFS: introduce s_dirty accessors · 140236b4

Committed by Artem Bityutskiy on Jun 10, 2010

This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
'sb_mark_clean()', and 'sb_is_dirty()'. They simply
set 'sb->s_dirt' or test 'sb->s_dirt'. The plan is to make
every FS use these accessors later instead of manipulating
the 'sb->s_dirt' flag directly.

Ultimately, this change is a preparation for the periodic
superblock synchronization optimization which is about
preventing the "sync_supers" kernel thread from waking up
even if there is nothing to synchronize.

This patch does not do any functional change, just adds
accessor functions.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

140236b4
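
Three accessors of the kind described could look like this. The function names and the s_dirt field come from the message; the struct is a hypothetical stand-in for the real super_block, and the bodies are a guess at the simplest form rather than the patch's code.

```c
/* Hypothetical stand-in for struct super_block: only the flag matters here. */
struct sketch_super_block {
    int s_dirt;
};

/* Mark the superblock as needing to be written back. */
static inline void sb_mark_dirty(struct sketch_super_block *sb)
{
    sb->s_dirt = 1;
}

/* Mark the superblock as synchronized. */
static inline void sb_mark_clean(struct sketch_super_block *sb)
{
    sb->s_dirt = 0;
}

/* Test whether the superblock needs to be written back. */
static inline int sb_is_dirty(const struct sketch_super_block *sb)
{
    return sb->s_dirt;
}
```

Once every filesystem goes through such accessors, a later change can hook extra work into sb_mark_dirty() without touching each filesystem again.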

Jul 06, 2010 · 2 commits

writeback: simplify the write back thread queue · 83ba7b07

Committed by Christoph Hellwig on Jul 06, 2010

First, remove items from work_list as soon as we start working on them.  This
means we don't have to track any pending or visited state and can get
rid of all the RCU magic freeing the work items - we can simply free
them once the operation has finished.  Second, use a real completion for
tracking synchronous requests - if the caller sets the completion pointer
we complete it, otherwise use it as a boolean indicator that we can free
the work item directly.  Third, unify struct wb_writeback_args and struct
bdi_work into a single data structure, wb_writeback_work.  Previously we
set all parameters into a struct wb_writeback_args, copied it into
struct bdi_work, and copied it again on the stack to use it there.  Instead,
just allocate one structure dynamically or on the stack and use it
all the way through the stack.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

83ba7b07
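
The double role of the completion pointer, both a way to signal the waiter and an ownership marker, can be sketched in user space. The `sketch_` structs and the boolean "completion" are simplifications assumed for illustration; the kernel uses a real struct completion and its own work structure.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy completion: a flag the waiter would sleep on in real code. */
struct sketch_completion {
    bool done;
};

/* Single work structure, allocated on the heap or on the caller's stack. */
struct sketch_wb_work {
    long nr_pages;
    struct sketch_completion *done;   /* set only for synchronous requests */
};

/*
 * Called by the worker when the operation has finished.
 * Returns true if the worker freed the item, false if the caller owns it.
 */
static bool sketch_finish_work(struct sketch_wb_work *work)
{
    if (work->done) {
        work->done->done = true;   /* wake the synchronous caller */
        return false;              /* caller's item (e.g. on its stack) */
    }
    free(work);                    /* asynchronous item: free it directly */
    return true;
}
```

A synchronous caller keeps the work item on its stack and waits for the flag, while an asynchronous caller allocates the item, queues it, and never touches it again.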

writeback: split writeback_inodes_wb · edadfb10

Committed by Christoph Hellwig on Jun 10, 2010

The case where we have a superblock doesn't require a loop here as we scan
over all inodes in writeback_sb_inodes. Split it out into a separate helper
to make the code simpler.  This also allows us to get rid of the sb member in
struct writeback_control, which was rather out of place there.

Also update the comments in writeback_sb_inodes that explain the handling
of inodes from wrong superblocks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>

edadfb10

openeuler / Kernel · synced successfully more than 1 year ago
