1. 05 Jan, 2009 40 commits
    • S
      GFS2: Send useful information with uevent messages · 9a776db7
      Steven Whitehouse committed
      In order to distinguish between two differing uevent messages
      and to avoid using the (racy) method of reading status from
      sysfs in future, this adds some status information to our
      uevent messages.
      
      Btw, before anybody says "sysfs isn't racy", I'm aware of that,
      but the way that GFS2 was using it (send an ambiguous uevent and
      then expect the receiver to read sysfs to find out the status
      of the reported operation) was.
      
      The additional benefit of using the new interface is that it
      should be possible for a node to recover multiple journals
      at the same time, since there is no longer any confusion as
      to which journal the status belongs to.
      
      At some future stage, when all the userland programs have been
      converted, I intend to remove the old interface.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9a776db7
    • S
      GFS2: Fix use-after-free bug on umount · 3af165ac
      Steven Whitehouse committed
      There was a use-after-free with the GFS2 super block during
      umount. This patch moves almost all of the umount code from
      ->put_super into ->kill_sb, the only bit that cannot be moved
      being the glock hash clearing which has to remain as ->put_super
      due to umount ordering requirements. As a result it's now obvious
      that the kfree is the final operation, whereas before it was
      hidden in ->put_super.
      
      Also gfs2_jindex_free is then only referenced from a single file
      so that's moved and marked static too.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3af165ac
    • S
      GFS2: Remove ancient, unused code · 2e204703
      Steven Whitehouse committed
      Remove code that used to have something to do with initrd
      but has been unused for a long time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2e204703
    • S
      GFS2: Move four functions from super.c · 2bfb6449
      Steven Whitehouse committed
      The functions which are being moved can all be marked
      static in their new locations, since they only have
      a single caller each. Their new locations are more
      logical than before and some of the functions are
      small enough that the compiler might well inline them.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      2bfb6449
    • S
      GFS2: Fix bug in gfs2_lock_fs_check_clean() · b5289681
      Steven Whitehouse committed
      gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold()
      since it doesn't work like rindex hold, despite the comment. That
      allows gfs2_jindex_hold() to be moved into ops_fstype.c where it
      can be made static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b5289681
    • S
      GFS2: Send some sensible sysfs stuff · fdd1062e
      Steven Whitehouse committed
      We ought to inform the user of the locktable and lockproto for each
      uevent we generate.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fdd1062e
    • S
      GFS2: Kill two daemons with one patch · 97cc1025
      Steven Whitehouse committed
      This patch removes the two daemons, gfs2_scand and gfs2_glockd
      and replaces them with a shrinker which is called from the VM.
      
      The net result is that GFS2 responds better when there is memory
      pressure, since it shrinks the glock cache at the same rate
      as the VFS shrinks the dcache and icache. There are no longer
      any time based criteria for shrinking glocks, they are kept
      until such time as the VM asks for more memory and then we
      demote just as many glocks as required.
      
      There are potential future changes to this code, including the
      possibility of sorting the glocks which are to be written back
      into inode number order, to get a better I/O ordering. It would
      be very useful to have an elevator based workqueue implementation
      for this, as that would automatically deal with the read I/O cases
      at the same time.
      
      This patch is my answer to Andrew Morton's remark, made during
      the initial review of GFS2, asking why GFS2 needs so many kernel
      threads, the answer being that it doesn't :-) This patch is a
      net loss of about 200 lines of code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      97cc1025
    • S
      GFS2: Move gfs2_recoverd into recovery.c · 9ac1b4d9
      Steven Whitehouse committed
      By moving gfs2_recoverd, we can make an additional function static
      and it also leaves only (the already scheduled for removal) gfs2_glockd
      in daemon.c.
      
      At the same time the declaration of gfs2_quotad is moved to quota.h
      to reflect the new location of gfs2_quotad in a previous patch. Also
      the recovery.h and quota.h headers are cleaned up.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9ac1b4d9
    • S
      GFS2: Fix "truncate in progress" hang · 813e0c46
      Steven Whitehouse committed
      Following on from the recent clean up of gfs2_quotad, this patch moves
      the processing of "truncate in progress" inodes from the glock workqueue
      into gfs2_quotad. This fixes a hang due to the "truncate in progress"
      processing requiring glocks in order to complete.
      
      It might seem odd to use gfs2_quotad for this particular item, but
      we have to use a pre-existing thread since creating a thread implies
      a GFP_KERNEL memory allocation which is not allowed from the glock
      workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd
      may deadlock if used for this operation. gfs2_scand and gfs2_glockd are
      both scheduled for removal at some (hopefully not too distant) future
      point. That leaves only gfs2_quotad whose workload is generally fairly
      light and is easily adapted for this extra task.
      
      This change also opens the way for a future patch to make the reading
      of the inode's information asynchronous with respect to the glock
      workqueue, which is another improvement that has been on the list
      for some time now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      813e0c46
    • S
      GFS2: Clean up & move gfs2_quotad · 37b2c837
      Steven Whitehouse committed
      This patch is a clean up of gfs2_quotad prior to giving it an
      extra job to do in addition to the current portfolio of updating
      the quota and statfs information from time to time.
      
      As a result it has been moved into quota.c allowing one of the
      functions it calls to be made static. Also the clean up allows
      the two existing functions to have separate timeouts and also
      to coexist with its future role of dealing with the "truncate in
      progress" inode flag.
      
      The (pointless) setting of gfs2_quotad_secs is removed since we
      arrange to only wake up quotad when one of the two timers expires.
      
      In addition the struct gfs2_quota_data is moved into a slab cache,
      mainly for easier debugging. It should also be possible to use
      a shrinker in the future, rather than the current scheme of scanning
      the quota data entries from time to time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      37b2c837
    • S
      GFS2: Add more detail to debugfs glock dumps · fa75cedc
      Steven Whitehouse committed
      Although the glock dumps print quite a lot of information about
      the glocks themselves, there are more things which can be
      usefully added to the dump relating to the objects themselves.
      
      This patch adds a few more fields to the inode and resource
      group lines, which should be useful for debugging.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      fa75cedc
    • S
      GFS2: Banish struct gfs2_rgrpd_host · 73f74948
      Steven Whitehouse committed
      This patch moves the final field so that we can get rid
      of struct gfs2_rgrpd_host, as promised some time ago. Also
      by rearranging the fields slightly, we are able to reduce
      the size of the gfs2_rgrpd structure at the same time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      73f74948
    • S
      GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd · cfc8b549
      Steven Whitehouse committed
      The second of three fields which need to move, in order
      to remove the struct gfs2_rgrpd_host.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      cfc8b549
    • S
      GFS2: Move rg_igeneration into struct gfs2_rgrpd · d8b71f73
      Steven Whitehouse committed
      This moves one of the fields of struct gfs2_rgrpd_host into
      the struct gfs2_rgrpd with the eventual aim of removing
      the struct gfs2_rgrpd_host completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      d8b71f73
    • S
      GFS2: Banish struct gfs2_dinode_host · 383f01fb
      Steven Whitehouse committed
      The final field in gfs2_dinode_host was the i_flags field. That's
      renamed to i_diskflags in order to avoid confusion with the existing
      inode flags, and moved into the inode proper at a suitable location
      to avoid creating a "hole".
      
      At that point struct gfs2_dinode_host is no longer needed and as
      promised (quite some time ago!) it can now be removed completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      383f01fb
    • S
      GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse committed
      This patch moves the i_size field from the gfs2_dinode_host and,
      following the ext3 convention, renames it i_disksize.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c9e98886
    • S
      GFS2: Move di_eattr into "proper" inode · 3767ac21
      Steven Whitehouse committed
      This moves the di_eattr field out of gfs2_dinode_host and
      into the inode proper.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3767ac21
    • S
      GFS2: Move "entries" into "proper" inode · ad6203f2
      Steven Whitehouse committed
      This moves the directory entry count into the proper inode.
      Potentially we could get this to share the space used by
      something else in the future, but this is one more step
      on the way to removing the gfs2_dinode_host structure.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      ad6203f2
    • S
      GFS2: Move generation number into "proper" part of inode · bcf0b5b3
      Steven Whitehouse committed
      This moves the generation number from the gfs2_dinode_host
      into the gfs2_inode structure. Eventually the plan is to get
      rid of the gfs2_dinode_host structure completely.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      bcf0b5b3
    • H
      GFS2: sparse annotation of gl->gl_spin · 55ba474d
      Harvey Harrison committed
      fs/gfs2/glock.c:308:5: warning: context problem in 'do_promote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:308:5:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:529:2: warning: context problem in 'do_xmote': '_spin_unlock' expected different context
      fs/gfs2/glock.c:529:2:    context '*gl+28': wanted >= 1, got 0
      fs/gfs2/glock.c:925:3: warning: context problem in 'add_to_queue': '_spin_unlock' expected different context
      fs/gfs2/glock.c:925:3:    context '*gl+28': wanted >= 1, got 0
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      55ba474d
    • S
      GFS2: Fix up jdata writepage/delete_inode · 1bb7322f
      Steven Whitehouse committed
      There is a bug in writepage and delete_inode which allows jdata files to
      invalidate pages from the address space without being in a transaction at
      the time. This causes problems in case the pages are in the journal. This
      patch fixes that case and prevents the resulting oops.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1bb7322f
    • S
      GFS2: Rationalise header files · b2760583
      Steven Whitehouse committed
      Move the contents of some headers which contained very
      little into more sensible places, and remove the original
      header files. This should make it easier to find things.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2760583
    • S
      GFS2: Support for FIEMAP ioctl · e9079cce
      Steven Whitehouse committed
      This patch implements the FIEMAP ioctl for GFS2. We can use the generic
      code (aside from a lock order issue, solved as per Ted Tso's suggestion)
      for which I've introduced a new variant of the generic function. We also
      have one exception to deal with, namely stuffed files, so we do that
      "by hand", setting all the required flags.
      
      This has been tested with a modified version (I could only find an old
      one) of Eric's test program, and appears to work correctly.
      
      This patch does not currently support FIEMAP of xattrs, but the plan is to add
      that feature at some future point.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Eric Sandeen <sandeen@redhat.com>
      e9079cce
    • L
      Merge branch 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current · fe0bdec6
      Linus Torvalds committed
      * 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
        audit: validate comparison operations, store them in sane form
        clean up audit_rule_{add,del} a bit
        make sure that filterkey of task,always rules is reported
        audit rules ordering, part 2
        fixing audit rule ordering mess, part 1
        audit_update_lsm_rules() misses the audit_inode_hash[] ones
        sanitize audit_log_capset()
        sanitize audit_fd_pair()
        sanitize audit_mq_open()
        sanitize AUDIT_MQ_SENDRECV
        sanitize audit_mq_notify()
        sanitize audit_mq_getsetattr()
        sanitize audit_ipc_set_perm()
        sanitize audit_ipc_obj()
        sanitize audit_socketcall
        don't reallocate buffer in every audit_sockaddr()
      fe0bdec6
    • A
      rtc: add alarm/update irq interfaces · 099e6576
      Alessandro Zummo committed
      Add standard interfaces for enabling alarm/update irqs.  Drivers are no
      longer required to implement equivalent ioctl code as rtc-dev will provide
      it.
      
      UIE emulation should now be handled correctly and will work even for those
      RTC drivers that cannot be configured to do both UIE and AIE.
      Signed-off-by: Alessandro Zummo <a.zummo@towertech.it>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      099e6576
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin committed
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54566b2c
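
The flag-to-mask idea in the commit above can be sketched in plain userspace C. This is only an illustration, not the kernel code: the `TOY_GFP_*` bits, the numeric value given to `AOP_FLAG_NOFS`, and the helper `write_begin_gfp()` are all assumptions made up for the example.

```c
#include <assert.h>

/* Illustrative allocation-context bits; the real kernel values differ. */
#define TOY_GFP_WAIT   0x1u
#define TOY_GFP_IO     0x2u
#define TOY_GFP_FS     0x4u
#define TOY_GFP_KERNEL (TOY_GFP_WAIT | TOY_GFP_IO | TOY_GFP_FS)

/* Flag name taken from the commit message; the value here is made up. */
#define AOP_FLAG_NOFS  0x4u

/*
 * Hypothetical helper: derive the mask used for page cache allocations
 * in write_begin from the caller's AOP flags.  The only thing a caller
 * can change is whether the allocation may re-enter the filesystem.
 */
static unsigned int write_begin_gfp(unsigned int mapping_gfp,
				    unsigned int aop_flags)
{
	unsigned int gfp = mapping_gfp;

	/* The page lock may sleep, so an atomic mask makes no sense here. */
	assert(mapping_gfp & TOY_GFP_WAIT);

	if (aop_flags & AOP_FLAG_NOFS)
		gfp &= ~TOY_GFP_FS;
	return gfp;
}
```

With no flags the mask is passed through unchanged; with `AOP_FLAG_NOFS` only the filesystem re-entry bit is cleared, which is the single degree of freedom the commit message says callers actually need.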
    • B
      viafb: fix crashes due to 4k stack overflow · e687d691
      Bruno Prémont committed
      The function viafb_cursor() uses 2 stack-variables of CURSOR_SIZE bits;
      CURSOR_SIZE is defined as (8 * 1024).  Using up 1k of stack twice is too
      much for a 4k stack (though it works with 8k stacks).  Make those two
      variables kzalloc'ed to preserve stack space.
      
      Also merge the whole lot of local structs in viafb_ioctl into a union so
      the stack usage gets minimized here as well.  (The structs are only accessed
      in their individual IOCTL case.)  This second part is only compile-tested as
      I know of no userspace app using the IOCTLs.
      Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
      Cc: <JosephChan@via.com.tw>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e687d691
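
The stack arithmetic in the viafb commit can be checked with a small userspace sketch: (8 * 1024) bits is 1024 bytes per buffer, so two buffers take 2 KiB of a 4 KiB stack. The code below is an assumption-laden illustration, not the driver code: `calloc` stands in for `kzalloc`, and `struct cursor_bufs` with its helpers is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define CURSOR_SIZE  (8 * 1024)          /* size in bits, as in the commit message */
#define CURSOR_BYTES (CURSOR_SIZE / 8)   /* 1024 bytes per buffer */

/* Hypothetical holder for the two cursor work buffers. */
struct cursor_bufs {
	unsigned char *bg;
	unsigned char *fg;
};

/*
 * Allocate both buffers zeroed on the heap instead of the stack.
 * Returns 0 on success, -1 if either allocation fails.
 */
static int cursor_bufs_alloc(struct cursor_bufs *b)
{
	assert(b != NULL);

	b->bg = calloc(1, CURSOR_BYTES);
	b->fg = calloc(1, CURSOR_BYTES);
	if (!b->bg || !b->fg) {
		free(b->bg);
		free(b->fg);
		b->bg = b->fg = NULL;
		return -1;
	}
	return 0;
}

/* Release both buffers and clear the pointers. */
static void cursor_bufs_free(struct cursor_bufs *b)
{
	free(b->bg);
	free(b->fg);
	b->bg = b->fg = NULL;
}
```

The cost of this approach is an allocation that can fail, so the caller needs an error path that the stack version did not.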
    • P
      fs: introduce bgl_lock_ptr() · c644f0e4
      Pekka Enberg committed
      As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
      <linux/blockgroup_lock.h> and add separate sb_bgl_lock() helpers to
      filesystem specific header files to break the hidden dependency on
      struct ext[234]_sb_info.
      
      Also, while at it, convert the macros to static inlines to try to make up
      for all the times I broke Andrew Morton's tree.
      Acked-by: Andreas Dilger <adilger@sun.com>
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c644f0e4
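
A rough userspace sketch of the split described in the bgl_lock_ptr() commit: a generic helper that only knows about the lock array, plus a per-filesystem wrapper that knows where the array lives. The struct layouts, `struct toy_lock`, `struct toy_sb_info`, and the value of `NR_BG_LOCKS` are assumptions for illustration; the kernel uses spinlocks and its own sizing.

```c
#include <assert.h>

#define NR_BG_LOCKS 128	/* must be a power of two; value chosen for the sketch */

struct toy_lock { int locked; };	/* stand-in for spinlock_t */

struct blockgroup_lock {
	struct toy_lock locks[NR_BG_LOCKS];
};

/*
 * Generic helper: pick the lock for a block group by masking the group
 * number.  As a static inline (rather than a macro) the arguments are
 * type-checked and evaluated exactly once.
 */
static inline struct toy_lock *bgl_lock_ptr(struct blockgroup_lock *bgl,
					    unsigned int block_group)
{
	assert(bgl != 0);
	return &bgl->locks[block_group & (NR_BG_LOCKS - 1)];
}

/* Hypothetical filesystem-private superblock info. */
struct toy_sb_info {
	struct blockgroup_lock bgl;
};

/* Filesystem-specific wrapper; only this knows the sb_info layout. */
static inline struct toy_lock *sb_bgl_lock(struct toy_sb_info *sbi,
					   unsigned int block_group)
{
	return bgl_lock_ptr(&sbi->bgl, block_group);
}
```

Keeping the wrapper in the filesystem's own header means the generic header no longer has to assume any particular superblock structure.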
    • R
      spi.h uses/needs device.h · 0a30c5ce
      Randy Dunlap committed
      Include header files as used/needed:
      
        In file included from drivers/leds/leds-dac124s085.c:16:
        include/linux/spi/spi.h:66: error: field 'dev' has incomplete type
        include/linux/spi/spi.h: In function 'to_spi_device':
        include/linux/spi/spi.h:100: warning: type defaults to 'int' in declaration of '__mptr'
        ...
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0a30c5ce
    • A
      vmalloc.c: fix flushing in vmap_page_range() · 2e4e27c7
      Adam Lackorzynski committed
      The flush_cache_vmap in vmap_page_range() is called with the end of the
      range twice.  The following patch fixes this for me.
      Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e4e27c7
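
One way a flush can end up being called with the end of the range twice is when the loop cursor is reused as the start address after the loop has advanced it. The commit message does not spell out the cause, so the sketch below assumes that pattern; `map_and_flush()`, `record_flush()`, and `PAGE_SZ` are hypothetical userspace stand-ins.

```c
#include <assert.h>

#define PAGE_SZ 4096UL	/* hypothetical page size for the sketch */

struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Stand-in for flush_cache_vmap(): just records the range it was given. */
static void record_flush(struct flush_range *r,
			 unsigned long start, unsigned long end)
{
	r->start = start;
	r->end = end;
}

/*
 * Walk [addr, end) one page at a time, then flush the whole range.
 * Returns the number of pages walked.
 */
static unsigned long map_and_flush(unsigned long addr, unsigned long end,
				   struct flush_range *r)
{
	unsigned long start = addr;	/* saved before the loop advances addr */
	unsigned long pages = 0;

	assert(addr < end && (end - addr) % PAGE_SZ == 0);

	do {
		pages++;
		addr += PAGE_SZ;
	} while (addr != end);

	/* Passing addr here instead would flush the empty range [end, end). */
	record_flush(r, start, end);
	return pages;
}
```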
    • L
      cgroups: fix a race between cgroup_clone and umount · 7b574b7b
      Li Zefan committed
      The race is calling cgroup_clone() while umounting the ns cgroup subsys,
      and thus cgroup_clone() might access invalid cgroup_fs, or kill_sb() is
      called after cgroup_clone() created a new dir in it.
      
      The BUG I triggered is BUG_ON(root->number_of_cgroups != 1);
      
        ------------[ cut here ]------------
        kernel BUG at kernel/cgroup.c:1093!
        invalid opcode: 0000 [#1] SMP
        ...
        Process umount (pid: 5177, ti=e411e000 task=e40c4670 task.ti=e411e000)
        ...
        Call Trace:
         [<c0493df7>] ? deactivate_super+0x3f/0x51
         [<c04a3600>] ? mntput_no_expire+0xb3/0xdd
         [<c04a3ab2>] ? sys_umount+0x265/0x2ac
         [<c04a3b06>] ? sys_oldumount+0xd/0xf
         [<c0403911>] ? sysenter_do_call+0x12/0x31
        ...
        EIP: [<c0456e76>] cgroup_kill_sb+0x23/0xe0 SS:ESP 0068:e411ef2c
        ---[ end trace c766c1be3bf944ac ]---
      
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7b574b7b
    • A
      audit: validate comparison operations, store them in sane form · 5af75d8d
      Al Viro committed
      Don't store the field->op in the messy (and very inconvenient for e.g.
      audit_comparator()) form; translate to dense set of values and do full
      validation of userland-submitted value while we are at it.
      
      ->audit_init_rule() and ->audit_match_rule() get new values now; in-tree
      instances updated.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5af75d8d
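
The "translate to a dense set of values and validate" step can be sketched as a table lookup done once when a rule is submitted, so that the comparator later only has to switch on a small enum. The sparse `TOY_OP_*` encodings, the `enum dense_op` names, and both functions are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse, bit-flag style operator encodings (illustrative values). */
#define TOY_OP_BIT_MASK 0x08000000u
#define TOY_OP_LT       0x10000000u
#define TOY_OP_GT       0x20000000u
#define TOY_OP_EQ       0x40000000u

/* Dense internal form; OP_BAD doubles as the "invalid input" result. */
enum dense_op {
	OP_EQ, OP_NE, OP_BITMASK, OP_BITTEST,
	OP_LT, OP_GT, OP_LE, OP_GE, OP_BAD
};

static const unsigned int sparse_ops[OP_BAD] = {
	[OP_EQ]      = TOY_OP_EQ,
	[OP_NE]      = TOY_OP_LT | TOY_OP_GT,
	[OP_BITMASK] = TOY_OP_BIT_MASK,
	[OP_BITTEST] = TOY_OP_BIT_MASK | TOY_OP_EQ,
	[OP_LT]      = TOY_OP_LT,
	[OP_GT]      = TOY_OP_GT,
	[OP_LE]      = TOY_OP_LT | TOY_OP_EQ,
	[OP_GE]      = TOY_OP_GT | TOY_OP_EQ,
};

/* Translate and validate in one step: unknown encodings map to OP_BAD. */
static enum dense_op to_dense_op(unsigned int op)
{
	size_t n;

	for (n = 0; n < OP_BAD; n++)
		if (sparse_ops[n] == op)
			break;
	return (enum dense_op)n;
}

/* Comparator over the dense form; callers validated the op beforehand. */
static int compare(unsigned int left, enum dense_op op, unsigned int right)
{
	assert(op < OP_BAD);

	switch (op) {
	case OP_EQ:      return left == right;
	case OP_NE:      return left != right;
	case OP_BITMASK: return (left & right) != 0;
	case OP_BITTEST: return (left & right) == right;
	case OP_LT:      return left < right;
	case OP_GT:      return left > right;
	case OP_LE:      return left <= right;
	case OP_GE:      return left >= right;
	default:         return 0;
	}
}
```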
    • A
      clean up audit_rule_{add,del} a bit · 36c4f1b1
      Al Viro committed
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      36c4f1b1
    • A
      e048e02c
    • A
      audit rules ordering, part 2 · e45aa212
      Al Viro committed
      Fix the actual rule listing; add per-type lists _not_ used for matching,
      with all exit,... sitting on one such list.  Simplifies "do something
      for all rules" logic, while we are at it...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      e45aa212
    • A
      fixing audit rule ordering mess, part 1 · 0590b933
      Al Viro committed
      Problem: ordering between the rules on exit chain is currently lost;
      all watch and inode rules are listed after everything else _and_
      exit,never on one kind doesn't stop exit,always on another from
      being matched.
      
      Solution: assign priorities to rules, keep track of the current
      highest-priority matching rule and its result (always/never).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0590b933
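
The "keep track of the current highest-priority matching rule and its result" idea can be sketched as a single pass over all rules, regardless of which list they sit on. Everything here is a simplified assumption: `struct rule`, the syscall-number match, and the convention that a larger `prio` wins are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

enum rule_action { RULE_NEVER, RULE_ALWAYS };

struct rule {
	unsigned long prio;		/* larger value wins; hypothetical convention */
	int syscall;			/* toy match criterion */
	enum rule_action action;
};

/*
 * Scan every rule and remember the highest-priority match and its
 * result.  Returns 1 and stores the winning action in *out if any rule
 * matched, 0 otherwise.
 */
static int decide(const struct rule *rules, size_t n, int syscall,
		  enum rule_action *out)
{
	unsigned long best = 0;
	int found = 0;
	size_t i;

	assert(out != NULL);

	for (i = 0; i < n; i++) {
		if (rules[i].syscall != syscall)
			continue;
		if (!found || rules[i].prio > best) {
			best = rules[i].prio;
			*out = rules[i].action;
			found = 1;
		}
	}
	return found;
}
```

Because the decision depends only on priority, a "never" rule of one kind can now override an "always" rule of another kind, which is the behaviour the commit message says was previously lost.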
    • A
      1a9d0797
    • A
      sanitize audit_log_capset() · 57f71a0a
      Al Viro committed
      * no allocations
      * return void
      * don't duplicate the check for dummy context
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      57f71a0a
    • A
      sanitize audit_fd_pair() · 157cf649
      Al Viro committed
      * no allocations
      * return void
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      157cf649
    • A
      sanitize audit_mq_open() · 564f6993
      Al Viro committed
      * don't bother with allocations
      * don't do double copy_from_user()
      * don't duplicate parts of check for audit_dummy_context()
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      564f6993