1. 08 Nov, 2005 3 commits
  2. 07 Nov, 2005 2 commits
  3. 31 Oct, 2005 1 commit
  4. 28 Oct, 2005 1 commit
    • [PATCH] gfp_t: fs/* · 27496a8c
      Al Viro committed
       - ->releasepage() annotated (s/int/gfp_t), instances updated
       - missing gfp_t in fs/* added
       - fixed misannotation from the original sweep caught by bitwise checks:
         XFS used __nocast both for gfp_t and for flags used by XFS allocator.
         The latter left with unsigned int __nocast; we might want to add a
         different type for those but for now let's leave them alone.  That,
         BTW, is a case when __nocast use had been actively confusing - it had
         been used in the same code for two different and similar types, with
         no way to catch misuses.  Switch of gfp_t to bitwise had caught that
         immediately...
      
      One tricky bit is left alone to be dealt with later - mapping->flags is
      a mix of gfp_t and error indications.  Left alone for now.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
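The bitwise annotation described in this commit can be illustrated outside the kernel. The following is a minimal sketch, assuming sparse-style attributes that are only active when the checker defines `__CHECKER__`. Every `demo_`/`DEMO_` name and every flag value is hypothetical and not taken from the patch.

```c
#include <assert.h>

/* Sparse defines __CHECKER__; a normal compiler sees plain integers. */
#ifdef __CHECKER__
#define DEMO_BITWISE __attribute__((bitwise))
#define DEMO_FORCE   __attribute__((force))
#else
#define DEMO_BITWISE
#define DEMO_FORCE
#endif

/* Two flag families that previously shared "unsigned int __nocast". */
typedef unsigned int DEMO_BITWISE demo_gfp_t;
typedef unsigned int DEMO_BITWISE demo_xfs_flags_t;

/* Hypothetical flag values, for illustration only. */
#define DEMO_GFP_WAIT  ((DEMO_FORCE demo_gfp_t)0x10u)
#define DEMO_GFP_FS    ((DEMO_FORCE demo_gfp_t)0x80u)
#define DEMO_XFS_SLEEP ((DEMO_FORCE demo_xfs_flags_t)0x01u)

struct demo_page {
    int busy;
};

/* A ->releasepage()-style hook whose second argument is gfp_t, not int. */
static int demo_releasepage(struct demo_page *page, demo_gfp_t gfp_mask)
{
    /* Refuse to release a busy page unless the caller is allowed to wait. */
    if (page->busy && !(gfp_mask & DEMO_GFP_WAIT))
        return 0;
    page->busy = 0;
    return 1;
}
```

Under these assumptions, passing `DEMO_XFS_SLEEP` to `demo_releasepage()` compiles with an ordinary compiler but would be reported by the checker, which is the kind of mix-up the commit message says `__nocast` could not catch.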
  5. 11 Sep, 2005 1 commit
  6. 10 Sep, 2005 1 commit
    • [PATCH] files: files struct with RCU · ab2af1f5
      Dipankar Sarma committed
      Patch to eliminate struct files_struct.file_lock spinlock on the reader side
      and use rcu refcounting rcuref_xxx api for the f_count refcounter.  The
      updates to the fdtable are done by allocating a new fdtable structure and
      setting files->fdt to point to the new structure.  The fdtable structure is
      protected by RCU thereby allowing lock-free lookup.  For fd arrays/sets that
      are vmalloced, we use keventd to free them since RCU callbacks can't sleep.  A
      global list of fdtable to be freed is not scalable, so we use a per-cpu list.
      If keventd is already handling the current cpu's work, we use a timer to defer
      queueing of that work.
      
      Since the last publication, this patch has been re-written to avoid using
      explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
      primitives instead.  This required that the fd information is kept in a
      separate structure (fdtable) and updated atomically.
      Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
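The copy-and-publish update of the fdtable can be sketched in user space. This is a minimal illustration, assuming C11 atomics as a stand-in for `rcu_assign_pointer()` (release store) and `rcu_dereference()` (acquire load). All `demo_` names are hypothetical, and deferred freeing of the old table (the keventd and per-cpu list part of the patch) is deliberately left to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct demo_fdtable {
    size_t max_fds;
    void **fd;                 /* one slot per descriptor */
};

struct demo_files {
    _Atomic(struct demo_fdtable *) fdt;   /* published table */
};

static struct demo_fdtable *demo_alloc_fdtable(size_t max_fds)
{
    struct demo_fdtable *t = malloc(sizeof *t);

    if (!t)
        return NULL;
    t->fd = calloc(max_fds, sizeof *t->fd);
    if (!t->fd) {
        free(t);
        return NULL;
    }
    t->max_fds = max_fds;
    return t;
}

/* Reader side: lock-free lookup through the currently published table. */
static void *demo_fcheck(struct demo_files *files, size_t fd)
{
    struct demo_fdtable *t =
        atomic_load_explicit(&files->fdt, memory_order_acquire);

    return (t && fd < t->max_fds) ? t->fd[fd] : NULL;
}

/*
 * Writer side (assumed to be serialized by a lock not shown here):
 * build a larger copy, then publish it with a single release store.
 * Returns the old table, which may only be freed once no reader can
 * still hold it; returns NULL if nothing was replaced.
 */
static struct demo_fdtable *demo_expand_fdtable(struct demo_files *files,
                                                size_t new_max)
{
    struct demo_fdtable *old =
        atomic_load_explicit(&files->fdt, memory_order_relaxed);
    struct demo_fdtable *new;

    if (old && new_max <= old->max_fds)
        return NULL;
    new = demo_alloc_fdtable(new_max);
    if (!new)
        return NULL;
    if (old)
        memcpy(new->fd, old->fd, old->max_fds * sizeof *old->fd);
    atomic_store_explicit(&files->fdt, new, memory_order_release);
    return old;
}
```

Because readers only ever see either the old or the new table pointer, they never observe a half-updated fd array, which is why the commit keeps the fd information in a separate structure.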
  7. 08 Sep, 2005 4 commits
  8. 20 Aug, 2005 1 commit
    • Fix nasty ncpfs symlink handling bug. · cc314eef
      Linus Torvalds committed
      This bug could cause oopses and page state corruption, because ncpfs
      used the generic page-cache symlink handling functions.  But those
      functions only work if the page cache is guaranteed to be "stable", ie a
      page that was installed when the symlink walk was started has to still
      be installed in the page cache at the end of the walk.
      
      We could have fixed ncpfs to not use the generic helper routines, but it
      is in many ways much cleaner to instead improve on the symlink walking
      helper routines so that they don't require that absolute stability.
      
      We do this by allowing "follow_link()" to return an error-pointer as a
      cookie, which is fed back to the cleanup "put_link()" routine.  This
      also simplifies NFS symlink handling.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
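The cookie hand-off between `follow_link()` and `put_link()` can be sketched as follows. This is a user-space illustration only: the `demo_` functions, the `demo_nameidata` structure, and the error-pointer helpers are hypothetical stand-ins, and the cookie here is a private heap copy rather than whatever a real filesystem would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer, in the style of ERR_PTR(). */
static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_nameidata {
    const char *link;          /* symlink body used by the path walk */
};

/*
 * Returns either an error pointer or an opaque cookie. The walker does
 * not interpret the cookie; it only hands it back to demo_put_link().
 */
static void *demo_follow_link(struct demo_nameidata *nd, const char *target)
{
    size_t len = strlen(target) + 1;
    char *copy = malloc(len);

    if (!copy)
        return demo_err_ptr(-ENOMEM);
    memcpy(copy, target, len);
    nd->link = copy;
    return copy;
}

/* Cleanup receives exactly what follow_link returned. */
static void demo_put_link(struct demo_nameidata *nd, void *cookie)
{
    if (!demo_is_err(cookie))
        free(cookie);
    nd->link = NULL;
}
```

Since cleanup is driven by the cookie rather than by a second lookup, the sketch does not depend on the original backing object still being in place when the walk ends.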
  9. 28 Jul, 2005 1 commit
    • [PATCH] stale POSIX lock handling · c293621b
      Peter Staubach committed
      I believe that there is a problem with the handling of POSIX locks, which
      the attached patch should address.
      
      The problem appears to be a race between fcntl(2) and close(2).  A
      multithreaded application could close a file descriptor at the same time as
      it is trying to acquire a lock using the same file descriptor.  I would
      suggest that that multithreaded application is not providing the proper
      synchronization for itself, but the OS should still behave correctly.
      
      SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when
      a file descriptor is closed, that all POSIX locks on the file, owned by the
      process which closed the file descriptor, should be released.
      
      The trick here is when those locks are released.  The current code releases
      all locks which exist when close is processing, but any locks in progress
      are handled when the last reference to the open file is released.
      
      There are three cases to consider.
      
      One is the simple case, a multithreaded (mt) process has a file open and
      races to close it and acquire a lock on it.  In this case, the close will
      release one reference to the open file and when the fcntl is done, it will
      release the other reference.  For this situation, no locks should exist on
      the file when both the close and fcntl operations are done.  The current
      system will handle this case because the last reference to the open file is
      being released.
      
      The second case is when the mt process has dup(2)'d the file descriptor.
      The close will release one reference to the file and the fcntl, when done,
      will release another, but there will still be at least one more reference
      to the open file.  One could argue that the existence of a lock on the file
      after the close has completed is okay, because it was acquired after the
      close operation and there is still a way for the application to release the
      lock on the file, using an existing file descriptor.
      
      The third case is when the mt process has forked, after opening the file
      and either before or after becoming an mt process.  In this case, each
      process would hold a reference to the open file.  For each process, this
      degenerates to first case above.  However, the lock continues to exist
      until both processes have released their references to the open file.  This
      lock could block other lock requests.
      
      The changes to release the lock when the last reference to the open file
      is released aren't quite right because they would allow the lock to exist
      as long as there was a reference to the open file.  This is too long.
      
      The new proposed solution is to add support in the fcntl code path to
      detect a race with close and then to release the lock which was just
      acquired when such a race is detected.  This causes locks to be released
      in a timely fashion and for the system to conform to the POSIX semantic
      specification.
      
      This was tested by instrumenting a kernel to detect the handling locks and
      then running a program which generates case #3 above.  A dangling lock
      could be reliably generated.  When the changes to detect the close/fcntl
      race were added, a dangling lock could no longer be generated.
      
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
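The proposed fix, re-checking the descriptor after the lock has been taken, can be sketched in a single-threaded model. Everything here is hypothetical: the `demo_` types, the `racing_close` callback that stands in for a second thread, and the choice of `-EBADF` as the return value when the race is detected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_FDS 8

struct demo_file {
    int posix_locks;           /* locks held on this open file */
};

struct demo_files {
    struct demo_file *fd[DEMO_MAX_FDS];
};

static struct demo_file *demo_fcheck(const struct demo_files *files, int fd)
{
    if (fd < 0 || fd >= DEMO_MAX_FDS)
        return NULL;
    return files->fd[fd];
}

/* close(): drops the descriptor and every lock that exists right now. */
static void demo_close(struct demo_files *files, int fd)
{
    struct demo_file *filp = demo_fcheck(files, fd);

    if (!filp)
        return;
    files->fd[fd] = NULL;
    filp->posix_locks = 0;
}

/*
 * fcntl(F_SETLK)-style path. racing_close, if non-NULL, simulates
 * another thread closing the descriptor between lookup and locking.
 */
static int demo_fcntl_setlk(struct demo_files *files, int fd,
                            void (*racing_close)(struct demo_files *, int))
{
    struct demo_file *filp = demo_fcheck(files, fd);

    if (!filp)
        return -EBADF;
    if (racing_close)
        racing_close(files, fd);

    filp->posix_locks++;       /* lock acquired */

    /* Race check: does the descriptor still refer to the same file? */
    if (demo_fcheck(files, fd) != filp) {
        filp->posix_locks--;   /* release the lock that was just taken */
        return -EBADF;
    }
    return 0;
}
```

In this model the dangling lock from case three cannot survive, because the lock is dropped in the fcntl path itself instead of waiting for the last reference to the open file to go away.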
  10. 14 Jul, 2005 1 commit
    • [PATCH] Fix soft lockup due to NTFS: VFS part and explanation · 88bd5121
      Anton Altaparmakov committed
      Something has changed in the core kernel such that we now get concurrent
      inode write outs, one e.g. via pdflush and one via sys_sync or whatever.
      This causes a nasty deadlock in ntfs.  The only clean solution
      unfortunately requires a minor vfs api extension.
      
      First the deadlock analysis:
      
      Prerequisite knowledge: NTFS has a file $MFT (inode 0) loaded at mount
      time.  The NTFS driver uses the page cache for storing the file contents as
      usual.  More interestingly this file contains the table of on-disk inodes
      as a sequence of MFT_RECORDs.  Thus NTFS driver accesses the on-disk inodes
      by accessing the MFT_RECORDs in the page cache pages of the loaded inode
      $MFT.
      
      The situation: VFS inode X on a mounted ntfs volume is dirty.  For same
      inode X, the ntfs_inode is dirty and thus corresponding on-disk inode,
      which is as explained above in a dirty PAGE_CACHE_PAGE belonging to the
      table of inodes ($MFT, inode 0).
      
      What happens:
      
      Process 1: sys_sync()/umount()/whatever...  calls __sync_single_inode() for
      $MFT -> do_writepages() -> write_page for the dirty page containing the
      on-disk inode X, the page is now locked -> ntfs_write_mst_block() which
      clears PageUptodate() on the page to prevent anyone else getting hold of it
      whilst it does the write out (this is necessary as the on-disk inode needs
      "fixups" applied before the write to disk which are removed again after the
      write and PageUptodate is then set again).  It then analyses the page
      looking for dirty on-disk inodes and when it finds one it calls
      ntfs_may_write_mft_record() to see if it is safe to write this on-disk
      inode.  This then calls ilookup5() to check if the corresponding VFS inode
      is in icache().  This in turn calls ifind() which waits on the inode lock
      via wait_on_inode whilst holding the global inode_lock.
      
      Process 2: pdflush results in a call to __sync_single_inode for the same
      VFS inode X on the ntfs volume.  This locks the inode (I_LOCK) then calls
      write-inode -> ntfs_write_inode -> map_mft_record() -> read_cache_page() of
      the page (in page cache of table of inodes $MFT, inode 0) containing the
      on-disk inode.  This page has PageUptodate() clear because of Process 1
      (see above) so read_cache_page() blocks when it tries to take the page lock
      for the page so it can call ntfs_readpage().
      
      Thus Process 1 is holding the page lock on the page containing the on-disk
      inode X and it is waiting on the inode X to be unlocked in ifind() so it
      can write the page out and then unlock the page.
      
      And Process 2 is holding the inode lock on inode X and is waiting for the
      page to be unlocked so it can call ntfs_readpage() or discover that
      Process 1 set PageUptodate() again and use the page.
      
      Thus we have a deadlock due to ifind() waiting on the inode lock.
      
      The only sensible solution: NTFS does not care whether the VFS inode is
      locked or not when it calls ilookup5() (it doesn't use the VFS inode at
      all, it just uses it to find the corresponding ntfs_inode which is of
      course attached to the VFS inode (both are one single struct); and it uses
      the ntfs_inode which is subject to its own locking so I_LOCK is irrelevant)
      hence we want a modified ilookup5_nowait() which is the same as ilookup5()
      but it does not wait on the inode lock.
      
      Without such functionality I would have to keep my own ntfs_inode cache in
      the NTFS driver just so I can find ntfs_inodes independent of their VFS
      inodes which would be slow, memory and cpu cycle wasting, and incredibly
      stupid given the icache already exists in the VFS.
      
      Below is a patch that does the ilookup5_nowait() implementation in
      fs/inode.c and exports it.
      
      ilookup5_nowait.diff:
      
      Introduce ilookup5_nowait() which is basically the same as ilookup5() but
      it does not wait on the inode's lock (i.e. it omits the wait_on_inode()
      done in ifind()).
      
      This is needed to avoid a nasty deadlock in NTFS.
      Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 13 Jul, 2005 1 commit
    • [PATCH] inotify · 0eeca283
      Robert Love committed
      inotify is intended to correct the deficiencies of dnotify, particularly
      its inability to scale and its terrible user interface:
      
              * dnotify requires the opening of one fd per each directory
                that you intend to watch. This quickly results in too many
                open files and pins removable media, preventing unmount.
              * dnotify is directory-based. You only learn about changes to
                directories. Sure, a change to a file in a directory affects
                the directory, but you are then forced to keep a cache of
                stat structures.
              * dnotify's interface to user-space is awful.  Signals?
      
      inotify provides a more usable, simple, powerful solution to file change
      notification:
      
              * inotify's interface is a system call that returns a fd, not SIGIO.
                You get a single fd, which is select()-able.
              * inotify has an event that says "the filesystem that the item
                you were watching is on was unmounted."
              * inotify can watch directories or files.
      
      Inotify is currently used by Beagle (a desktop search infrastructure),
      Gamin (a FAM replacement), and other projects.
      
      See Documentation/filesystems/inotify.txt.
      Signed-off-by: Robert Love <rml@novell.com>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
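The "single select()-able fd" model can be tried from user space. The sketch below assumes the glibc wrappers in `<sys/inotify.h>` as they exist on current Linux systems; the `demo_` helper names, the 4096-byte buffer, and the timeout handling are illustrative choices, not part of the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/inotify.h>
#include <sys/select.h>
#include <unistd.h>

/* Open one inotify instance and watch a single file or directory. */
int demo_watch(const char *path, uint32_t mask)
{
    int fd = inotify_init();

    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, path, mask) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Helper for trying the watch: create (or open) a file at path. */
int demo_touch(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/*
 * Wait up to timeout_sec for the fd to become readable, then return the
 * mask of the first queued event. Returns 0 on timeout or error.
 */
uint32_t demo_next_event(int fd, int timeout_sec)
{
    char buf[4096]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    struct timeval tv = { timeout_sec, 0 };
    fd_set rfds;
    ssize_t n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;
    n = read(fd, buf, sizeof buf);
    if (n < (ssize_t)sizeof(struct inotify_event))
        return 0;
    return ((const struct inotify_event *)buf)->mask;
}
```

A caller might watch a directory with `demo_watch("/tmp", IN_CREATE)`, create a file there, and then check the returned mask for `IN_CREATE`. No descriptor is held open on the watched directory itself, which is the contrast with dnotify drawn above.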
  12. 08 Jul, 2005 1 commit
    • [PATCH] export generic_drop_inode() to modules · cb2c0233
      Mark Fasheh committed
      OCFS2 wants to mark an inode which has been orphaned by another node so
      that during final iput it takes the correct path through the VFS and can
      pass through the OCFS2 delete_inode callback.  Since i_nlink can get out of
      date with other nodes, the best way I see to accomplish this is by clearing
      i_nlink on those inodes at drop_inode time.  Other than this small amount
      of work, nothing different needs to happen, so I think it would be cleanest
      to be able to just call generic_drop_inode at the end of the OCFS2
      drop_inode callback.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  13. 28 Jun, 2005 1 commit
    • [PATCH] Update cfq io scheduler to time sliced design · 22e2c507
      Jens Axboe committed
      This updates the CFQ io scheduler to the new time sliced design (cfq
      v3).  It provides full process fairness, while giving excellent
      aggregate system throughput even for many competing processes.  It
      supports io priorities, either inherited from the cpu nice value or set
      directly with the ioprio_get/set syscalls.  The latter closely mimic
      set/getpriority.
      
      This import is based on my latest from -mm.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
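Setting an io priority directly might look like the sketch below. It assumes the `ioprio_set`/`ioprio_get` system calls and the class/data bit layout found in current Linux kernels, invoked through `syscall()` because glibc provides no wrapper. The `DEMO_` constants are local copies made for illustration and are not quoted from this commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumed layout: scheduling class in the high bits, level in the low 13. */
#define DEMO_IOPRIO_CLASS_SHIFT 13
#define DEMO_IOPRIO_PRIO_VALUE(cls, data) \
    (((cls) << DEMO_IOPRIO_CLASS_SHIFT) | (data))
#define DEMO_IOPRIO_PRIO_CLASS(v) ((v) >> DEMO_IOPRIO_CLASS_SHIFT)
#define DEMO_IOPRIO_PRIO_DATA(v) \
    ((v) & ((1 << DEMO_IOPRIO_CLASS_SHIFT) - 1))

enum {
    DEMO_IOPRIO_CLASS_NONE,    /* inherit from the cpu nice value */
    DEMO_IOPRIO_CLASS_RT,
    DEMO_IOPRIO_CLASS_BE,
    DEMO_IOPRIO_CLASS_IDLE
};

enum {
    DEMO_IOPRIO_WHO_PROCESS = 1    /* "who" is a pid; 0 means the caller */
};

/* Thin wrappers around the raw system calls. */
int demo_ioprio_set(int which, int who, int ioprio)
{
    return (int)syscall(SYS_ioprio_set, which, who, ioprio);
}

int demo_ioprio_get(int which, int who)
{
    return (int)syscall(SYS_ioprio_get, which, who);
}
```

Under these assumptions, `demo_ioprio_set(DEMO_IOPRIO_WHO_PROCESS, 0, DEMO_IOPRIO_PRIO_VALUE(DEMO_IOPRIO_CLASS_BE, 4))` would request best-effort level 4 for the calling process, in the same which/who style as `setpriority()`.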
  14. 24 Jun, 2005 8 commits
  15. 23 Jun, 2005 1 commit
  16. 21 Jun, 2005 1 commit
    • [PATCH] libfs: add simple attribute files · acaefc25
      Arnd Bergmann committed
      Based on the discussion about spufs attributes, this is my suggestion
      for a more generic attribute file support that can be used by both
      debugfs and spufs.
      
      Simple attribute files behave similarly to sequential files from
      a kernel programmer's perspective in that a standard set of file
      operations is provided and only an open operation needs to
      be written that registers file specific get() and set() functions.
      
      These operations are defined as
      
      void foo_set(void *data, u64 val); and
      u64 foo_get(void *data);
      
      where data is the inode->u.generic_ip pointer of the file and the
      operations just need to make sense of that pointer. The infrastructure
      makes sure this works correctly with concurrent access and partial
      read calls.
      
      A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
      using the attributes.
      
      This patch already contains the changes for debugfs to use attributes
      for its internal file operations.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
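The get()/set() contract can be modelled in user space to show how a text read or write maps onto a 64-bit value. This is a sketch, not the libfs implementation: `demo_attr`, `demo_attr_read()`, and `demo_attr_write()` are hypothetical, `uint64_t` stands in for `u64`, and the fixed `"%llu\n"` format is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Per-file state: the two callbacks plus the opaque data pointer. */
struct demo_attr {
    uint64_t (*get)(void *data);
    void (*set)(void *data, uint64_t val);
    void *data;
};

/* File-specific callbacks with the signatures given in the commit. */
static uint64_t foo_get(void *data)
{
    return *(uint64_t *)data;
}

static void foo_set(void *data, uint64_t val)
{
    *(uint64_t *)data = val;
}

/* read(): format the current value as text; returns the text length. */
static int demo_attr_read(const struct demo_attr *attr, char *buf, size_t len)
{
    if (!attr->get)
        return -EACCES;
    return snprintf(buf, len, "%llu\n",
                    (unsigned long long)attr->get(attr->data));
}

/* write(): parse the text and hand the number to set(). */
static int demo_attr_write(const struct demo_attr *attr, const char *buf)
{
    unsigned long long val;
    char *end;

    if (!attr->set)
        return -EACCES;
    errno = 0;
    val = strtoull(buf, &end, 0);
    if (end == buf || errno != 0)
        return -EINVAL;
    attr->set(attr->data, (uint64_t)val);
    return 0;
}
```

The file-specific code is reduced to the two small callbacks. Parsing, formatting, and error handling live in shared helpers, which is the role the commit assigns to the generic infrastructure and the `DEFINE_SIMPLE_ATTRIBUTE` macro.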
  17. 06 May, 2005 1 commit
  18. 01 May, 2005 2 commits
    • [PATCH] DocBook: fix some descriptions · 67be2dd1
      Martin Waitz committed
      Some KernelDoc descriptions are updated to match the current code.
      No code changes.
      Signed-off-by: Martin Waitz <tali@admingilde.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] DocBook: changes and extensions to the kernel documentation · 4dc3b16b
      Pavel Pisa committed
      I have recompiled Linux kernel 2.6.11.5 documentation for me and our
      university students again.  The documentation could be extended for more
      sources which are equipped by structured comments for recent 2.6 kernels.  I
      have tried to proceed with that task.  I have done that more times from 2.6.0
      time and it gets boring to do same changes again and again.  Linux kernel
      compiles after changes for i386 and ARM targets.  I have added references to
      some more files into kernel-api book, I have added some section names as well.
      So please, check that changes do not break something and that categories are
      not too much skewed.
      
      I have changed kernel-doc to accept "fastcall" and "asmlinkage" words reserved
      by kernel convention.  Most of the other changes are modifications in the
      comments to make kernel-doc happy, accept some parameters description and do
      not bail out on errors.  Changed <pid> to @pid in the description, moved some
      #ifdef before comments to correct function to comments bindings, etc.
      
      You can see result of the modified documentation build at
        http://cmp.felk.cvut.cz/~pisa/linux/lkdb-2.6.11.tar.gz
      
      Some more sources are ready to be included into kernel-doc generated
      documentation.  Sources have been added into kernel-api for now.  Some more
      section names added and probably some more chaos introduced as result of quick
      cleanup work.
      Signed-off-by: Pavel Pisa <pisa@cmp.felk.cvut.cz>
      Signed-off-by: Martin Waitz <tali@admingilde.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  19. 17 Apr, 2005 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds committed
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!