1. 18 Sep 2010, 1 commit
  2. 07 Sep 2010, 1 commit
  3. 05 Sep 2010, 1 commit
    • cgroups: fix API thinko · 73457f0f
      By Michael S. Tsirkin
      The cgroup_attach_task_current_cg API that we have upstream is backwards: what
      we really need is an API that attaches the current task to the cgroups of
      another process.
      
      In our case (vhost), a privileged user wants to attach its task to the cgroups
      of a less privileged one, but the existing API makes us run the attach in the
      other task's context, and this fails.
      
      So let's make the API generic and just pass in 'from' and 'to' tasks.
      Add an inline wrapper for cgroup_attach_task_current_cg to avoid
      breaking bisect.
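
      A minimal sketch of the reworked interface described above (signatures are
      illustrative, based on the description rather than the final upstream header):

      /* Generic form: attach task 'tsk' to all cgroups of task 'from'. */
      int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk);

      /* Inline wrapper keeps existing callers (and bisection) working. */
      static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
      {
              return cgroup_attach_task_all(current, tsk);
      }
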
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      73457f0f
  4. 04 Sep 2010, 1 commit
  5. 01 Sep 2010, 1 commit
  6. 29 Aug 2010, 1 commit
  7. 28 Aug 2010, 1 commit
  8. 27 Aug 2010, 1 commit
  9. 25 Aug 2010, 4 commits
    • workqueue: fix cwq->nr_active underflow · 8a2e8e5d
      By Tejun Heo
      cwq->nr_active is used to keep track of how many work items are active
      for the cpu workqueue, where 'active' is defined as either pending on
      global worklist or executing.  This is used to implement the
      max_active limit and workqueue freezing.  If a work item is queued
      after nr_active has already reached max_active, the work item doesn't
      increment nr_active and is put on the delayed queue and gets activated
      later as previous active work items retire.
      
      try_to_grab_pending(), which is used in the cancellation path, unconditionally
      decremented nr_active regardless of whether the work item being cancelled was
      currently active or delayed, so cancelling a delayed work item made nr_active
      underflow.  This breaks max_active enforcement and triggers the BUG_ON() in
      destroy_workqueue() later on.
      
      This patch fixes the bug by adding a flag, WORK_STRUCT_DELAYED, which is set
      while a work item is on the delayed list, and by making try_to_grab_pending()
      decrement nr_active only if the work item is currently active.
      
      The addition of the flag enlarges cwq alignment to 256 bytes which is
      getting a bit too large.  It's scheduled to be reduced back to 128
      bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
      devel cycle.
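
      The accounting rule can be sketched with a small model (illustrative only,
      not the real workqueue code):

      #include <stdbool.h>

      struct cwq_model  { int nr_active; };
      struct work_model { bool delayed; };   /* stands in for WORK_STRUCT_DELAYED */

      /* Cancellation: only items that were counted as active may decrement
       * nr_active; delayed items never incremented it in the first place. */
      static void cancel_work_model(struct cwq_model *cwq, struct work_model *w)
      {
              if (!w->delayed)
                      cwq->nr_active--;
      }
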
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Johannes Berg <johannes@sipsolutions.net>
      8a2e8e5d
    • ACPI/PCI: Negotiate _OSC control bits before requesting them · 75fb60f2
      By Rafael J. Wysocki
      It is possible that the BIOS will not grant control of all _OSC
      features requested via acpi_pci_osc_control_set(), so it is
      recommended to negotiate the final set of _OSC features with the
      query flag set before calling _OSC to request control of these
      features.
      
      To implement it, rework acpi_pci_osc_control_set() so that the caller
      can specify the mask of _OSC control bits to negotiate and the mask
      of _OSC control bits that are absolutely necessary to it.  Then,
      acpi_pci_osc_control_set() will run _OSC queries in a loop until
      the mask of _OSC control bits returned by the BIOS is equal to the
      mask passed to it.  Also, before running the _OSC request,
      acpi_pci_osc_control_set() will check whether the caller's required
      control bits are present in the final mask.
      
      Using this mechanism we will be able to avoid situations in which the
      BIOS doesn't grant control of certain _OSC features, because they
      depend on some other _OSC features that have not been requested.
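
      The negotiation logic can be sketched as follows (a simplified model;
      osc_query() stands in for evaluating _OSC with the query flag set):

      #include <stdint.h>

      extern uint32_t osc_query(uint32_t mask);   /* model of the _OSC query call */

      static int osc_negotiate(uint32_t *mask, uint32_t required)
      {
              uint32_t granted;

              for (;;) {
                      granted = osc_query(*mask);
                      if (granted == *mask)
                              break;              /* BIOS agrees with this set */
                      *mask = granted;            /* retry with the reduced set */
              }
              if ((*mask & required) != required)
                      return -1;                  /* a mandatory feature was dropped */
              return 0;                           /* safe to request control of *mask */
      }
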
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      75fb60f2
    • guard page for stacks that grow upwards · 8ca3eb08
      By Luck, Tony
      pa-risc and ia64 have stacks that grow upwards. Check that
      they do not run into other mappings. By making VM_GROWSUP
      0x0 on architectures that do not ever use it, we can avoid
      some unpleasant #ifdefs in check_stack_guard_page().
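
      The #ifdef-avoidance trick is illustrated below (a generic sketch, not the
      actual mm code):

      #ifdef CONFIG_STACK_GROWSUP
      #define VM_GROWSUP  0x00000200
      #else
      #define VM_GROWSUP  0x0     /* flag can never be set on these arches */
      #endif

      static int stack_grows_up(unsigned long vm_flags)
      {
              /* With VM_GROWSUP == 0x0 this test is constant-false and the
               * compiler drops the branch, so no #ifdef is needed here. */
              return (vm_flags & VM_GROWSUP) != 0;
      }
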
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ca3eb08
    • workqueue: improve destroy_workqueue() debuggability · e41e704b
      By Tejun Heo
      Now that the worklist is global, having works pending after wq
      destruction can easily lead to an oops, and destroy_workqueue() has
      several BUG_ON()s to catch these cases.  Unfortunately, a BUG_ON()
      doesn't tell much about how the work became pending after the final
      flush_workqueue().
      
      This patch adds WQ_DYING which is set before the final flush begins.
      If a work is requested to be queued on a dying workqueue,
      WARN_ON_ONCE() is triggered and the request is ignored.  This clearly
      indicates which caller is trying to queue a work on a dying workqueue
      and keeps the system working in most cases.
      
      The locking rule comment is updated so that the 'I' rule includes
      modifying the field from the destruction path.
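
      Schematically (a toy model of the behaviour, with the warn-once logic
      standing in for WARN_ON_ONCE()):

      #include <stdbool.h>
      #include <stdio.h>

      struct wq_model { unsigned int flags; };
      #define WQ_DYING (1u << 0)           /* set before the final flush */

      static bool queue_work_model(struct wq_model *wq)
      {
              if (wq->flags & WQ_DYING) {
                      static bool warned;
                      if (!warned) {       /* warn once, identifying the caller */
                              warned = true;
                              fprintf(stderr, "work queued on dying workqueue\n");
                      }
                      return false;        /* request ignored, system keeps running */
              }
              /* ... put the work on the worklist ... */
              return true;
      }
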
      Signed-off-by: Tejun Heo <tj@kernel.org>
      e41e704b
  10. 24 Aug 2010, 2 commits
  11. 23 Aug 2010, 2 commits
    • header: fix broken headers for user space · 09cd2b99
      By Changli Gao
      __packed is only defined in kernel space, so we should use
      __attribute__((packed)) for the code shared between kernel and user space.
      
      Two __attribute() annotations are replaced with __attribute__() too.
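
      For a header shared with user space the change looks like this (the struct
      is a hypothetical example, shown only for the annotation):

      #include <linux/types.h>

      struct example_hdr {
              __u8  type;
              __u32 len;
      } __attribute__((packed));   /* not "__packed", which user space lacks */
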
      Signed-off-by: Changli Gao <xiaosuo@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      09cd2b99
    • fanotify: flush outstanding perm requests on group destroy · 2eebf582
      By Eric Paris
      When an fanotify listener is closing it may cause a deadlock between the
      listener and the original task doing an fs operation.  If the original task
      is waiting for a permissions response it will be holding the srcu lock.  The
      listener cannot clean up and exit until after that srcu lock is synchronized.
      Thus deadlock.  The fix introduced here is to stop accepting new permissions
      events when a listener is shutting down and to grant permission for all
      outstanding events.  Thus the original task will eventually release the srcu
      lock and the listener can complete shutdown.
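
      The shutdown path then behaves roughly like this (a schematic sketch, not the
      actual fsnotify code; struct and field names are illustrative):

      static void flush_access_list(struct fanotify_group_model *g)
      {
              struct perm_event_model *ev, *tmp;

              g->bypass_perm = true;                  /* new events are auto-allowed */

              list_for_each_entry_safe(ev, tmp, &g->access_list, list) {
                      list_del_init(&ev->list);
                      ev->response = FAN_ALLOW;       /* grant the outstanding request */
              }
              wake_up(&g->access_waitq);              /* unblock the waiting tasks */
      }
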
      Reported-by: Andreas Gruenbacher <agruen@suse.de>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      2eebf582
  12. 21 Aug 2010, 5 commits
  13. 20 Aug 2010, 1 commit
  14. 19 Aug 2010, 2 commits
  15. 18 Aug 2010, 10 commits
    • fs: scale files_lock · 6416ccb7
      By Nick Piggin
      
      Improve scalability of files_lock by adding per-cpu, per-sb files lists,
      protected with an lglock. The lglock provides fast access to the per-cpu lists
      to add and remove files. It also provides a snapshot of all the per-cpu lists
      (although this is very slow).
      
      One difficulty with this approach is that a file can be removed from the list
      by another CPU. We must track which per-cpu list the file is on with a new
      variable in the file struct (packed into a hole on 64-bit archs). Scalability
      could suffer if files are frequently removed from a different CPU's list.
      
      However, loads with frequent removal of files imply a short interval between
      adding and removing the files, and the scheduler attempts to avoid moving
      processes too far away. Also, even in the case of cross-CPU removal, the
      hardware has much more opportunity to parallelise cacheline transfers with N
      cachelines than with 1.
      
      A worst-case test of 1 CPU allocating files that are subsequently freed by N
      CPUs degenerates to contending on a single lock, which is no worse than before.
      When more than one CPU is allocating files, even if they are always freed by
      different CPUs, there will be more parallelism than in the single-lock case.
      
      Testing results:
      
      On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
      to remove the file, the number of times it is removed by the same CPU that
      added it, and the number of times it is removed by the same node that added it.
      
      Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
      kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
      dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)
      
      So a file is removed from the same CPU it was added by over 90% of the time.
      It remains within the same node 95% of the time.
      
      Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.
      
                      throughput
      2.6.34-rc2      24.5
      +patch          24.9
      
                      us      sys     idle    IO wait (in %)
      2.6.34-rc2      51.25   28.25   17.25   3.25
      +patch          53.75   18.5    19      8.75
      
      So significantly less CPU time spent in kernel code, higher idle time and
      slightly higher throughput.
      
      Single threaded performance difference was within the noise of microbenchmarks.
      That is not to say a penalty does not exist: the code is larger and more memory
      accesses are required, so it will be slightly slower.
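
      The add/remove paths can be sketched roughly as below (a hedged sketch;
      helper and field names are illustrative of the scheme described above):

      void file_sb_list_add(struct file *file, struct super_block *sb)
      {
              lg_local_lock(files_lglock);
              file->f_sb_list_cpu = smp_processor_id();    /* remember our list */
              list_add(&file->f_u.fu_list,
                       per_cpu_ptr(sb->s_files, file->f_sb_list_cpu));
              lg_local_unlock(files_lglock);
      }

      void file_sb_list_del(struct file *file)
      {
              /* may run on a different CPU than the add */
              lg_local_lock_cpu(files_lglock, file->f_sb_list_cpu);
              list_del_init(&file->f_u.fu_list);
              lg_local_unlock_cpu(files_lglock, file->f_sb_list_cpu);
      }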
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      6416ccb7
    • lglock: introduce special lglock and brlock spin locks · 2dc91abe
      By Nick Piggin
      
      This patch introduces "local-global" locks (lglocks). These can be used to:
      
      - Provide fast exclusive access to per-CPU data, with exclusive access to
        another CPU's data allowed but possibly subject to contention, and to provide
        very slow exclusive access to all per-CPU data.
      - Or to provide very fast and scalable read serialisation, and to provide
        very slow exclusive serialisation of data (not necessarily per-CPU data).
      
      Brlocks are also implemented as a short-hand notation for the latter use
      case.
      
      Thanks to Paul for local/global naming convention.
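
      The two usage patterns look roughly like this (a sketch; the interface is
      macro-based, so treat the exact call forms as illustrative):

      /* Fast exclusive access to this CPU's data. */
      lg_local_lock(my_lglock);
      /* ... touch only this CPU's part of the data ... */
      lg_local_unlock(my_lglock);

      /* Brlock-style: many cheap readers, rare expensive writer. */
      br_read_lock(my_brlock);        /* per-CPU, very fast */
      /* ... read shared data ... */
      br_read_unlock(my_brlock);

      br_write_lock(my_brlock);       /* slow: acquires every CPU's lock */
      /* ... exclusive modification ... */
      br_write_unlock(my_brlock);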
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2dc91abe
    • tty: fix fu_list abuse · d996b62a
      By Nick Piggin
      
      The tty code abuses fu_list, which causes a bug in remount,ro handling.
      
      If a tty device node is opened on a filesystem and the last link to the inode
      is then removed, the filesystem will be allowed to be remounted read-only.
      This is because fs_may_remount_ro() does not find the zero-link tty inode on
      the file sb list (the tty code incorrectly removed it to use for its own
      purpose).  This can result in a filesystem with errors after it is marked
      "clean".
      
      Taking the idea from Christoph's initial patch, allocate a tty private struct
      at file->private_data and put the required list fields in there, linking
      file and tty. This makes tty nodes behave the same way as other device nodes,
      avoids meddling with the vfs, and avoids this bug.
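
      The private struct sketched from the description above (field names are
      illustrative):

      /* Links an open file to its tty without touching the VFS fu_list. */
      struct tty_file_private {
              struct tty_struct *tty;
              struct file *file;
              struct list_head list;          /* chained on the tty's file list */
      };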
      
      The error handling is not trivial in the tty code, so for this bugfix, I take
      the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
      This is not a problem because our allocator doesn't fail small allocs as a rule
      anyway. So proper error handling is left as an exercise for tty hackers.
      
      [ Arguably filesystem's device inode would ideally be divorced from the
      driver's pseudo inode when it is opened, but in practice it's not clear whether
      that will ever be worth implementing. ]
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d996b62a
    • fs: cleanup files_lock locking · ee2ffa0d
      By Nick Piggin
      
      Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
      manipulate the per-sb files list; unexport the files_lock spinlock.
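
      Schematically (helper names are illustrative of the description above):

      static DEFINE_SPINLOCK(tty_files_lock);          /* protects tty->tty_files */

      /* the per-sb file list is only touched through helpers now */
      void file_sb_list_add(struct file *file, struct super_block *sb);
      void file_sb_list_del(struct file *file);

      /* files_lock itself stays private to fs/file_table.c (no longer exported) */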
      
      Cc: linux-kernel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ee2ffa0d
    • fs: fs_struct rwlock to spinlock · 2a4419b5
      By Nick Piggin
      
      struct fs_struct.lock is an rwlock with the read-side used to protect root and
      pwd members while taking references to them. Taking a reference to a path
      typically requires just 2 atomic ops, so the critical section is very small.
      Parallel read-side operations would have cacheline contention on the lock, the
      dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
      real parallelism increase.
      
      Replace it with a spinlock to avoid one or two atomic operations in the
      typical path lookup fastpath.
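
      The read-side pattern after the change looks roughly like this (a sketch;
      get_fs_pwd() is shown as one example consumer):

      static inline void get_fs_pwd(struct fs_struct *fs, struct path *pwd)
      {
              spin_lock(&fs->lock);           /* was read_lock(&fs->lock) */
              *pwd = fs->pwd;
              path_get(pwd);                  /* the ~2 atomics: dget + mntget */
              spin_unlock(&fs->lock);
      }
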
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      2a4419b5
    • remove SWRITE* I/O types · 9cb569d6
      By Christoph Hellwig
      These flags aren't real I/O types, but tell ll_rw_block to always
      lock the buffer instead of giving up on a failed trylock.
      
      Instead add a new write_dirty_buffer helper that implements this semantic
      and use it from the existing SWRITE* callers.  Note that the ll_rw_block
      code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
      this patch fixes.
      
      In the ufs code clean up the helper that used to call ll_rw_block
      to mirror sync_dirty_buffer, which is the function it implements for
      compound buffers.
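
      The new helper's semantics can be sketched as follows (close to, but not
      necessarily identical to, the fs/buffer.c implementation):

      void write_dirty_buffer(struct buffer_head *bh, int rw)
      {
              lock_buffer(bh);                        /* wait, never trylock */
              if (!test_clear_buffer_dirty(bh)) {
                      unlock_buffer(bh);
                      return;
              }
              bh->b_end_io = end_buffer_write_sync;
              get_bh(bh);
              submit_bh(rw, bh);                      /* fire and forget */
      }
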
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      9cb569d6
    • kill BH_Ordered flag · 87e99511
      By Christoph Hellwig
      Instead of abusing a buffer_head flag just add a variant of
      sync_dirty_buffer which allows passing the exact type of write
      flag required.
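
      Concretely, something along these lines (the double-underscore name follows
      kernel convention for the flag-taking form and is an assumption here):

      /* Variant that takes the exact write type instead of abusing BH_Ordered. */
      int __sync_dirty_buffer(struct buffer_head *bh, int rw);

      /* Existing callers keep their behaviour through a thin wrapper. */
      static inline int sync_dirty_buffer(struct buffer_head *bh)
      {
              return __sync_dirty_buffer(bh, WRITE_SYNC);
      }
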
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      87e99511
    • spi.h: missing kernel-doc notation, please fix · 5c79a5ae
      By Ernst Schwab
      Added comments in kernel-doc notation for previously added struct fields.
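
      Kernel-doc notation for struct fields looks like this (a generic illustration,
      not the actual spi.h members):

      /**
       * struct example_cfg - hypothetical configuration block
       * @mode:   bus mode flags selected by the driver
       * @max_hz: maximum transfer rate, in Hz
       */
      struct example_cfg {
              unsigned int mode;
              unsigned int max_hz;
      };
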
      Signed-off-by: Ernst Schwab <eschwab@online.de>
      Acked-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      5c79a5ae
    • Make do_execve() take a const filename pointer · d7627467
      By David Howells
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
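
      The resulting prototype is roughly as follows (a sketch; the exact parameter
      list may vary by architecture and kernel version):

      /* Both the pointer array and the strings it points to are const. */
      int do_execve(const char *filename,
                    const char __user *const __user *argv,
                    const char __user *const __user *envp,
                    struct pt_regs *regs);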
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists, as some of them are in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test-built on x86_64, frv, arm and mips.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d7627467
    • VIDEO: amba clcd: don't disable an already disabled clock · 99c796df
      By Russell King
      Fix the clock enable/disable tracking in the AMBA CLCD driver so
      that the driver doesn't try to disable an already disabled clock,
      thereby causing the clock (if shared) to become unbalanced.
      
      This resolves a problem with CLCD on LPC32xx ARM platforms.
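
      A sketch of the enable/disable tracking (a simplified model; the struct and
      flag names are illustrative):

      struct clcd_clk_state {
              struct clk *clk;
              bool clk_enabled;
      };

      static void clcd_clk_disable(struct clcd_clk_state *s)
      {
              if (s->clk_enabled) {            /* never disable twice */
                      s->clk_enabled = false;
                      clk_disable(s->clk);     /* keeps a shared clock balanced */
              }
      }
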
      Reported-by: Kevin Wells <wellsk40@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
      99c796df
  16. 15 Aug 2010, 2 commits
  17. 14 Aug 2010, 4 commits