1. 16 Jul, 2010: 6 commits
    • rlimits: implement prlimit64 syscall · c022a0ac
      Jiri Slaby committed
      This patch adds the code to support the sys_prlimit64 syscall which
      modifies-and-returns the rlim values of a selected process atomically.
      The first parameter, pid, being 0 means current process.
      
      Unlike the current implementation, it is a generic,
      architecture-independent interface, so we no longer need to handle
      compat stuff. In the future, once glibc starts using it, we can
      deprecate sys_setrlimit and sys_getrlimit and finally clean up the code.
      
      It also makes it possible to change the limits of other processes. We
      check the user's permissions to do that and, if the check succeeds, the
      new limits are propagated online. This is useful for large-scale
      applications such as SAP or databases, where administrators need to
      change limits from time to time (e.g. increase the core size after
      crashes) and restarting the service is unacceptable.
      
      For safety, all rlim users now either use accessors or don't need
      them because of
      - locking, or
      - the fact that the process was just forked and nobody else knows
        about it yet (and thus nobody can read or write its limits),
      hence it is safe to modify limits now.
      
      The limitation is that we currently keep the unsigned long internal
      representation. So the rlim64_is_infinity check is used where a value
      is compared against ULONG_MAX on 32-bit, the maximum value there.
      
      And since the limits are held internally in struct rlimit, converters
      are introduced for use before and after the do_prlimit call in
      sys_prlimit64.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
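      A minimal userspace sketch of calling the new syscall, assuming a Linux
      system whose C library exposes SYS_prlimit64 (glibc 2.13 and later also
      wrap it as prlimit()). The struct demo_rlimit64 layout and the demo_*
      helper names are illustrative assumptions, not kernel or glibc API. A pid
      of 0 means the calling process. demo_raise_core_to_hard() follows the
      "increase core size after crashes" use case; its read and its write are
      two separate calls, so only the write-and-return-old step is atomic.

      ```c
      #include <stdint.h>
      #include <sys/resource.h>   /* RLIMIT_* constants */
      #include <sys/syscall.h>    /* SYS_prlimit64 */
      #include <sys/types.h>      /* pid_t */
      #include <unistd.h>         /* syscall() */

      /* Assumed to mirror the kernel's two-field, 64-bit struct rlimit64. */
      struct demo_rlimit64 {
          uint64_t rlim_cur;
          uint64_t rlim_max;
      };

      /* Thin wrapper: either pointer may be NULL; returns 0 or -1/errno. */
      static long demo_prlimit64(pid_t pid, int resource,
                                 const struct demo_rlimit64 *new_lim,
                                 struct demo_rlimit64 *old_lim)
      {
          return syscall(SYS_prlimit64, (long)pid, (long)resource,
                         new_lim, old_lim);
      }

      /*
       * Raise the soft core-size limit of `pid` up to its hard limit.
       * `old` (may be NULL) receives the limits that were replaced.
       * Returns 0 on success, -1 with errno set on failure.
       */
      static int demo_raise_core_to_hard(pid_t pid, struct demo_rlimit64 *old)
      {
          struct demo_rlimit64 cur;

          if (demo_prlimit64(pid, RLIMIT_CORE, NULL, &cur) != 0)  /* read only */
              return -1;
          cur.rlim_cur = cur.rlim_max;                            /* soft := hard */
          /* One call stores the previous limits in *old and installs cur. */
          return demo_prlimit64(pid, RLIMIT_CORE, &cur, old) != 0 ? -1 : 0;
      }
      ```

      Changing another process's limits this way is subject to the
      permission check described above.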
    • rlimits: redo do_setrlimit to more generic do_prlimit · 5b41535a
      Jiri Slaby committed
      It now also allows reading limits, i.e. all reads and writes will
      later go through this function.
      
      It takes two parameters, the new and the old limits, either or both of
      which may be NULL.
      If new is non-NULL, the rlimits are set to its value.
      If old is non-NULL, the current rlimits are stored there.
      If both are non-NULL, the old limits are stored before the new ones
      are set, atomically.
      (Similar to sigaction.)
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
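      The read/write rules above can be modelled in userspace as follows.
      This is a hypothetical sketch: demo_task, demo_limit and demo_prlimit()
      stand in for the kernel's task_struct, struct rlimit and do_prlimit(), a
      pthread mutex stands in for the kernel's locking, and the capability
      check for raising a hard limit is left out.

      ```c
      #include <errno.h>
      #include <pthread.h>
      #include <stddef.h>

      #define DEMO_NLIMITS 4   /* hypothetical number of resources */

      /* Hypothetical stand-in for struct rlimit. */
      struct demo_limit {
          unsigned long cur;
          unsigned long max;
      };

      /* Hypothetical stand-in for the parts of task_struct used here. */
      struct demo_task {
          pthread_mutex_t lock;                   /* stands in for the kernel lock */
          struct demo_limit limits[DEMO_NLIMITS];
      };

      static int demo_prlimit(struct demo_task *tsk, unsigned int resource,
                              const struct demo_limit *new_lim,
                              struct demo_limit *old_lim)
      {
          struct demo_limit *lim;

          if (resource >= DEMO_NLIMITS)
              return -EINVAL;
          if (new_lim && new_lim->cur > new_lim->max)
              return -EINVAL;                     /* soft limit above hard limit */

          pthread_mutex_lock(&tsk->lock);
          lim = &tsk->limits[resource];
          if (old_lim)
              *old_lim = *lim;                    /* old value is read first ... */
          if (new_lim)
              *lim = *new_lim;                    /* ... then replaced, same lock */
          pthread_mutex_unlock(&tsk->lock);
          return 0;                               /* both NULL: a no-op */
      }
      ```

      As with sigaction(), passing both pointers gives an atomic swap, passing
      only old is a pure read, and passing only new is a pure write.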
    • rlimits: add rlimit64 structure · 6a1d5e2c
      Jiri Slaby committed
      Add a platform-independent structure for resource limits to use with
      a new prlimit64 syscall. This is the same structure glibc uses for
      64-bit limits.
      
      Also add the corresponding infinity, a 64-bit value with all bits set.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
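      A sketch of the 64-bit structure, its infinity, and converters to and
      from an unsigned long internal representation, as described in this
      commit and in the prlimit64 commit. All names are hypothetical, and it
      assumes the internal infinity is ULONG_MAX (true for the generic
      RLIM_INFINITY, but not on every architecture). The single comparison
      v >= ULONG_MAX covers both word sizes: on 64-bit it matches only the
      all-ones value, while on 32-bit any value too large for an unsigned long
      is treated as infinite.

      ```c
      #include <limits.h>
      #include <stdbool.h>
      #include <stdint.h>

      /* Architecture-independent pair of 64-bit limits (hypothetical name). */
      struct demo_rlimit64 {
          uint64_t rlim_cur;
          uint64_t rlim_max;
      };

      /* Infinity: a 64-bit value with all bits set. */
      #define DEMO_RLIM64_INFINITY (~(uint64_t)0)

      /* Internal representation stays unsigned long (assumption: ULONG_MAX = infinity). */
      struct demo_rlimit {
          unsigned long rlim_cur;
          unsigned long rlim_max;
      };
      #define DEMO_RLIM_INFINITY ULONG_MAX

      static bool demo_rlim64_is_infinity(uint64_t v)
      {
          return v >= (uint64_t)ULONG_MAX;
      }

      static unsigned long demo_one_from64(uint64_t v)
      {
          return demo_rlim64_is_infinity(v) ? DEMO_RLIM_INFINITY : (unsigned long)v;
      }

      static uint64_t demo_one_to64(unsigned long v)
      {
          return v == DEMO_RLIM_INFINITY ? DEMO_RLIM64_INFINITY : (uint64_t)v;
      }

      /* Used before the internal call: user-supplied 64-bit -> internal. */
      static void demo_rlim64_to_rlim(const struct demo_rlimit64 *in,
                                      struct demo_rlimit *out)
      {
          out->rlim_cur = demo_one_from64(in->rlim_cur);
          out->rlim_max = demo_one_from64(in->rlim_max);
      }

      /* Used after the internal call: internal -> 64-bit for user space. */
      static void demo_rlim_to_rlim64(const struct demo_rlimit *in,
                                      struct demo_rlimit64 *out)
      {
          out->rlim_cur = demo_one_to64(in->rlim_cur);
          out->rlim_max = demo_one_to64(in->rlim_max);
      }
      ```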
    • rlimits: split sys_setrlimit · 7855c35d
      Jiri Slaby committed
      Create do_setrlimit from sys_setrlimit and declare do_setrlimit
      in the resource header. This is the first step towards a generic
      do_prlimit that can be called from the read, write and compat
      rlimits code.
      
      The new do_setrlimit also accepts a pointer to the task whose limits
      are to be changed. Currently it cannot be anything other than current,
      but this will change once the locking is in place.
      
      Also pass tsk->group_leader to security_task_setrlimit, so that the
      check is whether current may change the rlimits of the process rather
      than of an arbitrary thread in it; this makes more sense given that
      rlimits are per-process, not per-thread.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
    • rlimits: add task_struct to update_rlimit_cpu · 5ab46b34
      Jiri Slaby committed
      Add task_struct as a parameter to update_rlimit_cpu so that the CPU
      rlimit of a task other than current can be set.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: James Morris <jmorris@namei.org>
    • rlimits: security, add task_struct to setrlimit · 8fd00b4d
      Jiri Slaby committed
      Add task_struct to task_setrlimit of security_operations so that the
      rlimits of a task other than current can be set.
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Acked-by: James Morris <jmorris@namei.org>
  2. 14 Jul, 2010: 1 commit
  3. 09 Jul, 2010: 1 commit
  4. 07 Jul, 2010: 2 commits
    • drm/ttm: Allocate the page pool manager in the heap. · 5870a4d9
      Francisco Jerez committed
      Repeated ttm_page_alloc_init/fini calls fail noisily because the pool
      manager kobj isn't zeroed out between uses (we could do just that, but
      statically allocated kobjects are generally considered a bad thing).
      Move it to kzalloc'ed memory.
      
      Note that this patch drops the refcounting behavior of the pool
      allocator init/fini functions: it would have led to a race condition
      in its current form, and it was never actually used anyway.
      
      This fixes a regression, present since the page allocator was
      introduced, when reloading KMS modules at runtime.
      Signed-off-by: Francisco Jerez <currojerez@riseup.net>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
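      The fix replaces a statically allocated object with one allocated on init
      and freed on fini, so every init starts from zeroed memory. A hypothetical
      userspace sketch of that pattern, with calloc() standing in for kzalloc()
      and a plain flag standing in for the kobject state; as in the patch, the
      init/fini pair is not refcounted.

      ```c
      #include <stdlib.h>

      /* Hypothetical pool manager; `registered` stands in for kobject state
         that must start out zeroed on every init. */
      struct demo_pool_manager {
          int registered;
          unsigned long npages;
      };

      /* Only a pointer is static now, not the object itself. */
      static struct demo_pool_manager *demo_manager;

      static int demo_pool_init(void)
      {
          if (demo_manager)
              return -1;            /* no refcounting: one init per fini */
          demo_manager = calloc(1, sizeof(*demo_manager));   /* like kzalloc */
          if (!demo_manager)
              return -1;
          demo_manager->registered = 1;
          return 0;
      }

      static void demo_pool_fini(void)
      {
          free(demo_manager);       /* nothing survives into the next init */
          demo_manager = NULL;
      }
      ```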
    • VFS: introduce s_dirty accessors · 140236b4
      Artem Bityutskiy committed
      This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
      'sb_mark_clean()', and 'sb_is_dirty()'. They simply set, clear, or
      test 'sb->s_dirt'. The plan is to make every FS use these accessors
      later instead of manipulating the 'sb->s_dirt' flag directly.
      
      Ultimately, this change is a preparation for the periodic
      superblock synchronization optimization, which aims to stop the
      "sync_supers" kernel thread from waking up when there is nothing
      to synchronize.
      
      This patch makes no functional change; it only adds accessor
      functions.
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
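      A sketch of what the three accessors amount to, on a hypothetical
      stand-in structure rather than the real struct super_block.
      demo_any_sb_dirty() is an illustrative guess at how the later
      optimization could use sb_is_dirty() to decide whether the sync thread
      needs to wake up at all.

      ```c
      #include <stdbool.h>

      /* Hypothetical stand-in for struct super_block. */
      struct demo_super_block {
          unsigned char s_dirt;
      };

      static inline void demo_sb_mark_dirty(struct demo_super_block *sb)
      {
          sb->s_dirt = 1;
      }

      static inline void demo_sb_mark_clean(struct demo_super_block *sb)
      {
          sb->s_dirt = 0;
      }

      static inline bool demo_sb_is_dirty(const struct demo_super_block *sb)
      {
          return sb->s_dirt != 0;
      }

      /* Illustrative use: wake the sync thread only if some superblock is dirty. */
      static bool demo_any_sb_dirty(const struct demo_super_block *sbs, int n)
      {
          for (int i = 0; i < n; i++)
              if (demo_sb_is_dirty(&sbs[i]))
                  return true;
          return false;
      }
      ```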
  5. 06 Jul, 2010: 4 commits
    • writeback: simplify the write back thread queue · 83ba7b07
      Christoph Hellwig committed
      First, remove items from work_list as soon as we start working on them.
      This means we don't have to track any pending or visited state and can
      get rid of all the RCU magic for freeing the work items - we can simply
      free them once the operation has finished.  Second, use a real
      completion for tracking synchronous requests - if the caller sets the
      completion pointer we complete it, otherwise we use it as a boolean
      indicator that we can free the work item directly.  Third, unify struct
      wb_writeback_args and struct bdi_work into a single data structure,
      wb_writeback_work.  Previously we set all parameters in a struct
      wb_writeback_args, copied it into struct bdi_work, and copied it again
      onto the stack to use it there.  Instead, just allocate one structure
      dynamically or on the stack and use it all the way through the stack.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
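      The ownership rule for the unified work item can be sketched like this.
      It is a hypothetical, single-threaded model: demo_wb_work stands in for
      wb_writeback_work, an atomic flag stands in for struct completion, and
      the "worker" runs inline instead of on the flusher thread. A synchronous
      caller keeps the item on its stack and waits; an asynchronous caller
      hands a heap item to the worker, which frees it.

      ```c
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdlib.h>

      /* Flag standing in for the kernel's struct completion. */
      struct demo_completion {
          atomic_bool done;
      };

      /* Hypothetical single structure replacing the args/work pair. */
      struct demo_wb_work {
          long nr_pages;
          bool sync_mode;
          struct demo_completion *done;   /* NULL: nobody waits, worker frees item */
      };

      /* Called by the worker once the item is processed (already off the list). */
      static void demo_wb_work_finished(struct demo_wb_work *work)
      {
          if (work->done)
              atomic_store(&work->done->done, true);  /* waiter still owns work */
          else
              free(work);                              /* async: freed right away */
      }

      /* Synchronous caller: the work item lives on the stack. */
      static long demo_queue_sync(long nr_pages)
      {
          struct demo_completion c;
          atomic_init(&c.done, false);
          struct demo_wb_work work = { .nr_pages = nr_pages, .sync_mode = true,
                                       .done = &c };

          demo_wb_work_finished(&work);   /* worker runs inline in this model */
          while (!atomic_load(&c.done))
              ;                           /* wait_for_completion() stand-in */
          return work.nr_pages;
      }

      /* Asynchronous caller: ownership of the heap item passes to the worker. */
      static int demo_queue_async(long nr_pages)
      {
          struct demo_wb_work *work = malloc(sizeof(*work));

          if (!work)
              return -1;
          work->nr_pages = nr_pages;
          work->sync_mode = false;
          work->done = NULL;
          demo_wb_work_finished(work);    /* worker frees it */
          return 0;
      }
      ```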
    • writeback: split writeback_inodes_wb · edadfb10
      Christoph Hellwig committed
      The case where we have a superblock doesn't require a loop here as we scan
      over all inodes in writeback_sb_inodes. Split it out into a separate helper
      to make the code simpler.  This also allows us to get rid of the sb member
      in struct writeback_control, which was rather out of place there.
      
      Also update the comments in writeback_sb_inodes that explain the handling
      of inodes from wrong superblocks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • writeback: remove writeback_inodes_wbc · 9c3a8ee8
      Christoph Hellwig committed
      This was just an odd wrapper around writeback_inodes_wb.  Removing it
      also allows us to get rid of the bdi member of struct writeback_control,
      which was rather out of place there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined · bcfcc450
      Ben Hutchings committed
      netif_vdbg() was originally defined as entirely equivalent to
      netdev_vdbg(), but I assume that it was intended to take the same
      parameters as netif_dbg() etc.  (Currently it is only used by the
      sfc driver, in which I worked on that assumption.)
      
      In commit a4ed89cb I changed the definition used when VERBOSE_DEBUG is
      not defined, but I failed to notice that the definition used when
      VERBOSE_DEBUG is defined was also not as I expected.  Change that to
      match netif_dbg() as well.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
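      A userspace sketch of the calling-convention point: netif_vdbg() should
      take the same (priv, type, dev, fmt, ...) arguments as netif_dbg()
      whether or not VERBOSE_DEBUG is defined. All demo_* names and message
      bits are hypothetical, and printf() stands in for the kernel's printk
      machinery; the disabled variant keeps the call inside if (0) so the
      arguments are still type-checked.

      ```c
      #include <stdarg.h>
      #include <stdio.h>

      /* Hypothetical message-class bits standing in for NETIF_MSG_*. */
      #define DEMO_MSG_PROBE 0x1u
      #define DEMO_MSG_RX    0x2u

      struct demo_priv { unsigned int msg_enable; };

      static int demo_printed;   /* counts emitted messages so gating is observable */

      static void demo_log(const char *dev, const char *fmt, ...)
      {
          va_list ap;

          printf("%s: ", dev);
          va_start(ap, fmt);
          vprintf(fmt, ap);
          va_end(ap);
          demo_printed++;
      }

      /* netdev_dbg()-like: (dev, fmt, ...) */
      #define demo_netdev_dbg(dev, ...) demo_log(dev, __VA_ARGS__)

      /* netif_dbg()-like: (priv, type, dev, fmt, ...), printed only if enabled */
      #define demo_netif_dbg(priv, type, dev, ...)                \
          do {                                                    \
              if ((priv)->msg_enable & (type))                    \
                  demo_netdev_dbg(dev, __VA_ARGS__);              \
          } while (0)

      /* vdbg takes the netif_dbg() arguments in both configurations. */
      #ifdef VERBOSE_DEBUG
      #define demo_netif_vdbg demo_netif_dbg
      #else
      #define demo_netif_vdbg(priv, type, dev, ...)               \
          do {                                                    \
              if (0)                                              \
                  demo_netif_dbg(priv, type, dev, __VA_ARGS__);   \
          } while (0)
      #endif
      ```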
  6. 05 Jul, 2010: 2 commits
    • rbtree: Undo augmented trees performance damage and regression · b945d6b2
      Peter Zijlstra committed
      Reimplement augmented RB-trees without sprinkling extra branches
      all over the RB-tree code (which lives in the scheduler hot path).
      
      This approach is 'borrowed' from Fabio's BFQ implementation and
      relies on traversing the rebalance path after the RB-tree-op to
      correct the heap property for insertion/removal and make up for
      the damage done by the tree rotations.
      
      For insertion, the rebalance path is trivially the one from the new
      node up to the root; for removal, it is the one from the deepest node
      on the path from the node being removed that will still be around
      after the removal.
      
      [ This patch also fixes a video driver regression reported by
        Ali Gholami Rudi - the memtype->subtree_max_end was updated
        incorrectly. ]
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Ali Gholami Rudi <ali@rudi.ir>
      Cc: Fabio Checconi <fabio@gandalf.sssup.it>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1275414172.27810.27961.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
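      The fix-up walk described above can be sketched on a hand-built tree.
      This hypothetical model keeps a subtree_max_end-style value in each node
      (field names assumed, loosely after the memtype tree) and recomputes it
      from a starting node up to the root; rotations and colouring are
      omitted. After an insertion the walk starts at the new node, after a
      removal at the deepest surviving node on the affected path.

      ```c
      #include <stddef.h>

      /* Hypothetical interval-tree node augmented with the largest interval
         end found anywhere in its subtree. */
      struct demo_node {
          unsigned long end;
          unsigned long subtree_max_end;
          struct demo_node *parent, *left, *right;
      };

      /* Recompute one node's augmented value from itself and its children. */
      static unsigned long demo_compute_max(const struct demo_node *n)
      {
          unsigned long max = n->end;

          if (n->left && n->left->subtree_max_end > max)
              max = n->left->subtree_max_end;
          if (n->right && n->right->subtree_max_end > max)
              max = n->right->subtree_max_end;
          return max;
      }

      /* Walk the rebalance path from `n` up to the root, fixing each node. */
      static void demo_augment_fixup(struct demo_node *n)
      {
          for (; n; n = n->parent)
              n->subtree_max_end = demo_compute_max(n);
      }
      ```

      Keeping the fix-up in a separate pass is what lets the core insert and
      erase code stay free of extra branches.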
    • module: initialize module dynamic debug later · ff49d74a
      Yehuda Sadeh committed
      We should initialize the module dynamic debug data structures
      only after determining that the module is not already loaded. This
      fixes a bug introduced in 2.6.35-rc2 where, when trying to load a
      module twice, we also loaded its dynamic printing data twice,
      which causes all sorts of nasty issues. Also handle the dynamic
      debug cleanup later on failure.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed a #ifdef)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 Jul, 2010: 3 commits
  8. 02 Jul, 2010: 1 commit
  9. 01 Jul, 2010: 4 commits
  10. 30 Jun, 2010: 2 commits
    • compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions · 9c695203
      Mikael Pettersson committed
      A __naked function is defined in C but with a body completely implemented
      by asm(), including any prologue and epilogue.  These asm() bodies expect
      standard calling conventions for parameter passing.  Older GCCs implement
      that correctly, but 4.[56] currently do not, see GCC PR44290.  In the
      Linux kernel this breaks ARM, causing most arch/arm/mm/copypage-*.c
      modules to get miscompiled, resulting in kernel crashes during bootup.
      
      Part of the kernel fix is to augment the __naked function attribute to
      also imply noinline and noclone.  This patch implements that, and has been
      verified to fix boot failures with gcc-4.5 compiled 2.6.34 and 2.6.35-rc1
      kernels.  The patch is a no-op with older GCCs.
      Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: Khem Raj <raj.khem@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
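      A sketch of the attribute composition, under the assumption that
      noclone should only be emitted by GCC 4.5 or later (clang is excluded on
      the assumption that it does not implement that attribute). The demo_*
      macro names are hypothetical. The sample function carries only noinline
      and noclone, since a real naked function needs a hand-written asm()
      body.

      ```c
      /* noclone appeared in GCC 4.5; elsewhere the sketch expands it to nothing. */
      #if defined(__GNUC__) && !defined(__clang__) && \
          (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
      #define demo_noclone __attribute__((__noclone__))
      #else
      #define demo_noclone
      #endif

      #define demo_noinline __attribute__((__noinline__))

      /* Per the commit: naked also implies noinline and noclone, so the compiler
         keeps the standard calling convention the asm() body expects. */
      #define demo_naked __attribute__((__naked__)) demo_noinline demo_noclone

      /* Ordinary function carrying the two extra attributes, so the macros
         can be exercised without an asm() body. */
      static demo_noinline demo_noclone int demo_scale(int x, int factor)
      {
          return x * factor;
      }
      ```

      With older compilers the noclone part expands to nothing, matching the
      commit's note that the change is a no-op there.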
    • fs: fix superblock iteration race · 57439f87
      npiggin@suse.de committed
      list_for_each_entry_safe is not suitable to protect against concurrent
      modification of the list. 6754af64 introduced a race in sb walking.
      
      list_for_each_entry can use the trick of pinning the current entry in
      the list before we drop and retake the lock because it subsequently
      follows cur->next. However list_for_each_entry_safe saves n=cur->next
      for following before entering the loop body, so when the lock is
      dropped, n may be deleted.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Frank Mayhar <fmayhar@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
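      A single-threaded simulation of why following cur->next after the body
      is safe: the current entry is pinned with a hypothetical s_count field
      while the (conceptual) lock is dropped, and the body deletes the entry
      that follows it, as a concurrent unmount could. A
      list_for_each_entry_safe-style loop would already have saved a pointer
      to that freed entry before the body ran. All names are illustrative.

      ```c
      #include <stdlib.h>

      /* Hypothetical list entry standing in for struct super_block. */
      struct demo_sb {
          int id;
          int s_count;              /* pin count: a pinned entry is never freed */
          struct demo_sb *next;
      };

      typedef void (*demo_body_fn)(struct demo_sb *cur);

      /* Simulated concurrent deleter: unlinks and frees the entry after `cur`
         unless that entry is pinned. */
      static void demo_delete_next(struct demo_sb *cur)
      {
          struct demo_sb *victim = cur->next;

          if (victim && victim->s_count == 0) {
              cur->next = victim->next;
              free(victim);
          }
      }

      /* list_for_each_entry-style walk: the successor is read only after the
         body has run and the (conceptual) lock has been retaken. */
      static int demo_walk(struct demo_sb *head, demo_body_fn body)
      {
          int visited = 0;
          struct demo_sb *cur = head;

          while (cur) {
              struct demo_sb *next;

              cur->s_count++;       /* pin while holding the lock */
              /* lock dropped here: others may modify the list */
              body(cur);
              visited++;
              /* lock retaken here */
              next = cur->next;     /* cur is pinned, so its link is current */
              cur->s_count--;       /* unpin */
              cur = next;
          }
          return visited;
      }
      ```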
  11. 29 Jun, 2010: 1 commit
    • ethtool: Fix potential user buffer overflow for ETHTOOL_{G, S}RXFH · bf988435
      Ben Hutchings committed
      struct ethtool_rxnfc was originally defined in 2.6.27 for the
      ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
      fields.  It was then extended in 2.6.30 to support various additional
      commands.  These commands should have been defined to use a new
      structure, but it is too late to change that now.
      
      Since user-space may still be using the old structure definition
      for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
      additional fields, only copy the originally defined fields to and
      from user-space.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Cc: stable@kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
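      A sketch of the size computation the fix implies: for the two original
      commands, copy only up to the end of data; for everything else, copy the
      whole structure. The struct layout, command values and demo_* names are
      hypothetical, and memcpy() stands in for copy_from_user().

      ```c
      #include <stddef.h>
      #include <stdint.h>
      #include <string.h>

      /* Hypothetical command numbers. */
      enum { DEMO_GRXFH = 1, DEMO_SRXFH = 2, DEMO_OTHER = 3 };

      /* Hypothetical layout: cmd, flow_type and data are the original fields;
         `extra` stands in for everything added later. */
      struct demo_rxnfc {
          uint32_t cmd;
          uint32_t flow_type;
          uint64_t data;
          uint32_t extra[8];
      };

      /* Bytes to copy to or from user space for a given command. */
      static size_t demo_rxnfc_copy_size(uint32_t cmd)
      {
          if (cmd == DEMO_GRXFH || cmd == DEMO_SRXFH)
              return offsetof(struct demo_rxnfc, data) +
                     sizeof(((struct demo_rxnfc *)0)->data);
          return sizeof(struct demo_rxnfc);
      }

      /* `user` may point at a structure laid out with only the original
         fields, so never read past what that layout guarantees. */
      static void demo_rxnfc_fetch(struct demo_rxnfc *dst, const void *user,
                                   uint32_t cmd)
      {
          memset(dst, 0, sizeof(*dst));
          memcpy(dst, user, demo_rxnfc_copy_size(cmd));
      }
      ```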
  12. 24 Jun, 2010: 1 commit
  13. 22 Jun, 2010: 1 commit
  14. 15 Jun, 2010: 1 commit
  15. 14 Jun, 2010: 1 commit
  16. 12 Jun, 2010: 4 commits
  17. 11 Jun, 2010: 2 commits
    • writeback: simplify and split bdi_start_writeback · c5444198
      Christoph Hellwig committed
      bdi_start_writeback is now never passed a superblock, so we can just
      remove that case.  And to further untangle the code and flatten the call
      stack, split it into two trivial helpers for its two callers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • net: deliver skbs on inactive slaves to exact matches · 597a264b
      John Fastabend committed
      Currently, the accelerated receive path for VLANs will
      drop packets if the real device is an inactive slave and
      the packet is not one of the special pkts tested for in
      skb_bond_should_drop().  This behavior differs from that of
      the non-accelerated path and of pkts over a bonded vlan.
      
      For example,
      
      vlanx -> bond0 -> ethx
      
      will be dropped in the vlan path and not delivered to any
      packet handlers at all.  However,
      
      bond0 -> vlanx -> ethx
      
      and
      
      bond0 -> ethx
      
      will be delivered to handlers that match the exact dev,
      because the VLAN path checks the real_dev which is not a
      slave and netif_recv_skb() doesn't drop frames but only
      delivers them to exact matches.
      
      This patch adds a sk_buff flag which is used for tagging
      skbs that would previously have been dropped and allows the
      skb to continue to skb_netif_recv().  Here we add
      logic to check for the deliver_no_wcard flag and, if it
      is set, only deliver to handlers that match exactly.  This
      makes both paths above consistent and gives pkt handlers
      a way to identify skbs that come from inactive slaves.
      Without this patch, in some configurations skbs will be
      delivered to handlers with exact matches and in others
      be dropped outright in the vlan path.
      
      I have tested the following 4 configurations in failover modes
      and load balancing modes.
      
      # bond0 -> ethx
      
      # vlanx -> bond0 -> ethx
      
      # bond0 -> vlanx -> ethx
      
      # bond0 -> ethx
                  |
        vlanx -> --
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
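      The delivery rule can be sketched as a loop over handlers: a flagged skb
      still reaches handlers bound to its exact device but skips wildcard ones.
      The structures and demo_deliver() are hypothetical stand-ins, not the
      real sk_buff, packet_type, or receive path.

      ```c
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical stand-ins for net_device, sk_buff and packet_type. */
      struct demo_dev { const char *name; };

      struct demo_skb {
          struct demo_dev *dev;
          bool deliver_no_wcard;   /* set instead of dropping on an inactive slave */
      };

      struct demo_handler {
          struct demo_dev *dev;    /* NULL: wildcard handler for every device */
          int hits;
      };

      /* Exact-match handlers always get the skb; wildcard handlers only when
         the skb is not flagged. Returns the number of deliveries. */
      static int demo_deliver(struct demo_skb *skb, struct demo_handler *h, size_t n)
      {
          int delivered = 0;

          for (size_t i = 0; i < n; i++) {
              if (h[i].dev == skb->dev ||
                  (h[i].dev == NULL && !skb->deliver_no_wcard)) {
                  h[i].hits++;
                  delivered++;
              }
          }
          return delivered;
      }
      ```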
  18. 10 Jun, 2010: 1 commit
  19. 09 Jun, 2010: 2 commits
    • misc: Fix allocation 'borrowed' by vhost_net · 79907d89
      Alan Cox committed
      10, 233 is allocated officially to /dev/kmview, which is shipping in
      Ubuntu and Debian distributions.  vhost_net seems to have borrowed it
      without making a proper request, and this causes regressions in the other
      distributions.
      
      vhost_net can use a dynamic minor so use that instead.  Also update the
      file with a comment to try and avoid future misunderstandings.
      
      cc: stable@kernel.org
      Signed-off-by: Alan Cox <device@lanana.org>
      [ We should have caught this before 2.6.34 got released.  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: pay attention to wbc->nr_to_write in write_cache_pages · 0b564927
      Dave Chinner committed
      If a filesystem writes more than one page in ->writepage, write_cache_pages
      fails to notice this and continues to attempt writeback when wbc->nr_to_write
      has gone negative - this trace was captured from XFS:
      
          wbc_writeback_start: towrt=1024
          wbc_writepage: towrt=1024
          wbc_writepage: towrt=0
          wbc_writepage: towrt=-1
          wbc_writepage: towrt=-5
          wbc_writepage: towrt=-21
          wbc_writepage: towrt=-85
      
      This has adverse effects on filesystem writeback behaviour. write_cache_pages()
      needs to terminate after a certain number of pages are written, not after a
      certain number of calls to ->writepage are made.  This is a regression
      introduced by 17bc6c30 ("vfs: Add
      no_nrwrite_index_update writeback control flag"), but cannot be reverted
      directly due to subsequent bug fixes that have gone in on top of it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
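      A hypothetical model of the intended termination rule: each ->writepage
      call may write a whole cluster, and the loop charges every written page
      against nr_to_write, stopping as soon as the budget is used up. Names are
      illustrative; the real write_cache_pages() also handles page locking,
      sync modes and cyclic ranges. The budget can still overshoot by up to one
      cluster, but no further calls are made after that.

      ```c
      #include <stdbool.h>

      /* Hypothetical writeback control: only the page budget matters here. */
      struct demo_wbc { long nr_to_write; };

      /* ->writepage stand-in; returns how many pages it actually wrote. */
      typedef long (*demo_writepage_fn)(long index);

      /* Example filesystem that always writes a cluster of 4 pages. */
      static long demo_cluster_of_4(long index)
      {
          (void)index;
          return 4;
      }

      /* Returns the number of ->writepage calls made. */
      static long demo_write_cache_pages(struct demo_wbc *wbc, long nr_dirty,
                                         demo_writepage_fn writepage)
      {
          long calls = 0;

          for (long index = 0; index < nr_dirty; ) {
              long written = writepage(index);

              calls++;
              if (written <= 0)
                  break;                      /* nothing written: give up */
              index += written;
              wbc->nr_to_write -= written;    /* charge pages, not calls */
              if (wbc->nr_to_write <= 0)
                  break;                      /* budget used up: stop now */
          }
          return calls;
      }
      ```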