1. 25 Jul 2010 (1 commit)
  2. 23 Jul 2010 (1 commit)
    • macvtap: Limit packet queue length · 8a35747a
      Committed by Herbert Xu
      Mark Wagner reported OOM symptoms when sending UDP traffic over
      a macvtap link to a kvm receiver.
      
      This appears to be caused by the fact that macvtap packet queues
      are unlimited in length.  This means that if the receiver can't
      keep up with the rate of flow, then we will hit OOM. Of course
      it gets worse if the OOM killer then decides to kill the receiver.
      
      This patch imposes a cap on the packet queue length, in the same
      way as the tuntap driver, using the device TX queue length.
      
      Please note that macvtap currently has no way of giving congestion
      notification; that means the software device TX queue cannot be
      used and packets will always be dropped once the macvtap driver
      queue fills up.
      
      This shouldn't be a great problem for the scenario where macvtap
      is used to feed a kvm receiver, as the traffic is most likely
      external in origin so congestion notification can't be applied
      anyway.
      
      Of course, if anybody decides to complain about guest-to-guest
      UDP packet loss down the track, then we may have to revisit this.
      
      Incidentally, this patch also fixes a real memory leak when
      macvtap_get_queue fails.
      
      Chris Wright noticed that for this patch to work, we need a
      non-zero TX queue length.  This patch includes his work to change
      the default macvtap TX queue length to 500.
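      
      A minimal sketch of the enqueue-side cap, assuming macvtap keeps the
      per-tap skbs on a sk_buff_head (names here are illustrative, not the
      literal patch):
      
          #include <linux/netdevice.h>
          #include <linux/skbuff.h>
          
          /* Sketch: refuse to queue more than dev->tx_queue_len packets per tap. */
          static int macvtap_enqueue(struct net_device *dev, struct sk_buff_head *queue,
                                     struct sk_buff *skb)
          {
                  if (skb_queue_len(queue) >= dev->tx_queue_len) {
                          /* No congestion notification is possible, so just drop. */
                          kfree_skb(skb);
                          return NET_RX_DROP;
                  }
                  skb_queue_tail(queue, skb);
                  return NET_RX_SUCCESS;
          }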
      Reported-by: Mark Wagner <mwagner@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Chris Wright <chrisw@sous-sol.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a35747a
  3. 22 Jul 2010 (1 commit)
  4. 21 Jul 2010 (2 commits)
  5. 20 Jul 2010 (1 commit)
  6. 19 Jul 2010 (1 commit)
    • mm: add context argument to shrinker callback · 7f8275d0
      Committed by Dave Chinner
      The current shrinker implementation requires the registered callback
      to have global state to work from. This makes it difficult to shrink
      caches that are not global (e.g. per-filesystem caches). Pass the shrinker
      structure to the callback so that users can embed the shrinker structure
      in the context the shrinker needs to operate on and get back to it in the
      callback via container_of().
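      
      With the shrinker passed in, a per-filesystem cache can embed its own
      struct shrinker and recover its context inside the callback; a hedged
      sketch (the cache type and counter are made up for illustration):
      
          #include <linux/mm.h>
          
          struct my_fs_cache {
                  struct shrinker shrinker;
                  unsigned long   nr_objects;
          };
          
          static int my_cache_shrink(struct shrinker *shrink, int nr_to_scan, gfp_t gfp_mask)
          {
                  struct my_fs_cache *cache =
                          container_of(shrink, struct my_fs_cache, shrinker);
          
                  if (nr_to_scan) {
                          /* ... reclaim up to nr_to_scan objects from this cache ... */
                  }
                  return cache->nr_objects;       /* report remaining cache size */
          }
          
          /* Registration: cache->shrinker.shrink = my_cache_shrink;
           *               cache->shrinker.seeks  = DEFAULT_SEEKS;
           *               register_shrinker(&cache->shrinker);          */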
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      7f8275d0
  7. 17 Jul 2010 (2 commits)
  8. 16 Jul 2010 (1 commit)
    • jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09
      Committed by Jan Kara
      OCFS2 uses the t_commit trigger to compute and store a checksum of the
      just-committed blocks. When a buffer has b_frozen_data, the checksum is
      computed from it instead of from b_data, but this can result in an old
      checksum being written to the filesystem in the following scenario:
      
      1) transaction1 is opened
      2) handle1 is opened
      3) journal_access(handle1, bh)
          - This sets jh->b_transaction to transaction1
      4) modify(bh)
      5) journal_dirty(handle1, bh)
      6) handle1 is closed
      7) start committing transaction1, opening transaction2
      8) handle2 is opened
      9) journal_access(handle2, bh)
          - This copies off b_frozen_data to make it safe for transaction1 to commit.
            jh->b_next_transaction is set to transaction2.
      10) jbd2_journal_write_metadata() checksums b_frozen_data
      11) the journal correctly writes b_frozen_data to the disk journal
      12) handle2 is closed
          - There was no dirty call for the bh on handle2, so it is never queued for
            any more journal operation
      13) Checkpointing finally happens, and it just spools the bh via normal buffer
      writeback.  This will write b_data, which was never triggered on and thus
      contains a wrong (old) checksum.
      
      This patch fixes the problem by calling the trigger at the moment data is
      frozen for journal commit - i.e., either when b_frozen_data is created by
      do_get_write_access or just before we write a buffer to the log if
      b_frozen_data does not exist. We also rename the trigger to t_frozen as
      that better describes when it is called.
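      
      Roughly, the change means the trigger runs at freeze time rather than at
      commit time; a loose sketch of the idea (the helper name and surrounding
      code are hypothetical, not the actual jbd2 functions):
      
          #include <linux/jbd2.h>
          #include <linux/string.h>
          
          /* Hypothetical sketch only: when the frozen copy of a journalled buffer
           * is made, run the (now t_frozen) trigger on that copy immediately, so
           * the data that reaches the journal carries an up-to-date checksum. */
          static void freeze_and_trigger(struct journal_head *jh, void *mapped_data,
                                         size_t size)
          {
                  memcpy(jh->b_frozen_data, mapped_data, size);
                  if (jh->b_triggers)
                          run_frozen_trigger(jh, jh->b_frozen_data, jh->b_triggers);
          }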
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      13ceef09
  9. 14 Jul 2010 (1 commit)
  10. 10 Jul 2010 (1 commit)
  11. 07 Jul 2010 (1 commit)
    • VFS: introduce s_dirty accessors · 140236b4
      Committed by Artem Bityutskiy
      This patch introduces 3 VFS accessors: 'sb_mark_dirty()',
      'sb_mark_clean()', and 'sb_is_dirty()'. They simply
      set, clear, or test 'sb->s_dirt'. The plan is to make
      every FS use these accessors later instead of manipulating
      the 'sb->s_dirt' flag directly.
      
      Ultimately, this change is a preparation for the periodic
      superblock synchronization optimization which is about
      preventing the "sync_supers" kernel thread from waking up
      even if there is nothing to synchronize.
      
      This patch does not do any functional change, just adds
      accessor functions.
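      
      The accessors are presumably trivial inline wrappers along these lines
      (a sketch consistent with the description, not the verbatim patch):
      
          /* e.g. in include/linux/fs.h */
          static inline void sb_mark_dirty(struct super_block *sb)
          {
                  sb->s_dirt = 1;
          }
          
          static inline void sb_mark_clean(struct super_block *sb)
          {
                  sb->s_dirt = 0;
          }
          
          static inline int sb_is_dirty(struct super_block *sb)
          {
                  return sb->s_dirt;
          }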
      Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      140236b4
  12. 06 Jul 2010 (4 commits)
    • writeback: simplify the write back thread queue · 83ba7b07
      Committed by Christoph Hellwig
      First, remove items from work_list as soon as we start working on them.  This
      means we don't have to track any pending or visited state and can get
      rid of all the RCU magic freeing the work items - we can simply free
      them once the operation has finished.  Second, use a real completion for
      tracking synchronous requests - if the caller sets the completion pointer
      we complete it, otherwise use it as a boolean indicator that we can free
      the work item directly.  Third, unify struct wb_writeback_args and struct
      bdi_work into a single data structure, wb_writeback_work.  Previously we
      set all parameters in a struct wb_writeback_args, copied it into a
      struct bdi_work, and then copied it again onto the stack to use it there.
      Instead, just allocate one structure dynamically or on the stack and use
      it all the way through.
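      
      The unified work item then carries an optional completion for synchronous
      callers; a sketch of the rough shape (the exact field set is approximate):
      
          struct wb_writeback_work {
                  long                       nr_pages;
                  struct super_block        *sb;
                  enum writeback_sync_modes  sync_mode;
                  unsigned int               for_kupdate:1;
                  unsigned int               range_cyclic:1;
                  unsigned int               for_background:1;
          
                  struct list_head           list;   /* pending work list */
                  struct completion         *done;   /* set if the caller waits */
          };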
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      83ba7b07
    • writeback: split writeback_inodes_wb · edadfb10
      Committed by Christoph Hellwig
      The case where we have a superblock doesn't require a loop here as we scan
      over all inodes in writeback_sb_inodes. Split it out into a separate helper
      to make the code simpler.  This also allows us to get rid of the sb member in
      struct writeback_control, which was rather out of place there.
      
      Also update the comments in writeback_sb_inodes that explain the handling
      of inodes from wrong superblocks.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      edadfb10
    • writeback: remove writeback_inodes_wbc · 9c3a8ee8
      Committed by Christoph Hellwig
      This was just an odd wrapper around writeback_inodes_wb.  Removing this
      also allows us to get rid of the bdi member of struct writeback_control,
      which was rather out of place there.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      9c3a8ee8
    • net: Fix definition of netif_vdbg() when VERBOSE_DEBUG is defined · bcfcc450
      Committed by Ben Hutchings
      netif_vdbg() was originally defined as entirely equivalent to
      netdev_vdbg(), but I assume that it was intended to take the same
      parameters as netif_dbg() etc.  (Currently it is only used by the
      sfc driver, in which I worked on that assumption.)
      
      In commit a4ed89cb I changed the definition used when VERBOSE_DEBUG is
      not defined, but I failed to notice that the definition used when
      VERBOSE_DEBUG is defined was also not as I expected.  Change that to
      match netif_dbg() as well.
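      
      The corrected definition presumably just aliases netif_dbg() when
      VERBOSE_DEBUG is defined; a hedged sketch of both branches (approximate,
      not the verbatim netdevice.h diff):
      
          #if defined(VERBOSE_DEBUG)
          #define netif_vdbg      netif_dbg
          #else
          #define netif_vdbg(priv, type, dev, format, args...)                    \
          ({                                                                      \
                  if (0)                                                          \
                          netif_printk(priv, type, KERN_DEBUG, dev, format, ##args); \
                  0;                                                              \
          })
          #endif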
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bcfcc450
  13. 05 Jul 2010 (2 commits)
    • rbtree: Undo augmented trees performance damage and regression · b945d6b2
      Committed by Peter Zijlstra
      Reimplement augmented RB-trees without sprinkling extra branches
      all over the RB-tree code (which lives in the scheduler hot path).
      
      This approach is 'borrowed' from Fabio's BFQ implementation and
      relies on traversing the rebalance path after the RB-tree-op to
      correct the heap property for insertion/removal and make up for
      the damage done by the tree rotations.
      
      For insertion the rebalance path is trivially that from the new
      node upwards to the root; for removal it is that from the deepest
      node on the path from the to-be-removed node that will still
      be around after the removal.
      
      [ This patch also fixes a video driver regression reported by
        Ali Gholami Rudi - the memtype->subtree_max_end was updated
        incorrectly. ]
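      
      Independent of the exact helpers added, the technique is to walk the
      rebalance path bottom-up after the tree operation and recompute the
      augmented value (such as subtree_max_end) at each node; a generic sketch,
      not the kernel's exact API:
      
          #include <linux/rbtree.h>
          
          typedef void (*augment_fn)(struct rb_node *node, void *data);
          
          /* Recompute the augmented value from 'node' up to the root after an
           * insertion or erase has finished rebalancing the tree. */
          static void augment_path_to_root(struct rb_node *node, augment_fn fn, void *data)
          {
                  while (node) {
                          fn(node, data);           /* fix this node from its children */
                          node = rb_parent(node);   /* continue toward the root */
                  }
          }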
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Ali Gholami Rudi <ali@rudi.ir>
      Cc: Fabio Checconi <fabio@gandalf.sssup.it>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1275414172.27810.27961.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b945d6b2
    • module: initialize module dynamic debug later · ff49d74a
      Committed by Yehuda Sadeh
      We should initialize the module dynamic debug data structures
      only after determining that the module is not already loaded. This
      fixes a bug introduced in 2.6.35-rc2 where, when trying
      to load a module twice, we also load its dynamic printing data
      twice, which causes all sorts of nasty issues. Also handle
      the dynamic debug cleanup later on failure.
      Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (removed a #ifdef)
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff49d74a
  14. 03 Jul 2010 (2 commits)
    • linux/net.h: fix kernel-doc warnings · e2aec372
      Committed by Randy Dunlap
      Fix kernel-doc warnings in linux/net.h:
      
      Warning(include/linux/net.h:151): No description found for parameter 'wq'
      Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'fasync_list' description in 'socket'
      Warning(include/linux/net.h:151): Excess struct/union/enum/typedef member 'wait' description in 'socket'
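      
      The fix amounts to documenting the new 'wq' member and dropping the stale
      'fasync_list'/'wait' lines from the struct socket kernel-doc; roughly
      (member list abridged and comment text approximate):
      
          /**
           * struct socket - general BSD socket
           * @state: socket state (%SS_CONNECTED, etc.)
           * @flags: socket flags (%SOCK_ASYNC_NOSPACE, etc.)
           * @wq: wait queue for several uses
           * @file: File back pointer for gc
           * @sk: internal networking protocol agnostic socket representation
           * @ops: protocol specific socket operations
           */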
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2aec372
    • net: decreasing real_num_tx_queues needs to flush qdisc · f0796d5c
      Committed by John Fastabend
      Reducing real_num_tx_queues needs to flush the qdisc, otherwise
      skbs with queue_mappings greater than real_num_tx_queues can
      be sent to the underlying driver.
      
      The flow for this is,
      
      dev_queue_xmit()
      	dev_pick_tx()
      		skb_tx_hash()  => hash using real_num_tx_queues
      		skb_set_queue_mapping()
      	...
      	qdisc_enqueue_root() => enqueue skb on txq from hash
      ...
      dev->real_num_tx_queues -= n
      ...
      sch_direct_xmit()
      	dev_hard_start_xmit()
      		ndo_start_xmit(skb,dev) => skb queue set with old hash
      
      skbs are enqueued on the qdisc with skb->queue_mapping set such that
      0 < queue_mapping < real_num_tx_queues.  When the driver
      decreases real_num_tx_queues, skbs may be dequeued from the
      qdisc with a queue_mapping greater than real_num_tx_queues.
      
      This fixes a case in ixgbe where this was occurring with DCB
      and FCoE. Because the driver is using queue_mapping to map
      skbs to tx descriptor rings we can potentially map skbs to
      rings that no longer exist.
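      
      One way to express the required flush is to reset the qdisc attached to
      every TX queue before shrinking real_num_tx_queues; a hedged sketch (the
      helper name is illustrative, not the helper the patch actually adds):
      
          #include <linux/netdevice.h>
          #include <net/sch_generic.h>
          
          static void reset_all_tx_qdiscs(struct net_device *dev)
          {
                  unsigned int i;
          
                  for (i = 0; i < dev->num_tx_queues; i++) {
                          struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
                          struct Qdisc *qdisc = txq->qdisc;
          
                          if (qdisc) {
                                  spin_lock_bh(qdisc_lock(qdisc));
                                  qdisc_reset(qdisc); /* drop skbs with stale mappings */
                                  spin_unlock_bh(qdisc_lock(qdisc));
                          }
                  }
          }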
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f0796d5c
  15. 02 Jul 2010 (1 commit)
  16. 01 Jul 2010 (3 commits)
  17. 30 Jun 2010 (3 commits)
    • Input: i8042 - mark stubs in i8042.h "static inline" · c59690fa
      Committed by Feng Tang
      Otherwise we may run into the following:
      
      drivers/platform/built-in.o: In function `i8042_lock_chip':
      /home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: multiple definition of `i8042_lock_chip'
      drivers/input/serio/built-in.o:/home/test/ws2/projects/linux-2.6/include/linux/i8042.h:50: first defined here
      ...
      make[1]: *** [drivers/built-in.o] Error 1
      make: *** [drivers] Error 2
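      
      The pattern being fixed: when the i8042 driver is not built, the header's
      stub functions must be 'static inline' so every file including
      linux/i8042.h gets its own local, discardable copy rather than one
      external definition per object file. A sketch of the shape (subset of
      functions shown, not the full header):
      
          #if defined(CONFIG_SERIO_I8042) || defined(CONFIG_SERIO_I8042_MODULE)
          
          void i8042_lock_chip(void);
          void i8042_unlock_chip(void);
          
          #else
          
          static inline void i8042_lock_chip(void)   { }
          static inline void i8042_unlock_chip(void) { }
          
          #endif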
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
      c59690fa
    • compiler-gcc.h: gcc-4.5 needs noclone and noinline on __naked functions · 9c695203
      Committed by Mikael Pettersson
      A __naked function is defined in C but with a body completely implemented
      by asm(), including any prologue and epilogue.  These asm() bodies expect
      standard calling conventions for parameter passing.  Older GCCs implement
      that correctly, but 4.[56] currently do not, see GCC PR44290.  In the
      Linux kernel this breaks ARM, causing most arch/arm/mm/copypage-*.c
      modules to get miscompiled, resulting in kernel crashes during bootup.
      
      Part of the kernel fix is to augment the __naked function attribute to
      also imply noinline and noclone.  This patch implements that, and has been
      verified to fix boot failures with gcc-4.5 compiled 2.6.34 and 2.6.35-rc1
      kernels.  The patch is a no-op with older GCCs.
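      
      The compiler-gcc.h change presumably boils down to something like the
      following (a sketch, not the verbatim diff):
      
          /* gcc >= 4.5 understands the noclone attribute; older compilers ignore it. */
          #define __noclone       __attribute__((__noclone__))
          
          /* __naked bodies are pure asm() relying on the standard calling convention,
           * so forbid GCC 4.5/4.6 from inlining or cloning them (see GCC PR44290). */
          #define __naked         __attribute__((naked)) noinline __noclone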
      Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
      Signed-off-by: Khem Raj <raj.khem@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9c695203
    • fs: fix superblock iteration race · 57439f87
      Committed by npiggin@suse.de
      list_for_each_entry_safe is not suitable to protect against concurrent
      modification of the list. 6754af64 introduced a race in sb walking.
      
      list_for_each_entry can use the trick of pinning the current entry in
      the list before we drop and retake the lock because it subsequently
      follows cur->next. However list_for_each_entry_safe saves n=cur->next
      for following before entering the loop body, so when the lock is
      dropped, n may be deleted.
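      
      The safe pattern looks roughly like this (names follow fs/super.c
      conventions; this is a sketch of the pinning trick, not the exact fix):
      
          static void walk_supers(void (*fn)(struct super_block *sb))
          {
                  struct super_block *sb, *prev = NULL;
          
                  spin_lock(&sb_lock);
                  list_for_each_entry(sb, &super_blocks, s_list) {
                          sb->s_count++;            /* pin the current entry */
                          spin_unlock(&sb_lock);
          
                          fn(sb);                   /* may sleep or take other locks */
          
                          spin_lock(&sb_lock);
                          if (prev)
                                  __put_super(prev);
                          prev = sb;   /* keep 'sb' pinned until we have moved past it */
                  }
                  if (prev)
                          __put_super(prev);
                  spin_unlock(&sb_lock);
          }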
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Frank Mayhar <fmayhar@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      57439f87
  18. 29 Jun 2010 (1 commit)
    • ethtool: Fix potential user buffer overflow for ETHTOOL_{G, S}RXFH · bf988435
      Committed by Ben Hutchings
      struct ethtool_rxnfc was originally defined in 2.6.27 for the
      ETHTOOL_{G,S}RXFH command with only the cmd, flow_type and data
      fields.  It was then extended in 2.6.30 to support various additional
      commands.  These commands should have been defined to use a new
      structure, but it is too late to change that now.
      
      Since user-space may still be using the old structure definition
      for the ETHTOOL_{G,S}RXFH commands, and since they do not need the
      additional fields, only copy the originally defined fields to and
      from user-space.
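      
      A hedged sketch of the size computation for the compatible copy (the
      helper is illustrative; the real fix lives in net/core/ethtool.c):
      
          #include <linux/ethtool.h>
          #include <linux/stddef.h>
          
          /* For the original ETHTOOL_{G,S}RXFH commands, only copy the fields that
           * existed in the 2.6.27 struct: cmd, flow_type and data. */
          static size_t rxnfc_copy_size(u32 cmd)
          {
                  if (cmd == ETHTOOL_GRXFH || cmd == ETHTOOL_SRXFH)
                          return offsetof(struct ethtool_rxnfc, data) +
                                 sizeof(((struct ethtool_rxnfc *)0)->data);
                  return sizeof(struct ethtool_rxnfc);
          }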
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Cc: stable@kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf988435
  19. 22 Jun 2010 (1 commit)
  20. 15 Jun 2010 (1 commit)
  21. 14 Jun 2010 (1 commit)
  22. 11 Jun 2010 (2 commits)
    • writeback: simplify and split bdi_start_writeback · c5444198
      Committed by Christoph Hellwig
      bdi_start_writeback now never gets a superblock passed, so we can just remove
      that case.  And to further untangle the code and flatten the call stack,
      split it into two trivial helpers for its two callers.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      c5444198
    • net: deliver skbs on inactive slaves to exact matches · 597a264b
      Committed by John Fastabend
      Currently, the accelerated receive path for VLANs will
      drop packets if the real device is an inactive slave and
      the packet is not one of the special pkts tested for in
      skb_bond_should_drop().  This behavior is different from
      the non-accelerated path and from pkts over a bonded vlan.
      
      For example,
      
      vlanx -> bond0 -> ethx
      
      will be dropped in the vlan path and not delivered to any
      packet handlers at all.  However,
      
      bond0 -> vlanx -> ethx
      
      and
      
      bond0 -> ethx
      
      will be delivered to handlers that match the exact dev,
      because the VLAN path checks the real_dev, which is not a
      slave, and netif_receive_skb() doesn't drop frames but only
      delivers them to exact matches.
      
      This patch adds an sk_buff flag which is used for tagging
      skbs that would previously have been dropped and allows the
      skb to continue on to netif_receive_skb().  Here we add
      logic to check for the deliver_no_wcard flag and, if it
      is set, only deliver to handlers that match exactly.  This
      makes both paths above consistent and gives pkt handlers
      a way to identify skbs that come from inactive slaves.
      Without this patch, in some configurations skbs will be
      delivered to handlers with exact matches and in others
      be dropped outright in the vlan path.
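      
      Conceptually the delivery rule becomes the following (a sketch; the real
      logic lives in net/core/dev.c and is expressed differently there):
      
          #include <linux/netdevice.h>
          #include <linux/skbuff.h>
          
          /* If the skb was tagged because it arrived on an inactive slave, skip
           * wildcard handlers (ptype->dev == NULL) and deliver only to handlers
           * bound to the exact device. */
          static bool should_deliver(const struct sk_buff *skb,
                                     const struct packet_type *ptype)
          {
                  if (skb->deliver_no_wcard)
                          return ptype->dev == skb->dev;
                  return ptype->dev == NULL || ptype->dev == skb->dev;
          }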
      
      I have tested the following 4 configurations in failover modes
      and load balancing modes.
      
      # bond0 -> ethx
      
      # vlanx -> bond0 -> ethx
      
      # bond0 -> vlanx -> ethx
      
      # bond0 -> ethx
                  |
        vlanx -> --
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      597a264b
  23. 10 Jun 2010 (1 commit)
  24. 09 Jun 2010 (3 commits)
    • misc: Fix allocation 'borrowed' by vhost_net · 79907d89
      Committed by Alan Cox
      Device number 10,233 is officially allocated to /dev/kmview, which is shipping in
      Ubuntu and Debian distributions.  vhost_net seems to have borrowed it
      without making a proper request, and this causes regressions in the other
      distributions.
      
      vhost_net can use a dynamic minor so use that instead.  Also update the
      file with a comment to try and avoid future misunderstandings.
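      
      Switching to a dynamic minor is a one-line change in the miscdevice
      declaration; a sketch (struct and fops names follow the drivers/vhost/net.c
      style but are not guaranteed verbatim):
      
          #include <linux/miscdevice.h>
          
          static struct miscdevice vhost_net_misc = {
                  .minor = MISC_DYNAMIC_MINOR,    /* let the misc core pick a free minor */
                  .name  = "vhost-net",
                  .fops  = &vhost_net_fops,
          };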
      
      cc: stable@kernel.org
      Signed-off-by: Alan Cox <device@lanana.org>
      [ We should have caught this before 2.6.34 got released.  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79907d89
    • writeback: pay attention to wbc->nr_to_write in write_cache_pages · 0b564927
      Committed by Dave Chinner
      If a filesystem writes more than one page in ->writepage, write_cache_pages
      fails to notice this and continues to attempt writeback when wbc->nr_to_write
      has gone negative - this trace was captured from XFS:
      
          wbc_writeback_start: towrt=1024
          wbc_writepage: towrt=1024
          wbc_writepage: towrt=0
          wbc_writepage: towrt=-1
          wbc_writepage: towrt=-5
          wbc_writepage: towrt=-21
          wbc_writepage: towrt=-85
      
      This has adverse effects on filesystem writeback behaviour. write_cache_pages()
      needs to terminate after a certain number of pages are written, not after a
      certain number of calls to ->writepage are made.  This is a regression
      introduced by 17bc6c30 ("vfs: Add
      no_nrwrite_index_update writeback control flag"), but cannot be reverted
      directly due to subsequent bug fixes that have gone in on top of it.
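      
      The corrected termination condition in write_cache_pages() presumably keys
      off the remaining budget rather than the call count; a rough sketch
      (variable names approximate, not the verbatim mm/page-writeback.c code):
      
          /* Inside the write_cache_pages() page loop: */
          ret = (*writepage)(page, wbc, data);
          
          /*
           * A single ->writepage call may have written many pages and decremented
           * wbc->nr_to_write for each of them, so stop on the remaining budget
           * instead of counting ->writepage invocations.
           */
          if (wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) {
                  done = 1;
                  break;
          }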
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b564927
    • sched: Fix PROVE_RCU vs cpu_cgroup · dc61b1d6
      Committed by Peter Zijlstra
      PROVE_RCU has a few issues with the cpu_cgroup because the scheduler
      typically holds rq->lock around the css rcu derefs but the generic
      cgroup code doesn't (and can't) know about that lock.
      
      Provide means to add extra checks to the css dereference and use that
      in the scheduler to annotate its users.
      
      The addition of rq->lock to these checks is correct because the
      cgroup_subsys::attach() method takes the rq->lock for each task it
      moves, therefore by holding that lock, we ensure the task is pinned to
      the current cgroup and the RCU dereference is valid.
      
      That leaves one genuine race in __sched_setscheduler() where we used
      task_group() without holding any of the required locks and thus raced
      with the cgroup code. Solve this by moving the check under the
      appropriate lock.
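      
      The resulting annotation in the scheduler plausibly looks like this (a
      sketch based on the description; the check macro takes an extra lockdep
      condition, and names may differ from the actual kernel/sched.c):
      
          /* The css may be dereferenced either inside rcu_read_lock() or while
           * holding the task's runqueue lock, so pass the extra condition to the
           * lockdep-aware accessor. */
          static inline struct task_group *task_group(struct task_struct *p)
          {
                  struct cgroup_subsys_state *css;
          
                  css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
                                                lockdep_is_held(&task_rq(p)->lock));
                  return container_of(css, struct task_group, css);
          }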
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      dc61b1d6
  25. 08 Jun 2010 (1 commit)
  26. 05 Jun 2010 (1 commit)