1. 02 3月, 2011 2 次提交
    • T
      block: add @force_kblockd to __blk_run_queue() · 1654e741
      Tejun Heo 提交于
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is prerequisite for fixing ide oops caused by the new
              blk-flush implementation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      1654e741
    • V
      blk-throttle: Do not use kblockd workqueue for throtl work · 450adcbe
      Vivek Goyal 提交于
      o Dominik Klein reported a system hang issue while doing some blkio
        throttling testing.
      
        https://lkml.org/lkml/2011/2/24/173
      
      o Some tracing revealed that CFQ was not dispatching any more jobs as
        queue unplug was not happening. And queue unplug was not happening
        because unplug work was not being called as there was one throttling
        work on same cpu which as not finished yet. And throttling work had not
        finished as it was tyring to dispatch a bio to CFQ but all the request
        descriptors were consume to it was put to sleep.
      
      o So basically it is a cyclic dependecny between CFQ unplug work and
        throtl dispatch work. Tejun suggested that use separate workqueue for
        such cases.
      
      o This patch uses a separate workqueue for throttle related work and
        does not rely on kblockd workqueue anymore.
      
      Cc: stable@kernel.org
      Reported-by: NDominik Klein <dk@in-telegence.net>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      450adcbe
  2. 01 3月, 2011 2 次提交
    • R
      ACPI: Fix build for CONFIG_NET unset · af06216a
      Rafael J. Wysocki 提交于
      Several ACPI drivers fail to build if CONFIG_NET is unset, because
      they refer to things depending on CONFIG_THERMAL that in turn depends
      on CONFIG_NET.  However, CONFIG_THERMAL doesn't really need to depend
      on CONFIG_NET, because the only part of it requiring CONFIG_NET is
      the netlink interface in thermal_sys.c.
      
      Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET
      and remove the dependency of CONFIG_THERMAL on CONFIG_NET from
      drivers/thermal/Kconfig.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Luming Yu <luming.yu@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af06216a
    • B
      mm: <asm-generic/pgtable.h> must include <linux/mm_types.h> · fbd71844
      Ben Hutchings 提交于
      Commit e2cda322 ("thp: add pmd mangling generic functions") replaced
      some macros in <asm-generic/pgtable.h> with inline functions.
      
      If the functions are to be defined (not all architectures need them)
      then struct vm_area_struct must be defined first.  So include
      <linux/mm_types.h>.
      
      Fixes a build failure seen in Debian:
      
          CC [M]  drivers/media/dvb/mantis/mantis_pci.o
        In file included from arch/arm/include/asm/pgtable.h:460,
                         from drivers/media/dvb/mantis/mantis_pci.c:25:
        include/asm-generic/pgtable.h: In function 'ptep_test_and_clear_young':
        include/asm-generic/pgtable.h:29: error: dereferencing pointer to incomplete type
      Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fbd71844
  3. 28 2月, 2011 1 次提交
  4. 26 2月, 2011 1 次提交
  5. 25 2月, 2011 1 次提交
  6. 24 2月, 2011 3 次提交
    • N
      Fix over-zealous flush_disk when changing device size. · 93b270f7
      NeilBrown 提交于
      There are two cases when we call flush_disk.
      In one, the device has disappeared (check_disk_change) so any
      data will hold becomes irrelevant.
      In the oter, the device has changed size (check_disk_size_change)
      so data we hold may be irrelevant.
      
      In both cases it makes sense to discard any 'clean' buffers,
      so they will be read back from the device if needed.
      
      In the former case it makes sense to discard 'dirty' buffers
      as there will never be anywhere safe to write the data.  In the
      second case it *does*not* make sense to discard dirty buffers
      as that will lead to file system corruption when you simply enlarge
      the containing devices.
      
      flush_disk calls __invalidate_devices.
      __invalidate_device calls both invalidate_inodes and invalidate_bdev.
      
      invalidate_inodes *does* discard I_DIRTY inodes and this does lead
      to fs corruption.
      
      invalidate_bev *does*not* discard dirty pages, but I don't really care
      about that at present.
      
      So this patch adds a flag to __invalidate_device (calling it
      __invalidate_device2) to indicate whether dirty buffers should be
      killed, and this is passed to invalidate_inodes which can choose to
      skip dirty inodes.
      
      flusk_disk then passes true from check_disk_change and false from
      check_disk_size_change.
      
      dm avoids tripping over this problem by calling i_size_write directly
      rathher than using check_disk_size_change.
      
      md does use check_disk_size_change and so is affected.
      
      This regression was introduced by commit 608aeef1 which causes
      check_disk_size_change to call flush_disk, so it is suitable for any
      kernel since 2.6.27.
      
      Cc: stable@kernel.org
      Acked-by: NJeff Moyer <jmoyer@redhat.com>
      Cc: Andrew Patterson <andrew.patterson@hp.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      93b270f7
    • M
      mm: prevent concurrent unmap_mapping_range() on the same inode · 2aa15890
      Miklos Szeredi 提交于
      Michael Leun reported that running parallel opens on a fuse filesystem
      can trigger a "kernel BUG at mm/truncate.c:475"
      
      Gurudas Pai reported the same bug on NFS.
      
      The reason is, unmap_mapping_range() is not prepared for more than
      one concurrent invocation per inode.  For example:
      
        thread1: going through a big range, stops in the middle of a vma and
           stores the restart address in vm_truncate_count.
      
        thread2: comes in with a small (e.g. single page) unmap request on
           the same vma, somewhere before restart_address, finds that the
           vma was already unmapped up to the restart address and happily
           returns without doing anything.
      
      Another scenario would be two big unmap requests, both having to
      restart the unmapping and each one setting vm_truncate_count to its
      own value.  This could go on forever without any of them being able to
      finish.
      
      Truncate and hole punching already serialize with i_mutex.  Other
      callers of unmap_mapping_range() do not, and it's difficult to get
      i_mutex protection for all callers.  In particular ->d_revalidate(),
      which calls invalidate_inode_pages2_range() in fuse, may be called
      with or without i_mutex.
      
      This patch adds a new mutex to 'struct address_space' to prevent
      running multiple concurrent unmap_mapping_range() on the same mapping.
      
      [ We'll hopefully get rid of all this with the upcoming mm
        preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
        lockbreak" patch in particular.  But that is for 2.6.39 ]
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reported-by: NMichael Leun <lkml20101129@newton.leun.net>
      Reported-by: NGurudas Pai <gurudas.pai@oracle.com>
      Tested-by: NGurudas Pai <gurudas.pai@oracle.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2aa15890
    • E
      net_sched: long word align struct qdisc_skb_cb data · 9e924cf4
      Eric Dumazet 提交于
      netem_skb_cb() does :
      
      return (struct netem_skb_cb *)qdisc_skb_cb(skb)->data;
      
      Unfortunatly struct qdisc_skb_cb data is not long word aligned, so
      access to psched_time_t time_to_send uses a non aligned access.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e924cf4
  7. 23 2月, 2011 1 次提交
  8. 22 2月, 2011 1 次提交
  9. 20 2月, 2011 2 次提交
  10. 19 2月, 2011 1 次提交
    • L
      Expand CONFIG_DEBUG_LIST to several other list operations · 3c18d4de
      Linus Torvalds 提交于
      When list debugging is enabled, we aim to readably show list corruption
      errors, and the basic list_add/list_del operations end up having extra
      debugging code in them to do some basic validation of the list entries.
      
      However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
      the debug code due to how they were written. This fixes that.
      
      So the _next_ time we have list_move() problems with stale list entries,
      we'll hopefully have an easier time finding them..
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c18d4de
  11. 18 2月, 2011 2 次提交
  12. 17 2月, 2011 2 次提交
  13. 16 2月, 2011 1 次提交
  14. 14 2月, 2011 2 次提交
  15. 13 2月, 2011 2 次提交
  16. 11 2月, 2011 2 次提交
  17. 09 2月, 2011 2 次提交
  18. 05 2月, 2011 2 次提交
    • T
      genirq: Add missing status flags to modification mask · 872434d6
      Thomas Gleixner 提交于
      The mask which filters out the valid bits which can be set via
      irq_modify_status() is missing IRQ_NO_BALANCING, which breaks UV.
      
      Add IRQ_PER_CPU as well to avoid another one line patch for 39.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      872434d6
    • P
      USB: Fix trout build failure with ci13xxx_msm gadget · 8cf28f1f
      Pavankumar Kondeti 提交于
      This patch fixes the below compilation errors.
      
        CC      drivers/usb/gadget/ci13xxx_msm.o
        CC      net/mac80211/led.o
        drivers/usb/gadget/ci13xxx_msm.c: In function 'ci13xxx_msm_notify_event':
        drivers/usb/gadget/ci13xxx_msm.c:42: error: 'USB_AHBBURST' undeclared (first use in this function)
        drivers/usb/gadget/ci13xxx_msm.c:42: error: (Each undeclared identifier is reported only once
        drivers/usb/gadget/ci13xxx_msm.c:42: error: for each function it appears in.)
        drivers/usb/gadget/ci13xxx_msm.c:43: error: 'USB_AHBMODE' undeclared (first use in this function)
      make[4]: *** [drivers/usb/gadget/ci13xxx_msm.o] Error 1
      make[3]: *** [drivers/usb/gadget] Error 2
      
      MSM USB driver is not supported on boards like trout (MSM7201) which
      has an external PHY.
      Signed-off-by: NPavankumar Kondeti <pkondeti@codeaurora.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      8cf28f1f
  19. 04 2月, 2011 2 次提交
  20. 03 2月, 2011 6 次提交
    • S
      tracing: Replace syscall_meta_data struct array with pointer array · 3d56e331
      Steven Rostedt 提交于
      Currently the syscall_meta structures for the syscall tracepoints are
      placed in the __syscall_metadata section, and at link time, the linker
      makes one large array of all these syscall metadata structures. On boot
      up, this array is read (much like the initcall sections) and the syscall
      data is processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are suppose to be in an array.
      
      A hack was used previous to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the __syscall_metadata section. As pointers are always the
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The __syscall_metadata section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3d56e331
    • M
      tracepoints: Fix section alignment using pointer array · 65498646
      Mathieu Desnoyers 提交于
      Make the tracepoints more robust, making them solid enough to handle compiler
      changes by not relying on anything based on compiler-specific behavior with
      respect to structure alignment. Implement an approach proposed by David Miller:
      use an array of const pointers to refer to the individual structures, and export
      this pointer array through the linker script rather than the structures per se.
      It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
      for the pointers), but are less likely to break due to compiler changes.
      
      History:
      
      commit 7e066fb8 tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()
      added the aligned(32) type and variable attribute to the tracepoint structures
      to deal with gcc happily aligning statically defined structures on 32-byte
      multiples.
      
      One attempt was to use a 8-byte alignment for tracepoint structures by applying
      both the variable and type attribute to tracepoint structures definitions and
      declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.
      
      The reason is that the "aligned" attribute only specify the _minimum_ alignment
      for a structure, leaving both the compiler and the linker free to align on
      larger multiples. Because tracepoint.c expects the structures to be placed as an
      array within each section, up-alignment cause NULL-pointer exceptions due to the
      extra unexpected padding.
      
      (this patch applies on top of -tip)
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      LKML-Reference: <20110126222622.GA10794@Krystal>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      65498646
    • S
      tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Steven Rostedt 提交于
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are suppose to be in an array.
      
      A hack was used previous to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the _ftrace_event section. As pointers are always the
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The _ftrace_event section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: NDavid Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e4a9ea5e
    • N
      vfs: sparse: add __FMODE_EXEC · 3cd90ea4
      Namhyung Kim 提交于
      FMODE_EXEC is a constant type of fmode_t but was used with normal integer
      constants.  This results in following warnings from sparse.  Fix it using
      new macro __FMODE_EXEC.
      
       fs/exec.c:116:58: warning: restricted fmode_t degrades to integer
       fs/exec.c:689:58: warning: restricted fmode_t degrades to integer
       fs/fcntl.c:777:9: warning: restricted fmode_t degrades to integer
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3cd90ea4
    • N
      vfs: sparse: remove a warning on OPEN_FMODE() · 1a44bc8c
      Namhyung Kim 提交于
      AND-ing FMODE_* constant with normal integer results in following
      sparse warnings. Fix it.
      
       fs/open.c:662:21: warning: restricted fmode_t degrades to integer
       fs/anon_inodes.c:123:34: warning: restricted fmode_t degrades to integer
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1a44bc8c
    • J
      memcg: prevent endless loop when charging huge pages to near-limit group · 19942822
      Johannes Weiner 提交于
      If reclaim after a failed charging was unsuccessful, the limits are
      checked again, just in case they settled by means of other tasks.
      
      This is all fine as long as every charge is of size PAGE_SIZE, because in
      that case, being below the limit means having at least PAGE_SIZE bytes
      available.
      
      But with transparent huge pages, we may end up in an endless loop where
      charging and reclaim fail, but we keep going because the limits are not
      yet exceeded, although not allowing for a huge page.
      
      Fix this up by explicitely checking for enough room, not just whether we
      are within limits.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19942822
  21. 02 2月, 2011 1 次提交
  22. 01 2月, 2011 1 次提交
    • P
      netfilter: ecache: always set events bits, filter them later · 3db7e93d
      Pablo Neira Ayuso 提交于
      For the following rule:
      
      iptables -I PREROUTING -t raw -j CT --ctevents assured
      
      The event delivered looks like the following:
      
       [UPDATE] tcp      6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]
      
      Note that the TCP protocol state is not included. For that reason
      the CT event filtering is not very useful for conntrackd.
      
      To resolve this issue, instead of conditionally setting the CT events
      bits based on the ctmask, we always set them and perform the filtering
      in the late stage, just before the delivery.
      
      Thus, the event delivered looks like the following:
      
       [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      3db7e93d