1. 19 March 2014 (2 commits)
  2. 18 March 2014 (7 commits)
  3. 26 February 2014 (1 commit)
    • ipc,mqueue: remove limits for the amount of system-wide queues · f3713fd9
      Committed by Davidlohr Bueso
      Commit 93e6f119 ("ipc/mqueue: cleanup definition names and
      locations") added global hardcoded limits on the number of message
      queues that can be created.  While these limits are per-namespace, in
      practice they end up breaking userspace applications.  Historically
      users have, at least in theory, been able to create up to INT_MAX
      queues, and limiting that to just 1024 is far too low and drastic for
      some workloads and use cases.  For instance, Madars reports:
      
       "This update imposes bad limits on our multi-process application.  As
        our app uses approaches that each process opens its own set of queues
        (usually something about 3-5 queues per process).  In some scenarios
        we might run up to 3000 processes or more (which of-course for linux
        is not a problem).  Thus we might need up to 9000 queues or more.  All
        processes run under one user."
      
      Other affected users can be found in launchpad bug #1155695:
        https://bugs.launchpad.net/ubuntu/+source/manpages/+bug/1155695
      
      Instead of increasing this limit, revert it entirely and fall back to
      the original way of dealing with queue limits -- where once a user's
      resource limit is reached, and all memory is used up, new queues cannot
      be created.
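      A minimal userspace sketch of the reported usage pattern (queue names
      and counts are hypothetical, not taken from Madars' application): each
      process opens a handful of queues of its own, so a few thousand
      processes quickly exceed a hard cap of 1024 queues.

        #include <sys/types.h>
        #include <fcntl.h>
        #include <mqueue.h>
        #include <stdio.h>

        /* Open this process's private queues; fails once the system-wide
         * or per-user limits are exhausted. */
        static int open_private_queues(pid_t pid, mqd_t qs[3])
        {
                char name[64];

                for (int i = 0; i < 3; i++) {
                        snprintf(name, sizeof(name), "/app-%d-q%d", (int)pid, i);
                        qs[i] = mq_open(name, O_CREAT | O_RDWR, 0600, NULL);
                        if (qs[i] == (mqd_t)-1)
                                return -1;
                }
                return 0;
        }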
      Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
      Reported-by: Madars Vitolins <m@silodev.com>
      Acked-by: Doug Ledford <dledford@redhat.com>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: <stable@vger.kernel.org>	[3.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f3713fd9
  4. 25 February 2014 (2 commits)
    • sysfs: fix namespace refcnt leak · fed95bab
      Committed by Li Zefan
      As mount() and kill_sb() are not a one-to-one match, we shouldn't grab
      the ns refcnt unconditionally in sysfs_mount(); instead we should take
      the refcnt only when kernfs_mount() has allocated a new superblock.
      
      v2:
      - Changed the name of the new argument, suggested by Tejun.
      - Made the argument optional, suggested by Tejun.
      
      v3:
      - Make the new argument as second-to-last arg, suggested by Tejun.
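      A simplified sketch of the resulting logic in sysfs_mount() (the
      kernfs_mount() argument list is approximated, and sysfs_root is assumed
      to be the usual global): the ns refcnt is kept only when a new
      superblock was actually allocated.

        static struct dentry *sysfs_mount_sketch(struct file_system_type *fs_type,
                                                 int flags)
        {
                void *ns = kobj_ns_grab_current(KOBJ_NS_TYPE_NET);
                struct dentry *root;
                bool new_sb = false;

                root = kernfs_mount(fs_type, flags, sysfs_root, &new_sb, ns);
                /* mount reused an existing superblock (or failed): drop the ref */
                if (IS_ERR(root) || !new_sb)
                        kobj_ns_drop(KOBJ_NS_TYPE_NET, ns);
                return root;
        }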
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Tejun Heo <tj@kernel.org>
       ---
       fs/kernfs/mount.c      | 8 +++++++-
       fs/sysfs/mount.c       | 5 +++--
       include/linux/kernfs.h | 9 +++++----
       3 files changed, 15 insertions(+), 7 deletions(-)
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      fed95bab
    • fsnotify: Allocate overflow events with proper type · ff57cd58
      Committed by Jan Kara
      Commit 7053aee2 ("fsnotify: do not share events between notification
      groups") used an overflow event statically allocated in the group with
      the size of the generic notification event.  This causes problems
      because some code looks at type-specific parts of the event structure,
      gets confused by the random data it sees there, and crashes.
      
      Fix the problem by allocating the overflow event with a type
      corresponding to the group type, so the code cannot get confused.
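      A rough sketch of the idea for inotify (struct and field names
      approximate): the overflow event is allocated with the group's own
      event type, so type-specific code never reads past a smaller generic
      event.

        struct inotify_event_info *oevent;

        oevent = kmalloc(sizeof(*oevent), GFP_KERNEL);
        if (!oevent)
                return ERR_PTR(-ENOMEM);
        /* mark it as the queue-overflow event for this group */
        fsnotify_init_event(&oevent->fse, NULL, FS_Q_OVERFLOW);
        group->overflow_event = &oevent->fse;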
      Signed-off-by: Jan Kara <jack@suse.cz>
      ff57cd58
  5. 22 February 2014 (2 commits)
  6. 19 February 2014 (3 commits)
  7. 18 February 2014 (2 commits)
  8. 17 February 2014 (2 commits)
  9. 14 February 2014 (5 commits)
    • workqueue: add args to workqueue lockdep name · fada94ee
      Committed by Li Zhong
      Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in
      process_one_work().
      
      Use #fmt plus #args as the lock_name to give some more information for
      fmt strings like the one above.
      
      The __builtin_constant_p() check is removed (as there seems to be no
      good way to check all the variables in the args list).  Removing the
      check only adds two additional quote characters around those constants.
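      A minimal sketch of the preprocessor trick (WQ_LOCK_NAME is a
      hypothetical stand-in; the real change lives inside the
      alloc_workqueue() macro): stringifying both the format and its
      arguments produces more descriptive class names, like the examples
      listed below.

        /* "#fmt" stringifies the format token, "#args" the argument tokens;
         * the two string literals then concatenate into the lockdep name. */
        #define WQ_LOCK_NAME(fmt, args...)      #fmt #args

        /* WQ_LOCK_NAME("xfs-data/%s", mp->m_fsname)
         *   -> "\"xfs-data/%s\"mp->m_fsname"  (note the extra quotes) */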
      
      Some lockdep name examples printed out after the change:
      
      lockdep name                    wq->name
      
      "events_long"                   events_long
      "%s"("khelper")                 khelper
      "xfs-data/%s"mp->m_fsname       xfs-data/dm-3
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      fada94ee
    • mlx5: Add include of <linux/slab.h> because of kzalloc()/kfree() use · 6ecde51d
      Committed by Roland Dreier
      On some architectures (for example, arm), we don't end up indirectly
      pulling in the declaration of kzalloc() and kfree(), and so building
      anything that includes <linux/mlx5/driver.h> breaks.  Fix this by adding
      an explicit include to get the declaration.
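      The fix itself is a one-liner, making the dependency explicit in
      include/linux/mlx5/driver.h:

        #include <linux/slab.h>         /* for kzalloc() and kfree() */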
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      6ecde51d
    • net: ip, ipv6: handle gso skbs in forwarding path · fe6cc55f
      Committed by Florian Westphal
      Marcelo Ricardo Leitner reported problems when the forwarding link path
      has a lower mtu than the incoming one if the inbound interface supports GRO.
      
      Given:
      Host <mtu1500> R1 <mtu1200> R2
      
      Host sends tcp stream which is routed via R1 and R2.  R1 performs GRO.
      
      In this case, the kernel will fail to send ICMP fragmentation needed
      messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu
      checks in forward path. Instead, Linux tries to send out packets exceeding
      the mtu.
      
      When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does
      not fragment the packets when forwarding, and again tries to send out
      packets exceeding R1-R2 link mtu.
      
      This alters the forwarding dstmtu checks to take the individual gso
      segment lengths into account.
      
      For ipv6, we send out pkt too big error for gso if the individual
      segments are too big.
      
      For ipv4, we either send icmp fragmentation needed, or, if the DF bit
      is not set, perform software segmentation and let the output path
      create fragments when the packet is leaving the machine.
      It is not 100% correct as the error message will contain the headers of
      the GRO skb instead of the original/segmented one, but it seems to
      work fine in my (limited) tests.
      
      Eric Dumazet suggested simply shrinking the mss via ->gso_size to avoid
      software segmentation.
      
      However it turns out that skb_segment() assumes skb nr_frags is related
      to mss size so we would BUG there.  I don't want to mess with it considering
      Herbert and Eric disagree on what the correct behavior should be.
      
      Hannes Frederic Sowa notes that when we would shrink gso_size
      skb_segment would then also need to deal with the case where
      SKB_MAX_FRAGS would be exceeded.
      
      This uses software segmentation in the forward path when we hit ipv4
      non-DF packets and the outgoing link mtu is too small.  It's not
      perfect, but given the lack of bug reports about GRO forwarding being
      broken, this is a rare case anyway.  Also, it's not like this could not
      be improved later once the dust settles.
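      A simplified sketch of the ipv4 forward-path check (helper names follow
      the patch, details trimmed): a GSO skb only counts as exceeding the mtu
      if its individual segments would.

        static bool ip_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
        {
                if (skb->len <= mtu)
                        return false;

                /* GRO only merged already-fitting segments; let them through */
                if (skb_is_gso(skb) && skb_gso_network_seglen(skb) <= mtu)
                        return false;

                return true;
        }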
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fe6cc55f
    • net: core: introduce netif_skb_dev_features · d2069403
      Committed by Florian Westphal
      Will be used by upcoming ipv4 forward path change that needs to
      determine feature mask using skb->dst->dev instead of skb->dev.
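      The rough shape of the change (simplified): the existing
      netif_skb_features() becomes a wrapper around the new helper, which
      lets the forward path pass the destination device explicitly.

        netdev_features_t netif_skb_dev_features(struct sk_buff *skb,
                                                 const struct net_device *dev);

        static inline netdev_features_t netif_skb_features(struct sk_buff *skb)
        {
                return netif_skb_dev_features(skb, skb->dev);
        }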
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2069403
    • PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() · 3ce4e860
      Committed by Alexander Gordeev
      The new functions are special cases of pci_enable_msi_range() and
      pci_enable_msix_range() for when a particular number of MSI or MSI-X
      interrupts is needed.
      
      In contrast to pci_enable_msi_range() and pci_enable_msix_range(),
      pci_enable_msi_exact() and pci_enable_msix_exact() return zero on
      success, indicating that the requested number of MSI or MSI-X
      interrupts has been successfully allocated.
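      The "exact" variants are thin wrappers; here is a sketch close to the
      actual helpers, where requesting exactly nvec vectors means min == max:

        static inline int pci_enable_msix_exact(struct pci_dev *dev,
                                                struct msix_entry *entries,
                                                int nvec)
        {
                int rc = pci_enable_msix_range(dev, entries, nvec, nvec);

                /* any positive return can only be nvec, so report plain success */
                return rc < 0 ? rc : 0;
        }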
      Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      3ce4e860
  10. 13 February 2014 (2 commits)
  11. 11 February 2014 (5 commits)
    • block: Fix cloning of discard/write same bios · 8423ae3d
      Committed by Kent Overstreet
      Immutable biovecs changed the way bio segments are treated such that
      bio_for_each_segment() can no longer do what we want for discard/write
      same bios, since bi_size means something completely different for them.
      
      Fortunately, discard and write same bios never have more than a single
      biovec, so bio_for_each_segment() is unnecessary and not terribly
      meaningful for them, but we still have to special-case them in a few
      places.
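      A sketch of the special case (simplified, not the verbatim fix): a
      discard or write-same bio carries at most one biovec, so it is copied
      directly instead of walking segments whose bi_size means something
      different for those commands.

        if (bio_src->bi_rw & (REQ_DISCARD | REQ_WRITE_SAME)) {
                /* write same carries one biovec, discard may carry none */
                if (bio_src->bi_vcnt)
                        bio->bi_io_vec[bio->bi_vcnt++] = bio_src->bi_io_vec[0];
        } else {
                struct bvec_iter iter;
                struct bio_vec bv;

                bio_for_each_segment(bv, bio_src, iter)
                        bio->bi_io_vec[bio->bi_vcnt++] = bv;
        }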
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      8423ae3d
    • cgroup: protect modifications to cgroup_idr with cgroup_mutex · 0ab02ca8
      Committed by Li Zefan
      Setup cgroupfs like this:
        # mount -t cgroup -o cpuacct xxx /cgroup
        # mkdir /cgroup/sub1
        # mkdir /cgroup/sub2
      
      Then run these two commands:
        # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /cgroup/sub1/tmp; } &
        # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /cgroup/sub2/tmp; } &
      
      After a few seconds you may see this warning:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0()
      idr_remove called for id=6 which is not allocated.
      ...
      Call Trace:
       [<ffffffff8156063c>] dump_stack+0x7a/0x96
       [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0
       [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50
       [<ffffffff81300aa7>] sub_remove+0x87/0x1b0
       [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0
       [<ffffffff81300bf5>] idr_remove+0x25/0xd0
       [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0
       [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0
       [<ffffffff8107cdbc>] process_one_work+0x26c/0x550
       [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0
       [<ffffffff81085f96>] kthread+0xe6/0xf0
       [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0
      ---[ end trace 2d1577ec10cf80d0 ]---
      
      This is because allocating/removing a cgroup ID is not properly
      synchronized.
      
      The bug was introduced when we converted cgroup_ida to cgroup_idr.
      While synchronization is already handled inside ida_simple_{get,remove}(),
      users of idr_{alloc,remove}() are responsible for serializing
      concurrent calls themselves.
      
      tj: Refreshed on top of b58c8998 ("cgroup: fix error return from
      cgroup_create()").
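      The rule the fix enforces, in a simplified sketch (field paths
      approximate for that kernel version): idr_alloc()/idr_remove() on
      cgroup_idr must run under cgroup_mutex.

        /* removal side; allocation in cgroup_create() is serialized the same way */
        mutex_lock(&cgroup_mutex);
        idr_remove(&cgrp->root->cgroup_idr, cgrp->id);
        mutex_unlock(&cgroup_mutex);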
      
      Fixes: 4e96ee8e ("cgroup: convert cgroup_ida to cgroup_idr")
      Cc: <stable@vger.kernel.org> #3.12+
      Reported-by: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      0ab02ca8
    • smp.h: fix x86+cpu.c sparse warnings about arch nonboot CPU calls · fb37bb04
      Committed by Paul Gortmaker
      Use what we already do for arch_disable_smp_support() to fix these:
      
        arch/x86/kernel/smpboot.c:1155:6: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        arch/x86/kernel/smpboot.c:1160:6: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
        kernel/cpu.c:512:13: warning: symbol 'arch_enable_nonboot_cpus_begin' was not declared. Should it be static?
        kernel/cpu.c:516:13: warning: symbol 'arch_enable_nonboot_cpus_end' was not declared. Should it be static?
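      The fix mirrors what is already done for arch_disable_smp_support():
      declare the hooks in a shared header so both the x86 definitions and
      the generic weak ones see a prototype (declarations as added, modulo
      exact placement):

        /* include/linux/smp.h */
        extern void arch_enable_nonboot_cpus_begin(void);
        extern void arch_enable_nonboot_cpus_end(void);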
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb37bb04
    • blk-mq: rework flush sequencing logic · 18741986
      Committed by Christoph Hellwig
      Switch to using a preallocated flush_rq for blk-mq, similar to what's done
      with the old request path.  This allows us to set up the request properly
      with a tag from the actually allowed range and ->rq_disk as needed by
      some drivers.  To make life easier we also switch to dynamic allocation
      of ->flush_rq for the old path.
      
      This effectively reverts most of
      
          "blk-mq: fix for flush deadlock"
      
      and
      
          "blk-mq: Don't reserve a tag for flush request"
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      18741986
    • blk-mq: rework I/O completions · 30a91cb4
      Committed by Christoph Hellwig
      Rework I/O completions to work more like the old code path.
      blk_mq_end_io now stays out of the business of deferring completions to
      other CPUs and of calling blk_mark_rq_complete.  The latter is very
      important to allow completing requests that have timed out and thus are
      already marked complete; the former allows using the IPI callout even
      for driver-specific completions instead of having to reimplement them.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      30a91cb4
  12. 10 February 2014 (2 commits)
    • fs: Add prototype declaration to appropriate header file include/linux/bio.h · c4540a7d
      Committed by Rashika Kheria
      Add a prototype declaration to the header file include/linux/bio.h
      because the function is used by more than one file.
      
      This eliminates the following warning in bio-integrity.c:
      fs/bio-integrity.c:214:14: warning: no previous prototype for ‘bio_integrity_tag_size’ [-Wmissing-prototypes]
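      The added declaration is essentially the following (placement and
      modifiers approximate):

        /* include/linux/bio.h */
        extern unsigned int bio_integrity_tag_size(struct bio *bio);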
      Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
      c4540a7d
    • fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Committed by Al Viro
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support),
      when sync_page_range() was introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later the correct variant was killed off when
      everything switched to generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations that are useless for the
      majority of calls.
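      A sketch of the correct range, computed from the position after the
      write rather than before it (helper name and datasync flag are
      illustrative):

        static int sync_after_write(struct file *file, loff_t pos_after,
                                    ssize_t written)
        {
                if (written <= 0)
                        return 0;
                /* sync exactly the bytes that landed, which matters for O_APPEND */
                return vfs_fsync_range(file, pos_after - written,
                                       pos_after - 1, 1);
        }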
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      d311d79d
  13. 09 February 2014 (1 commit)
  14. 08 February 2014 (3 commits)
  15. 07 February 2014 (1 commit)
    • IB/mlx5: Fix binary compatibility with libmlx5 · 78c0f98c
      Committed by Eli Cohen
      Commit c1be5232 ("Fix micro UAR allocator") broke binary compatibility
      between libmlx5 and mlx5_ib since it defines a different value for the
      number of micro UARs per page, leading to wrong calculations in
      libmlx5.  This patch defines struct mlx5_ib_alloc_ucontext_req_v2 as an
      extension to struct mlx5_ib_alloc_ucontext_req.  The extended size is
      determined in mlx5_ib_alloc_ucontext(), and in the case of an old
      library we use uuarn 0, which works fine -- this is achieved by
      create_user_qp() falling back from the high to the medium and then to
      the low class, where the low class will return 0.  For new libraries we
      use the more sophisticated allocation algorithm.
      Signed-off-by: Eli Cohen <eli@mellanox.com>
      Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      78c0f98c