1. 27 Sep, 2012 1 commit
  2. 26 Sep, 2012 2 commits
    • blockdev: turn a rw semaphore into a percpu rw semaphore · 62ac665f
      Mikulas Patocka committed
      This avoids cache line bouncing when many processes lock the semaphore
      for read.
      
      New percpu lock implementation
      
      The lock consists of an array of percpu unsigned integers, a boolean
      variable and a mutex.
      
      When we take the lock for read, we enter an rcu read section and check
      the "locked" variable. If it is false, we increase a percpu counter on
      the current cpu and exit the rcu section. If "locked" is true, we exit
      the rcu section, take the mutex and drop it (this waits until the writer
      has finished) and retry.
      
      Unlocking for read just decreases the percpu variable. Note that we can
      unlock on a different cpu than the one where we locked; in this case the
      counter underflows. The sum of all percpu counters represents the number
      of processes that hold the lock for read.
      
      When we need to lock for write, we take the mutex, set "locked" variable
      to true and synchronize rcu. Since RCU has been synchronized, no
      processes can create new read locks. We wait until the sum of percpu
      counters is zero - when it is, there are no readers in the critical
      section.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
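The locking scheme described in this commit message can be modeled in userspace roughly as follows. This is a minimal sketch under stated assumptions, not the kernel implementation: all names (`pcpu_rwsem_model`, `NSLOTS`, the `model_*` functions) are hypothetical, a spin flag stands in for the mutex, and because RCU is not available here the reader increments its counter first and then checks "locked" using sequentially consistent atomics, instead of checking inside an RCU read section.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NSLOTS 4 /* hypothetical stand-in for the number of CPUs */

struct pcpu_rwsem_model {
    atomic_uint counters[NSLOTS]; /* per-slot reader counts; a slot may wrap */
    atomic_bool locked;           /* true while a writer owns the lock */
    atomic_bool writer_mutex;     /* spin flag standing in for the mutex */
};

void model_init(struct pcpu_rwsem_model *s)
{
    for (int i = 0; i < NSLOTS; i++)
        atomic_init(&s->counters[i], 0);
    atomic_init(&s->locked, false);
    atomic_init(&s->writer_mutex, false);
}

/* Sum of all slots; unsigned arithmetic wraps, so underflowed slots cancel. */
unsigned model_sum(struct pcpu_rwsem_model *s)
{
    unsigned sum = 0;
    for (int i = 0; i < NSLOTS; i++)
        sum += atomic_load(&s->counters[i]);
    return sum;
}

void pcpu_mutex_lock(struct pcpu_rwsem_model *s)
{
    while (atomic_exchange(&s->writer_mutex, true))
        ; /* spin until the flag was previously clear */
}

void pcpu_mutex_unlock(struct pcpu_rwsem_model *s)
{
    atomic_store(&s->writer_mutex, false);
}

void model_down_read(struct pcpu_rwsem_model *s, int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    for (;;) {
        atomic_fetch_add(&s->counters[slot], 1);
        if (!atomic_load(&s->locked))
            return; /* no writer: the read lock is held */
        /* A writer is active: back out, wait for it to finish, retry. */
        atomic_fetch_sub(&s->counters[slot], 1);
        pcpu_mutex_lock(s);
        pcpu_mutex_unlock(s);
    }
}

/* May be called with a different slot than the matching down_read. */
void model_up_read(struct pcpu_rwsem_model *s, int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    atomic_fetch_sub(&s->counters[slot], 1);
}

void model_down_write(struct pcpu_rwsem_model *s)
{
    pcpu_mutex_lock(s);
    atomic_store(&s->locked, true);
    while (model_sum(s) != 0)
        ; /* wait for existing readers to drain */
}

void model_up_write(struct pcpu_rwsem_model *s)
{
    atomic_store(&s->locked, false);
    pcpu_mutex_unlock(s);
}
```

Locking for read on slot 0 and unlocking on slot 1 leaves slot 1 at the wrapped value of -1, yet the sum over all slots is 0 again, which is the underflow behaviour the message describes.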
    • Fix a crash when block device is read and block size is changed at the same time · b87570f5
      Mikulas Patocka committed
      The kernel may crash when block size is changed and I/O is issued
      simultaneously.
      
      Because some subsystems (udev or lvm) may read any block device anytime,
      the bug actually puts any code that changes a block device size in
      jeopardy.
      
      The crash can be reproduced if you place "msleep(1000)" in
      blkdev_get_blocks just before "bh->b_size = max_blocks <<
      inode->i_blkbits;".
      Then, run "dd if=/dev/ram0 of=/dev/null bs=4k count=1 iflag=direct".
      While it is waiting in msleep, run "blockdev --setbsz 2048 /dev/ram0".
      You get a BUG.
      
      The direct and non-direct I/O code is written with the assumption that
      the block size does not change. It doesn't seem practical to fix these
      crashes one by one: there may be many crash possibilities when the block
      size changes at a certain place, and it is impossible to find them all
      and verify the code.
      
      This patch introduces a new rw-lock bd_block_size_semaphore. The lock is
      taken for read during I/O. It is taken for write when changing block
      size. Consequently, block size can't be changed while I/O is being
      submitted.
      
      For asynchronous I/O, the patch only prevents block size change while
      the I/O is being submitted. The block size can change when the I/O is in
      progress or when the I/O is being finished. This is acceptable because
      there are no accesses to block size when asynchronous I/O is being
      finished.
      
      The patch prevents block size changing while the device is mapped with
      mmap.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 20 Sep, 2012 4 commits
  4. 09 Sep, 2012 6 commits
    • block: Add bio_clone_bioset(), bio_clone_kmalloc() · bf800ef1
      Kent Overstreet committed
      Previously, there was bio_clone() but it only allocated from the fs bio
      set; as a result various users were open coding it and using
      __bio_clone().
      
      This changes bio_clone() to become bio_clone_bioset(), and then we add
      bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
      the functionality the last patch added.
      
      This will also help in a later patch changing how bio cloning works.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Boaz Harrosh <bharrosh@panasas.com>
      CC: Jeff Garzik <jeff@garzik.org>
      Acked-by: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
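The wrapper arrangement described above (one general clone routine that takes the pool, plus two thin wrappers) might look like this in a userspace model. The types and names here (`bioset_model`, `bio_clone_model`, `model_fs_bioset`, the `model_*` functions) are hypothetical stand-ins, and the assumption that the kmalloc variant simply passes a NULL pool is an illustration, not something the commit message states.

```c
#include <assert.h>
#include <stdlib.h>

struct bioset_model { int allocations; };  /* stand-in for a bio set */

struct bio_clone_model {
    unsigned long sector;       /* example payload copied by the clone */
    struct bioset_model *pool;  /* NULL means "plain heap allocation" */
};

struct bioset_model model_fs_bioset;       /* stand-in for the fs bio set */

/* General routine: the caller chooses where the clone is allocated from. */
struct bio_clone_model *model_bio_clone_bioset(const struct bio_clone_model *src,
                                               struct bioset_model *pool)
{
    struct bio_clone_model *clone;

    assert(src != NULL);
    clone = malloc(sizeof(*clone));
    if (!clone)
        return NULL;
    *clone = *src;
    clone->pool = pool;
    if (pool)
        pool->allocations++;
    return clone;
}

/* Wrapper: clone from the default (fs) pool. */
struct bio_clone_model *model_bio_clone(const struct bio_clone_model *src)
{
    return model_bio_clone_bioset(src, &model_fs_bioset);
}

/* Wrapper: clone without any pool (assumed here to mean a NULL pool). */
struct bio_clone_model *model_bio_clone_kmalloc(const struct bio_clone_model *src)
{
    return model_bio_clone_bioset(src, NULL);
}
```

With this shape, callers that previously open coded a clone against their own pool can call the general routine directly.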
    • block: Consolidate bio_alloc_bioset(), bio_kmalloc() · 3f86a82a
      Kent Overstreet committed
      Previously, bio_kmalloc() and bio_alloc_bioset() behaved slightly
      differently because there was some almost-duplicated code - this fixes
      some of that.
      
      The important change is that previously bio_kmalloc() always set
      bi_io_vec = bi_inline_vecs, even if nr_iovecs == 0 - unlike
      bio_alloc_bioset(). This would cause bio_has_data() to return true; I
      don't know if this resulted in any actual bugs but it was certainly
      wrong.
      
      bio_kmalloc() and bio_alloc_bioset() also have different arbitrary
      limits on nr_iovecs - 1024 (UIO_MAXIOV) for bio_kmalloc(), 256
      (BIO_MAX_PAGES) for bio_alloc_bioset(). This patch doesn't fix that, but
      at least they're enforced closer together and hopefully they will be
      fixed in a later patch.
      
      This'll also help with some future cleanups - there are a fair number of
      functions that allocate bios (e.g. bio_clone()), and now they don't have
      to be duplicated for bio_alloc(), bio_alloc_bioset(), and bio_kmalloc().
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      v7: Re-add dropped comments, improve patch description
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
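The bi_io_vec issue mentioned above can be illustrated with a small model. It assumes, as the message implies, that the "has data" check only tests whether the vector pointer is non-NULL; the struct and function names (`bio_vec_model`, `bio_alloc_model`, `model_*`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MODEL_INLINE_VECS 4

struct bio_vec_model { void *page; };

struct bio_alloc_model {
    struct bio_vec_model *io_vec;  /* NULL when the bio carries no data */
    unsigned int max_vecs;
    struct bio_vec_model inline_vecs[MODEL_INLINE_VECS];
};

/* Assumed shape of the check: a non-NULL vector pointer means "has data". */
bool model_bio_has_data(const struct bio_alloc_model *bio)
{
    return bio && bio->io_vec != NULL;
}

/* Old behaviour as described: always point at the inline vectors. */
struct bio_alloc_model *model_alloc_old(unsigned int nr_iovecs)
{
    struct bio_alloc_model *bio;

    assert(nr_iovecs <= MODEL_INLINE_VECS);
    bio = calloc(1, sizeof(*bio));
    if (!bio)
        return NULL;
    bio->io_vec = bio->inline_vecs;
    bio->max_vecs = nr_iovecs;
    return bio;
}

/* Consolidated behaviour: leave the pointer NULL when nr_iovecs == 0. */
struct bio_alloc_model *model_alloc_new(unsigned int nr_iovecs)
{
    struct bio_alloc_model *bio;

    assert(nr_iovecs <= MODEL_INLINE_VECS);
    bio = calloc(1, sizeof(*bio));
    if (!bio)
        return NULL;
    if (nr_iovecs)
        bio->io_vec = bio->inline_vecs;
    bio->max_vecs = nr_iovecs;
    return bio;
}
```

In this model a zero-vector bio from the old allocator reports that it has data, while one from the consolidated allocator does not.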
    • block: Kill bi_destructor · 4254bba1
      Kent Overstreet committed
      Now that we've got generic code for freeing bios allocated from bio
      pools, this isn't needed anymore.
      
      This patch also makes bio_free() static, since without bi_destructor
      there should be no need for it to be called anywhere else.
      
      bio_free() is now only called from bio_put, so we can refactor those a
      bit - move some code from bio_put() to bio_free() and kill the redundant
      bio->bi_next = NULL.
      
      v5: Switch to BIO_KMALLOC_POOL ((void *)~0), per Boaz
      v6: BIO_KMALLOC_POOL now NULL, drop bio_free's EXPORT_SYMBOL
      v7: No #define BIO_KMALLOC_POOL anymore
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Add bio_reset() · f44b48c7
      Kent Overstreet committed
      Reusing bios is something that's been highly frowned upon in the past,
      but driver code keeps doing it anyways. If it's going to happen anyways,
      we should provide a generic method.
      
      This'll help with getting rid of bi_destructor - drivers/block/pktcdvd.c
      was open coding it, by doing a bio_init() and resetting bi_destructor.
      
      This required reordering struct bio, but the block layer is not yet
      nearly fast enough for any cacheline effects to matter here.
      
      v5: Add a define BIO_RESET_BITS, to be very explicit about what parts of
      bio->bi_flags are saved.
      v6: Further commenting verbosity, per Tejun
      v9: Add a function comment
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
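One way a reset like this can work, and a plausible reason for reordering the struct, is to group the per-I/O fields at the front, zero that prefix, and keep only the high flag bits. The sketch below is a userspace model under that assumption; `bio_reset_model`, `MODEL_RESET_BITS`, `MODEL_RESET_BYTES` and the field layout are hypothetical and do not reproduce the real struct bio.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: flag bits below this index are per-I/O state and get cleared. */
#define MODEL_RESET_BITS 12

struct bio_reset_model {
    /* Per-I/O state: cleared on reset. */
    unsigned long sector;
    unsigned long flags;
    unsigned int size;
    /* Everything from max_vecs onwards survives a reset. */
    unsigned int max_vecs;
    void *pool;
};

/* Number of leading bytes that a reset zeroes. */
#define MODEL_RESET_BYTES offsetof(struct bio_reset_model, max_vecs)

void model_bio_reset(struct bio_reset_model *bio)
{
    unsigned long kept;

    assert(bio != NULL);
    /* Preserve only the flag bits at or above MODEL_RESET_BITS. */
    kept = bio->flags & (~0UL << MODEL_RESET_BITS);
    memset(bio, 0, MODEL_RESET_BYTES);
    bio->flags = kept;
}
```

A driver that wants to reuse a bio would call the reset routine instead of re-running the initializer and patching fields back by hand.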
    • block: Use bi_pool for bio_integrity_alloc() · 1e2a410f
      Kent Overstreet committed
      Now that bios keep track of where they were allocated from,
      bio_integrity_alloc_bioset() becomes redundant.
      
      Remove bio_integrity_alloc_bioset() and drop bio_set argument from the
      related functions and make them use bio->bi_pool.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Generalized bio pool freeing · 395c72a7
      Kent Overstreet committed
      With the old code, when you allocate a bio from a bio pool you have to
      implement your own destructor that knows how to find the bio pool the
      bio was originally allocated from.
      
      This adds a new field to struct bio (bi_pool) and changes
      bio_alloc_bioset() to use it. This makes various bio destructors
      unnecessary, so they're then deleted.
      
      v6: Explain the temporary if statement in bio_put
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      CC: Jens Axboe <axboe@kernel.dk>
      CC: NeilBrown <neilb@suse.de>
      CC: Alasdair Kergon <agk@redhat.com>
      CC: Nicholas Bellinger <nab@linux-iscsi.org>
      CC: Lars Ellenberg <lars.ellenberg@linbit.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
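The idea of recording the originating pool in the object, so that a generic put can free it without a per-user destructor, can be sketched as below. This is a userspace model with hypothetical names (`pool_model`, `pooled_bio_model`, `model_bio_alloc`, `model_bio_put`); the "pool" only counts live objects and memory still comes from malloc.

```c
#include <assert.h>
#include <stdlib.h>

struct pool_model { int live; };   /* stand-in for a bio pool */

struct pooled_bio_model {
    int refcnt;
    struct pool_model *pool;       /* where this object came from, or NULL */
};

/* Allocate from the given pool and remember it in the object itself. */
struct pooled_bio_model *model_bio_alloc(struct pool_model *pool)
{
    struct pooled_bio_model *bio = malloc(sizeof(*bio));

    if (!bio)
        return NULL;
    bio->refcnt = 1;
    bio->pool = pool;
    if (pool)
        pool->live++;
    return bio;
}

/* Generic put: no destructor callback is needed to find the pool. */
void model_bio_put(struct pooled_bio_model *bio)
{
    assert(bio != NULL && bio->refcnt > 0);
    if (--bio->refcnt > 0)
        return;
    if (bio->pool)
        bio->pool->live--;   /* hand the object back to its own pool */
    free(bio);
}
```

Two objects allocated from different pools can be released through the same put routine, and each pool's count drops independently.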
  5. 05 Sep, 2012 1 commit
  6. 02 Sep, 2012 2 commits
  7. 31 Aug, 2012 1 commit
  8. 23 Aug, 2012 1 commit
    • ARM: omap: allow building omap44xx without SMP · c7a9b09b
      Arnd Bergmann committed
      The new omap4 cpuidle implementation currently requires
      ARCH_NEEDS_CPU_IDLE_COUPLED, which only works on SMP.
      
      This patch makes it possible to build a non-SMP kernel
      for that platform. This is not normally desired for
      end-users but can be useful for testing.
      
      Without this patch, building rand-0y2jSKT results in:
      
      drivers/cpuidle/coupled.c: In function 'cpuidle_coupled_poke':
      drivers/cpuidle/coupled.c:317:3: error: implicit declaration of function '__smp_call_function_single' [-Werror=implicit-function-declaration]
      
      It's not clear if this patch is the best solution for
      the problem at hand. I have made sure that we can now
      build the kernel in all configurations, but that does
      not mean it will actually work on an OMAP44xx.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Kevin Hilman <khilman@ti.com>
      Cc: Tony Lindgren <tony@atomide.com>
  9. 22 Aug, 2012 3 commits
    • introduce kref_put_mutex() · 8ad5db8a
      Al Viro committed
      equivalent of
      	mutex_lock(mutex);
      	if (!kref_put(kref, release))
      		mutex_unlock(mutex);
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
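The sequence quoted in this commit message can be exercised in a small userspace model. The types below (`kref_model`, `mutex_model`) and all `model_*` functions are hypothetical stand-ins, the mutex is a single-threaded flag, and the model follows the quoted sequence literally; any optimisation the in-kernel helper may apply, such as skipping the mutex when the count cannot reach zero, is not modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct kref_model { atomic_int refcount; };
struct mutex_model { bool held; };   /* single-threaded stand-in for a mutex */

int model_released;                  /* counts calls to the release callback */

void model_release(struct kref_model *kref)
{
    (void)kref;
    model_released++;
}

void model_mutex_lock(struct mutex_model *m)
{
    assert(!m->held);
    m->held = true;
}

void model_mutex_unlock(struct mutex_model *m)
{
    assert(m->held);
    m->held = false;
}

/* Drop one reference; call release and return 1 if it was the last one. */
int model_kref_put(struct kref_model *kref,
                   void (*release)(struct kref_model *))
{
    if (atomic_fetch_sub(&kref->refcount, 1) == 1) {
        release(kref);
        return 1;
    }
    return 0;
}

/*
 * Literal form of the sequence in the commit message.
 * Returns 1 with the mutex still held if the object was released,
 * otherwise returns 0 with the mutex dropped again.
 */
int model_kref_put_mutex(struct kref_model *kref,
                         void (*release)(struct kref_model *),
                         struct mutex_model *mutex)
{
    model_mutex_lock(mutex);
    if (!model_kref_put(kref, release)) {
        model_mutex_unlock(mutex);
        return 0;
    }
    return 1;
}
```

The point of the pattern is that the final put and the release callback happen under the mutex, so the caller that gets 1 back is responsible for unlocking it.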
    • mm: compaction: Abort async compaction if locks are contended or taking too long · c67fe375
      Mel Gorman committed
      Jim Schutt reported a problem that pointed at compaction contending
      heavily on locks.  The workload is straightforward and, in his own words:
      
      	The systems in question have 24 SAS drives spread across 3 HBAs,
      	running 24 Ceph OSD instances, one per drive.  FWIW these servers
      	are dual-socket Intel 5675 Xeons w/48 GB memory.  I've got ~160
      	Ceph Linux clients doing dd simultaneously to a Ceph file system
      	backed by 12 of these servers.
      
      Early in the test everything looks fine
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        31 15          0     287216        576   38606628    0    0     2  1158    2   14   1  3  95  0  0
        27 15          0     225288        576   38583384    0    0    18 2222016 203357 134876  11 56  17 15  0
        28 17          0     219256        576   38544736    0    0    11 2305932 203141 146296  11 49  23 17  0
         6 18          0     215596        576   38552872    0    0     7 2363207 215264 166502  12 45  22 20  0
        22 18          0     226984        576   38596404    0    0     3 2445741 223114 179527  12 43  23 22  0
      
      and then it goes to pot
      
        procs -------------------memory------------------ ---swap-- -----io---- --system-- -----cpu-------
         r  b       swpd       free       buff      cache   si   so    bi    bo   in   cs  us sy  id wa st
        163  8          0     464308        576   36791368    0    0    11 22210  866  536   3 13  79  4  0
        207 14          0     917752        576   36181928    0    0   712 1345376 134598 47367   7 90   1  2  0
        123 12          0     685516        576   36296148    0    0   429 1386615 158494 60077   8 84   5  3  0
        123 12          0     598572        576   36333728    0    0  1107 1233281 147542 62351   7 84   5  4  0
        622  7          0     660768        576   36118264    0    0   557 1345548 151394 59353   7 85   4  3  0
        223 11          0     283960        576   36463868    0    0    46 1107160 121846 33006   6 93   1  1  0
      
      Note that system CPU usage is very high and the number of blocks being
      written out has dropped by 42%. He analysed this with perf and found
      
        perf record -g -a sleep 10
        perf report --sort symbol --call-graph fractal,5
          34.63%  [k] _raw_spin_lock_irqsave
                  |
                  |--97.30%-- isolate_freepages
                  |          compaction_alloc
                  |          unmap_and_move
                  |          migrate_pages
                  |          compact_zone
                  |          compact_zone_order
                  |          try_to_compact_pages
                  |          __alloc_pages_direct_compact
                  |          __alloc_pages_slowpath
                  |          __alloc_pages_nodemask
                  |          alloc_pages_vma
                  |          do_huge_pmd_anonymous_page
                  |          handle_mm_fault
                  |          do_page_fault
                  |          page_fault
                  |          |
                  |          |--87.39%-- skb_copy_datagram_iovec
                  |          |          tcp_recvmsg
                  |          |          inet_recvmsg
                  |          |          sock_recvmsg
                  |          |          sys_recvfrom
                  |          |          system_call
                  |          |          __recv
                  |          |          |
                  |          |           --100.00%-- (nil)
                  |          |
                  |           --12.61%-- memcpy
                   --2.70%-- [...]
      
      There was other data but primarily it is all showing that compaction is
      contended heavily on the zone->lock and zone->lru_lock.
      
      commit [b2eef8c0: mm: compaction: minimise the time IRQs are disabled
      while isolating pages for migration] noted that it was possible for
      migration to hold the lru_lock for an excessive amount of time. Very
      broadly speaking this patch expands the concept.
      
      This patch introduces compact_checklock_irqsave() to check if a lock
      is contended or the process needs to be scheduled. If either condition
      is true then async compaction is aborted and the caller is informed.
      The page allocator will fail a THP allocation if compaction failed due
      to contention. This patch also introduces compact_trylock_irqsave()
      which will acquire the lock only if it is not contended and the process
      does not need to schedule.
      Reported-by: Jim Schutt <jaschut@sandia.gov>
      Tested-by: Jim Schutt <jaschut@sandia.gov>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
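The decision logic described for compact_checklock_irqsave() and compact_trylock_irqsave() can be sketched as a single-threaded model. Everything here is hypothetical illustration: `spin_model` tracks contention as a waiter count, `compact_ctl_model` carries the sync/async mode plus a stand-in "needs to reschedule" flag, and IRQ state and real blocking are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct spin_model {
    bool held;
    int waiters;          /* > 0 means someone else is waiting: contended */
};

struct compact_ctl_model {
    bool sync;            /* false = async compaction */
    bool contended;       /* set when async compaction aborts */
    bool need_resched;    /* stand-in for "process needs to be scheduled" */
};

/*
 * Returns true with the lock held if compaction may continue,
 * false if async compaction should abort (lock not held).
 */
bool model_compact_checklock(struct spin_model *lock, bool locked,
                             struct compact_ctl_model *cc)
{
    assert(lock != NULL && cc != NULL);
    if (cc->need_resched || lock->waiters > 0) {
        if (locked) {
            lock->held = false;      /* drop the lock before deciding */
            locked = false;
        }
        if (!cc->sync) {
            cc->contended = true;    /* tell the caller why we aborted */
            return false;
        }
        cc->need_resched = false;    /* stand-in for yielding the CPU */
    }
    if (!locked)
        lock->held = true;
    return true;
}

/* Acquire only if uncontended and no reschedule is pending (async case). */
bool model_compact_trylock(struct spin_model *lock,
                           struct compact_ctl_model *cc)
{
    return model_compact_checklock(lock, false, cc);
}
```

A caller in async mode would treat a false return as the signal to stop compacting and report contention upward, which matches the message's statement that the allocator then fails the THP allocation.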
    • string: do not export memweight() to userspace · c3a5ce04
      WANG Cong committed
      Fix the following warning:
      
        usr/include/linux/string.h:8: userspace cannot reference function or variable defined in the kernel
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Acked-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 20 Aug, 2012 2 commits
  11. 17 Aug, 2012 2 commits
  12. 15 Aug, 2012 10 commits
  13. 10 Aug, 2012 2 commits
    • Yama: higher restrictions should block PTRACE_TRACEME · 9d8dad74
      Kees Cook committed
      The higher ptrace restriction levels should be blocking even
      PTRACE_TRACEME requests. The comments in the LSM documentation are
      misleading about when the checks happen (the parent does not go through
      security_ptrace_access_check() on a PTRACE_TRACEME call).
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org # 3.5.x and later
      Signed-off-by: James Morris <james.l.morris@oracle.com>
    • netfilter: nf_ct_sip: fix IPv6 address parsing · 02b69cbd
      Patrick McHardy committed
      Within SIP messages IPv6 addresses are enclosed in square brackets in most
      cases, with the exception of the "received=" header parameter. Currently
      the helper fails to parse enclosed addresses.
      
      This patch:
      
      - changes the SIP address parsing function to enforce square brackets
        when required, and accept them when not required but present, as
        recommended by RFC 5118.
      
      - adds a new SDP address parsing function that never accepts square
        brackets since SDP doesn't use them.
      
      With these changes, the SIP helper correctly parses all test messages
      from RFC 5118 (Session Initiation Protocol (SIP) Torture Test Messages
      for Internet Protocol Version 6 (IPv6)).
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
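The two parsing policies listed above can be sketched in userspace C. This is an illustration only: the function names are hypothetical, POSIX inet_pton() is used in place of whatever the kernel helper calls, and the unbracketed case simply consumes the leading run of address characters, so it does not handle an unbracketed address followed by a port.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * SIP-style parsing: if brackets_required is true the address must be
 * enclosed in [ ]; otherwise brackets are accepted when present.
 * Returns true and fills *out on success.
 */
bool model_sip_parse_ipv6(const char *s, bool brackets_required,
                          struct in6_addr *out)
{
    char buf[INET6_ADDRSTRLEN];
    size_t len;

    assert(s != NULL && out != NULL);
    if (*s == '[') {
        const char *end = strchr(s, ']');

        if (!end)
            return false;            /* opening bracket without a close */
        s++;
        len = (size_t)(end - s);
    } else {
        if (brackets_required)
            return false;
        /* Take the leading run of characters valid in an IPv6 literal. */
        len = strspn(s, "0123456789abcdefABCDEF:.");
    }
    if (len == 0 || len >= sizeof(buf))
        return false;
    memcpy(buf, s, len);
    buf[len] = '\0';
    return inet_pton(AF_INET6, buf, out) == 1;
}

/* SDP-style parsing: square brackets are never accepted. */
bool model_sdp_parse_ipv6(const char *s, struct in6_addr *out)
{
    assert(s != NULL && out != NULL);
    if (*s == '[')
        return false;
    return model_sip_parse_ipv6(s, false, out);
}
```

Keeping the SDP variant as a separate entry point mirrors the split described in the message: the SIP parser is told per call whether brackets are mandatory, while the SDP parser rejects them outright.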
  14. 09 Aug, 2012 3 commits
    • block: disable discard request merge temporarily · 276f0f5d
      Shaohua Li committed
      The SCSI discard request merge never worked, and there looks to be no
      solution for it in the future, so let's disable it temporarily.
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Input: eeti_ts: pass gpio value instead of IRQ · 4eef6cbf
      Arnd Bergmann committed
      The EETI touchscreen asserts its IRQ line as soon as it has data in its
      internal buffers. The line is automatically deasserted once all data has
      been read via I2C. Hence, the driver has to monitor the GPIO line and
      cannot simply rely on the interrupt handler being invoked.
      
      In the current implementation of the driver, irq_to_gpio() is used to
      determine the GPIO number from the i2c_client's IRQ value.
      
      As irq_to_gpio() is not available on all platforms, this patch changes
      this and makes the driver ignore the passed in IRQ. Instead, a GPIO is
      added to the platform_data struct and gpio_to_irq is used to derive the
      IRQ from that GPIO. If this fails, bail out. The driver is only able to
      work in environments where the touchscreen GPIO can be mapped to an
      IRQ.
      
      Without this patch, building raumfeld_defconfig results in:
      
      drivers/input/touchscreen/eeti_ts.c: In function 'eeti_ts_irq_active':
      drivers/input/touchscreen/eeti_ts.c:65:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org (v3.2+)
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Sven Neumann <s.neumann@raumfeld.com>
      Cc: linux-input@vger.kernel.org
      Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
    • ARM: pxa: remove irq_to_gpio from ezx-pcap driver · 59ee93a5
      Arnd Bergmann committed
      The irq_to_gpio function was removed from the pxa platform
      in linux-3.2, and this driver has been broken since.
      
      There is actually no in-tree user of this driver that adds
      this platform device, but the driver can and does get enabled
      on some platforms.
      
      Without this patch, building ezx_defconfig results in:
      
      drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work':
      drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration]
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Haojian Zhuang <haojian.zhuang@gmail.com>
      Cc: stable@vger.kernel.org (v3.2+)
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Daniel Ribeiro <drwyrm@gmail.com>