1. 21 Oct, 2011 3 commits
  2. 22 Sep, 2011 1 commit
    • drm: support routines for HDMI/DP ELD · 76adaa34
      Committed by Wu Fengguang
      ELD (EDID-Like Data) describes to the HDMI/DP audio driver the audio
      capabilities of the plugged monitor.
      
      This adds drm_edid_to_eld() for converting EDID to ELD. The converted
      ELD will be saved in a new drm_connector.eld[128] data field. This is
      necessary because the graphics driver will need to fix up some of the
      data fields (e.g. HDMI/DP connection type, AV sync delay) before writing
      to the hardware ELD buffer. drm_av_sync_delay() will help the graphics
      drivers dynamically compute the AV sync delay for fixing up the ELD.

      ELD selection policy: it's possible for one encoder to be associated
      with multiple connectors (i.e. monitors), in which case the first found
      ELD will be returned by drm_select_eld(). This policy may not be
      suitable for all users, but let's start simple.
      
      The impact of ELD selection policy: assume there are two monitors, one
      supports stereo playback and the other has 8-channel output; cloned
      display mode is used, so that the two monitors are associated with the
      same internal encoder. If only the stereo playback capability is reported,
      the user won't be able to start 8-channel playback; if the 8-channel ELD
      is reported, then user space applications may send 8-channel samples
      down, however the user may actually be listening to the 2-channel
      monitor and not connecting speakers to the 8-channel monitor.
      
      According to James, many TVs will either refuse to display anything or
      pop up an OSD warning whenever they receive HDMI audio which they cannot
      handle. Eventually we will require configurability and/or per-monitor
      audio control even when the video is cloned.
      
      CC: Zhao Yakui <yakui.zhao@intel.com>
      CC: Wang Zhenyu <zhenyu.z.wang@intel.com>
      CC: Jeremy Bush <contractfrombelow@gmail.com>
      CC: Christopher White <c.white@pulseforce.com>
      CC: Pierre-Louis Bossart <pierre-louis.bossart@intel.com>
      CC: Paul Menzel <paulepanter@users.sourceforge.net>
      CC: James Cloos <cloos@jhcloos.com>
      CC: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Keith Packard <keithp@keithp.com>
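
The "first found ELD" policy described in the drm commit above can be illustrated with a minimal sketch. The struct, the function name `sketch_select_eld`, and the convention that `eld[0] == 0` means "no ELD" are assumptions made for illustration; this is not the kernel's drm_select_eld().

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ELD_SIZE 128  /* mirrors the eld[128] field named in the commit */

/* Hypothetical, simplified stand-in for a DRM connector. */
struct sketch_connector {
    int encoder_id;                      /* encoder this connector is attached to */
    unsigned char eld[SKETCH_ELD_SIZE];  /* all zero when no ELD was parsed */
};

/* Return the ELD of the first connector on `encoder_id` that has one,
 * or NULL when no connector on that encoder carries an ELD. */
const unsigned char *sketch_select_eld(const struct sketch_connector *conns,
                                       size_t n, int encoder_id)
{
    assert(conns != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (conns[i].encoder_id != encoder_id)
            continue;
        if (conns[i].eld[0] != 0)  /* assumption: eld[0] == 0 means "empty" */
            return conns[i].eld;
    }
    return NULL;
}
```

With two cloned monitors on one encoder, whichever connector comes first in the array wins, which is exactly the trade-off the commit message discusses for the stereo versus 8-channel case.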
  3. 09 Sep, 2011 2 commits
  4. 06 Sep, 2011 9 commits
  5. 01 Sep, 2011 2 commits
    • drm/radeon/kms: add a new gem_wait ioctl with read/write flags · d3ed7402
      Committed by Marek Olšák
      The new DRM_RADEON_GEM_WAIT ioctl combines GEM_WAIT_IDLE and GEM_BUSY (there
      is a NO_WAIT flag to get the latter) with USAGE_READ and USAGE_WRITE flags
      to take advantage of the new ttm_bo_wait changes.
      
      Also bump the DRM version.
      Signed-off-by: Marek Olšák <maraeo@gmail.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
    • drm/ttm: add a way to bo_wait for either the last read or last write · dfadbbdb
      Committed by Marek Olšák
      Sometimes we want to know whether a buffer is busy and wait for it (bo_wait).
      However, sometimes it would be more useful to be able to query whether
      a buffer is busy and being either read or written, and wait until it's stopped
      being either read or written. The point of this is to be able to avoid
      unnecessary waiting, e.g. if a GPU has written something to a buffer and is now
      reading that buffer, and a CPU wants to map that buffer for read, it needs to
      only wait for the last write. If there were no write, there wouldn't be any
      waiting needed.
      
      This, of course, requires user space drivers to send read/write flags
      with each relocation (like we have read/write domains in radeon, so we can
      actually use those for something useful now).
      
      Now how this patch works:
      
      The read/write flags should be passed to ttm_validate_buffer. TTM maintains
      separate sync objects of the last read and write for each buffer, in addition
      to the sync object of the last use of a buffer. ttm_bo_wait then operates
      with one of the sync objects.
      Signed-off-by: Marek Olšák <maraeo@gmail.com>
      Reviewed-by: Jerome Glisse <jglisse@redhat.com>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
  6. 30 Aug, 2011 1 commit
  7. 29 Aug, 2011 1 commit
    • perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Committed by Stephane Eranian
      The current cgroup context switch code was incorrect, leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase by a significant amount, as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts for
      the non-cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
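
The penalty figures quoted in the perf commit above follow from the ctxsw/s numbers, and the fix's fast path reduces to a same-cgroup comparison. The sketch below shows both; the function names are hypothetical and cgroups are modelled as plain integer ids, which is an assumption for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Percentage drop in context switches per second relative to a baseline. */
double sketch_ctxsw_penalty_pct(double baseline, double observed)
{
    assert(baseline > 0.0);
    return (baseline - observed) / baseline * 100.0;
}

/* Fast path described in the commit: skip the cgroup switch when the
 * outgoing and incoming tasks are in the same cgroup. */
bool sketch_needs_cgroup_switch(int prev_cgroup, int next_cgroup)
{
    return prev_cgroup != next_cgroup;
}
```

Feeding in the numbers from the message, `sketch_ctxsw_penalty_pct(10684.51, 6674.61)` comes out at roughly 37.5, and `sketch_ctxsw_penalty_pct(10775.30, 10687.80)` at roughly 0.8, consistent with the "37%" and "less than 2%" claims.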
  8. 27 Aug, 2011 1 commit
  9. 26 Aug, 2011 5 commits
    • backlight: add a callback 'notify_after' for backlight control · cc7993f6
      Committed by Dilan Lee
      We need a callback to do some things after pwm_enable, pwm_disable
      and pwm_config.
      Signed-off-by: Dilan Lee <dilee@nvidia.com>
      Reviewed-by: Robert Morell <rmorell@nvidia.com>
      Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
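
A callback that runs after the PWM has been reconfigured, as the backlight commit above describes, might look like the sketch below. Everything here is hypothetical: the struct layout, the `hw_duty` field standing in for the pwm_config()/pwm_enable()/pwm_disable() calls, and the hook signature are assumptions, not the driver's actual platform data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical backlight state with an optional post-update hook. */
struct sketch_backlight {
    int hw_duty;                                      /* stands in for PWM state */
    void (*notify_after)(void *ctx, int brightness);  /* may be NULL */
    void *ctx;                                        /* passed to the hook */
};

/* Apply a brightness value, then give the platform a chance to react. */
void sketch_backlight_update(struct sketch_backlight *bl, int brightness)
{
    assert(bl != NULL);
    bl->hw_duty = brightness;  /* stands in for the pwm_* calls */
    if (bl->notify_after)
        bl->notify_after(bl->ctx, brightness);
}

/* Example hook: remember the last brightness the hardware was given. */
void sketch_record_last(void *ctx, int brightness)
{
    *(int *)ctx = brightness;
}
```

Keeping the hook optional means existing boards that do not set it behave exactly as before.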
    • rapidio: fix use of non-compatible registers · 284fb68d
      Committed by Alexandre Bounine
      Replace/remove use of RIO v.1.2 registers/bits that are not
      forward-compatible with newer versions of RapidIO specification.
      
      RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
      Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
      
      Use of removed (since RIO v.1.3) register bits affects users of
      currently available 1.3 and 2.x compliant devices who may run older
      kernel versions.
      
      Removing checks for unsupported bits makes corresponding routines
      compatible with all versions of RapidIO specification.  Therefore,
      backporting makes stable kernel versions compliant with RIO v.1.3 and
      later as well.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Thomas Moll <thomas.moll@sysgo.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • a8018766
    • lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7
      Committed by Josh Boyer
      Purely in-memory filesystems do not use the inode hash as the dcache
      tells us if an entry already exists.  As a result, they do not call
      unlock_new_inode, and thus directory inodes do not get put into a
      different lockdep class for i_sem.
      
      We need the different lockdep classes, because the locking order for
      i_mutex is different for directory inodes and regular inodes.  Directory
      inodes can do "readdir()", which takes i_mutex *before* possibly taking
      mm->mmap_sem (due to a page fault while copying the directory entry to
      user space).
      
      In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
      before accessing i_mutex.
      
      The two cases can never happen for the same inode, so no real deadlock
      can occur, but without the different lockdep classes, lockdep cannot
      understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
      can lead to false positives from lockdep like below:
      
          find/645 is trying to acquire lock:
           (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
          vfs_readdir+0x5b/0xb4
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
                [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
                [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
                [<ffffffff81111557>] mmap_region+0x258/0x432
                [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
                [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
                [<ffffffff8100c858>] sys_mmap+0x22/0x24
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
          -> #0 (&mm->mmap_sem){++++++}:
                [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff81109541>] might_fault+0x89/0xac
                [<ffffffff81149cff>] filldir+0x6f/0xc7
                [<ffffffff811586ea>] dcache_readdir+0x67/0x205
                [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
                [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
      This patch moves the directory vs file lockdep annotation into a helper
      function that can be called by in-memory filesystems and has hugetlbfs
      call it.
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add a personality to report 2.6.x version numbers · be27425d
      Committed by Andi Kleen
      I ran into a couple of programs which broke with the new Linux 3.0
      version.  Some of those were binary only.  I tried to use LD_PRELOAD to
      work around it, but it was quite difficult and in one case impossible
      because of a mix of 32bit and 64bit executables.
      
      For example, all kinds of management software from HP don't work unless
      we pretend to run a 2.6 kernel.
      
        $ uname -a
        Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
      
        $ hpacucli ctrl all show
      
        Error: No controllers detected.
      
        $ rpm -qf /usr/sbin/hpacucli
        hpacucli-8.75-12.0
      
      Another notable case is that Python now reports "linux3" from
      sys.platform, which in turn can break things that were checking
      sys.platform == "linux2":
      
        https://bugzilla.mozilla.org/show_bug.cgi?id=664564
      
      It seems pretty clear to me though it's a bug in the apps that are using
      '==' instead of .startswith(), but this allows us to unbreak broken
      programs.
      
      This patch adds a UNAME26 personality that makes the kernel report a
      2.6.40+x version number instead.  The x is the x in 3.x.
      
      I know this is somewhat ugly, but I didn't find a better workaround, and
      compatibility with existing programs is important.

      Some programs also read /proc/sys/kernel/osrelease.  This can be worked
      around in user space with mount --bind (and a mount namespace).
      
      To use:
      
        wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
        gcc -o uname26 uname26.c
        ./uname26 program
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
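
The version mapping stated in the UNAME26 commit above ("2.6.40+x", where x is the x in 3.x) can be written out as a small sketch. The function name `sketch_uname26_release` is hypothetical, and the sketch ignores any suffix after the minor number; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdio.h>

/* Format the release string a 3.<minor> kernel would report under the
 * UNAME26 personality, i.e. "2.6.<40 + minor>".
 * Returns the snprintf() result (length of the full string). */
int sketch_uname26_release(unsigned minor, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "2.6.%u", 40 + minor);
}
```

Under this mapping a 3.0 kernel reports 2.6.40 and a 3.1 kernel reports 2.6.41, so programs that only understand 2.6.x version strings keep working.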
  10. 24 Aug, 2011 1 commit
    • TTY: pty, fix pty counting · 24d406a6
      Committed by Jiri Slaby
      tty_operations->remove is normally called like:
      queue_release_one_tty
       ->tty_shutdown
         ->tty_driver_remove_tty
           ->tty_operations->remove
      
      However tty_shutdown() is called from queue_release_one_tty() only if
      tty_operations->shutdown is NULL. But for pty, it is not.
      pty_unix98_shutdown() is used there as ->shutdown.
      
      So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
      called. This results in an invalid pty_count, i.e. the value that can
      be seen in /proc/sys/kernel/pty/nr.

      I see this was already reported at:
        https://lkml.org/lkml/2009/11/5/370
      But it has not been fixed since then.

      This patch is somewhat of a hack. The problem lies in ->install. We
      allocate another tty there (the so-called tty->link). So ->install is
      called once, but ->remove twice, for both tty and tty->link. The fix
      here is to count both tty and tty->link and divide the count by 2 for
      the user.
      
      And to have ->remove called, let's make tty_driver_remove_tty() global
      and call that from pty_unix98_shutdown() (tty_operations->shutdown).
      
      While at it, let's document that when ->shutdown is defined,
      tty_shutdown() is not called.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
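
The counting scheme described in the pty commit above (count both ends of a pair, halve the result for the user) can be sketched like this. The struct and function names are hypothetical, and incrementing by two in the install step is an assumption about how "count both tty and tty->link" is realised.

```c
#include <assert.h>

/* Hypothetical counter that tracks tty and tty->link individually. */
struct sketch_pty_counter {
    int raw;
};

/* ->install runs once per pair but creates two ttys, so count both. */
void sketch_pty_install(struct sketch_pty_counter *c)
{
    c->raw += 2;
}

/* ->remove runs twice per pair: once for tty, once for tty->link. */
void sketch_pty_remove(struct sketch_pty_counter *c)
{
    assert(c->raw > 0);
    c->raw--;
}

/* Value shown to the user: the number of pty pairs. */
int sketch_pty_nr(const struct sketch_pty_counter *c)
{
    return c->raw / 2;
}
```

Because install and remove are asymmetric, the raw counter only returns to zero once both ends of every pair have been removed.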
  11. 23 Aug, 2011 3 commits
    • drivers:misc:ti-st: platform hooks for chip states · 0d7c5f25
      Committed by Pavan Savoy
      Certain platform-specific or Host-WiLink interface-specific actions need
      to be taken when the chip is being enabled and after the chip is disabled,
      such as configuring the mux modes for the host GPIO connected to the
      nshutdown of the chip, or relinquishing the UART after the chip is disabled.

      Similar actions can also be taken when the chip is in deep sleep or when the
      chip is awake. Performance enhancements such as configuring the host to run
      faster when the chip is awake and slower when the chip is asleep can also be
      made here.
      Signed-off-by: Pavan Savoy <pavan_savoy@ti.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • target: Make standard INQUIRY return 'not connected' for tpg_virt_lun0 · 052605c6
      Committed by Nicholas Bellinger
      This patch changes target_emulate_inquiry_std() to set the 'not connected'
      (0x35) bit in standard INQUIRY response data when we are processing a
      request to a virtual LUN=0 mapping from struct se_device *g_lun0_dev that
      has been set up for us in transport_lookup_cmd_lun().

      This addresses an issue where qla2xxx FC clients need to be able
      to create demo-mode I_T FC Nexuses by default, but should not be
      exposing the default set of TPG LUNs to all FC clients.  This includes
      adding a new optional target_core_fabric_ops->tpg_check_demo_mode_login_only()
      caller to allow demo_mode nexuses to skip the old default of building
      a demo-mode MappedLUNs list via core_tpg_add_node_to_devs().
      
      (roland: Add missing tpg_check_demo_mode_login_only check in core_dev_add_lun)
      Reported-by: Roland Dreier <roland@purestorage.com>
      Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
      Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
    • mac80211: fix suspend/resume races with unregister hw · ecb44335
      Committed by Stanislaw Gruszka
      Do not call ->suspend, ->resume methods after we unregister wiphy. Also
      delete sta_cleanup timer after we finish wiphy unregister to avoid this:
      
      WARNING: at lib/debugobjects.c:262 debug_print_object+0x85/0xa0()
      Hardware name: 6369CTO
      ODEBUG: free active (active state 0) object type: timer_list hint: sta_info_cleanup+0x0/0x180 [mac80211]
      Modules linked in: aes_i586 aes_generic fuse bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq mperf ext2 dm_mod uinput thinkpad_acpi hwmon sg arc4 rt2800usb rt2800lib crc_ccitt rt2x00usb rt2x00lib mac80211 cfg80211 i2c_i801 iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif sr_mod cdrom yenta_socket ahci libahci pata_acpi ata_generic ata_piix i915 drm_kms_helper drm i2c_algo_bit video [last unloaded: microcode]
      Pid: 5663, comm: pm-hibernate Not tainted 3.1.0-rc1-wl+ #19
      Call Trace:
       [<c0454cfd>] warn_slowpath_common+0x6d/0xa0
       [<c05e05e5>] ? debug_print_object+0x85/0xa0
       [<c05e05e5>] ? debug_print_object+0x85/0xa0
       [<c0454dae>] warn_slowpath_fmt+0x2e/0x30
       [<c05e05e5>] debug_print_object+0x85/0xa0
       [<f8a808e0>] ? sta_info_alloc+0x1a0/0x1a0 [mac80211]
       [<c05e0bd2>] debug_check_no_obj_freed+0xe2/0x180
       [<c051175b>] kfree+0x8b/0x150
       [<f8a126ae>] cfg80211_dev_free+0x7e/0x90 [cfg80211]
       [<f8a13afd>] wiphy_dev_release+0xd/0x10 [cfg80211]
       [<c068d959>] device_release+0x19/0x80
       [<c05d06ba>] kobject_release+0x7a/0x1c0
       [<c07646a8>] ? rtnl_unlock+0x8/0x10
       [<f8a13adb>] ? wiphy_resume+0x6b/0x80 [cfg80211]
       [<c05d0640>] ? kobject_del+0x30/0x30
       [<c05d1a6d>] kref_put+0x2d/0x60
       [<c05d056d>] kobject_put+0x1d/0x50
       [<c08015f4>] ? mutex_lock+0x14/0x40
       [<c068d60f>] put_device+0xf/0x20
       [<c069716a>] dpm_resume+0xca/0x160
       [<c04912bd>] hibernation_snapshot+0xcd/0x260
       [<c04903df>] ? freeze_processes+0x3f/0x90
       [<c049151b>] hibernate+0xcb/0x1e0
       [<c048fdc0>] ? pm_async_store+0x40/0x40
       [<c048fe60>] state_store+0xa0/0xb0
       [<c048fdc0>] ? pm_async_store+0x40/0x40
       [<c05d0200>] kobj_attr_store+0x20/0x30
       [<c0575ea4>] sysfs_write_file+0x94/0xf0
       [<c051e26a>] vfs_write+0x9a/0x160
       [<c0575e10>] ? sysfs_open_file+0x200/0x200
       [<c051e3fd>] sys_write+0x3d/0x70
       [<c080959f>] sysenter_do_call+0x12/0x28
      
      Cc: stable@kernel.org
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  12. 19 Aug, 2011 1 commit
    • squeeze max-pause area and drop pass-good area · bb082295
      Committed by Wu Fengguang
      Revert the pass-good area introduced in ffd1f609 ("writeback:
      introduce max-pause and pass-good dirty limits") and make the max-pause
      area smaller and safe.
      
      This fixes ~30% performance regression in the ext3 data=writeback
      fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
      12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.
      
      The deadline scheduler also shows a regression, though not as big as with
      CFQ, so this suggests we have some write starvation.

      The test logs show that

      - the disks are sometimes underutilized

      - global dirty pages sometimes rush high to the pass-good area for
        several hundred seconds, while in the meantime some bdi dirty pages
        drop to a very low value (bdi_dirty << bdi_thresh).  Then suddenly the
        global dirty pages drop under the global dirty threshold and bdi_dirty
        rushes very high (for example, 2 times higher than bdi_thresh), during
        which time balance_dirty_pages() is not called at all.
      
      So the problems are
      
      1) The random writes progress so slowly that they break the assumption of
         the max-pause logic that "8 pages per 200ms is typically more than
         enough to curb heavy dirtiers".

      2) The max-pause logic ignores task_bdi_thresh and thus opens the possibility
         for some bdi's to over-dirty pages, leading to (bdi_dirty >> bdi_thresh)
         and then (bdi_thresh >> bdi_dirty) for others.

      3) The higher max-pause/pass-good thresholds somehow lead to the bad
         swing of dirty pages.

      The fix is to allow the task to slightly dirty over task_bdi_thresh, but
      never to exceed bdi_dirty and/or global dirty_thresh.
      
      Tests show that it fixed the JBOD regression completely (both behavior
      and performance), while still being able to cut down large pause times
      in balance_dirty_pages() for single-disk cases.
      Reported-by: Li Shaohua <shaohua.li@intel.com>
      Tested-by: Li Shaohua <shaohua.li@intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
  13. 18 Aug, 2011 2 commits
  14. 16 Aug, 2011 1 commit
    • block: fix flush machinery for stacking drivers with differing flush flags · 4853abaa
      Committed by Jeff Moyer
      Commit ae1b1539, block: reimplement
      FLUSH/FUA to support merge, introduced a performance regression when
      running any sort of fsyncing workload using dm-multipath and certain
      storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
      dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
      that dm-multipath always advertised flush+fua support, and passed
      commands on down the stack, where those flags used to get stripped off.
      The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command would come in here, have
      REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua &&
                      (rq->cmd_flags & REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
                      if (!has_fua)
                              rq->cmd_flags &= ~REQ_FUA;
                      return rq;
              }
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all.  Instead,
      __elv_next_request just hands a request with flush and fua bits set to
      the scsi_request_fn, even if the underlying request_queue does not
      support flush or fua.
      
      The agreed upon approach is to fix the flush machinery to allow
      stacking.  While this isn't used in practice (since there is only one
      request-based dm target, and that target will now reflect the flush
      flags of the underlying device), it does future-proof the solution, and
      make it function as designed.
      
      In order to make this work, I had to add a field to the struct request,
      inside the flush structure (to store the original req->end_io).  Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers.  So, I didn't see a way around the additional field.
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  15. 14 Aug, 2011 2 commits
  16. 12 Aug, 2011 2 commits
    • ASoC: omap: Update e-mail address of Jarkko Nikula · 7ec41ee5
      Committed by Jarkko Nikula
      My gmail account got disabled and I'm not going to reopen it.
      Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
      Acked-by: Liam Girdwood <lrg@ti.com>
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
    • move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Committed by Vasiliy Kulikov
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to check for NPROC exceeding via setuid() and
      similar functions.
      
      Before the check there was a possibility to greatly exceed the allowed
      number of processes by an unprivileged user if the program relied on
      rlimit only.  But the check created a new security threat: many poorly
      written programs simply don't check the setuid() return code and believe
      it cannot fail if executed with root privileges.  So, the check is removed
      in this patch because privilege escalations related to buggy programs
      are too frequent.

      The NPROC can still be enforced in the common code flow of daemons
      spawning user processes.  Most daemons do fork()+setuid()+execve().
      The check introduced in execve() (1) enforces the same limit as in
      setuid() and (2) doesn't create similar security issues.
      
      Neil Brown suggested tracking which specific process has exceeded the
      limit by setting the PF_NPROC_EXCEEDED process flag.  With the change only
      this process would fail on execve(), and other processes' execve()
      behaviour is not changed.

      Solar Designer suggested re-checking whether the NPROC limit is still
      exceeded at the moment of execve().  If the process was sleeping for
      days between set*uid() and execve(), and the NPROC counter stepped down
      under the limit, the deferred execve() failure because the NPROC limit was
      exceeded days ago would be unexpected.  If the limit is not exceeded
      anymore, we clear the flag on successful calls to execve() and fork().

      The flag is also cleared on successful calls to set_user() as the limit
      was exceeded for the previous user, not the current one.

      A similar check was introduced in -ow patches (without the process flag).
      
      v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: James Morris <jmorris@namei.org>
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 11 Aug, 2011 3 commits