1. 15 October 2011, 7 commits
    • PCI: Add support for PASID capability · 086ac11f
      Authored by Joerg Roedel
      Devices supporting Process Address Space Identifiers
      (PASIDs) can use an IOMMU to access multiple IO address
      spaces at the same time. A PCIe device indicates support for
      this feature by implementing the PASID capability. This
      patch adds support for the capability to the Linux kernel.
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      086ac11f
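      A minimal sketch of how a driver might consume the new capability support; the helper names (pci_enable_pasid(), pci_disable_pasid(), pci_max_pasids()) and the zero feature mask are assumed from the mainline PCI API rather than quoted from this patch:

        #include <linux/pci.h>

        /* Illustrative only: enable PASID with no optional features. */
        static int example_setup_pasid(struct pci_dev *pdev)
        {
                int ret;

                ret = pci_enable_pasid(pdev, 0);   /* 0 = no exec/priv-mode requests */
                if (ret)
                        return ret;                /* no PASID capability present */

                dev_info(&pdev->dev, "PASID enabled, up to %d address spaces\n",
                         pci_max_pasids(pdev));
                return 0;
        }

        static void example_teardown_pasid(struct pci_dev *pdev)
        {
                pci_disable_pasid(pdev);
        }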
    • PCI: Add implementation for PRI capability · c320b976
      Authored by Joerg Roedel
      Implement the necessary functions to handle PRI capabilities
      on PCIe devices. With PRI, devices behind an IOMMU can signal
      page fault conditions to software and recover from such
      faults.
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      c320b976
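      A hedged sketch of the intended PRI usage from an IOMMU driver's point of view; pci_reset_pri(), pci_enable_pri() and pci_disable_pri(), as well as the request budget of 32, are assumptions based on the mainline API, not a quote of this patch:

        #include <linux/pci.h>

        #define EXAMPLE_PRI_REQS 32   /* hypothetical outstanding page-request budget */

        /* Illustrative only: turn on the Page Request Interface for a device. */
        static int example_setup_pri(struct pci_dev *pdev)
        {
                int ret;

                ret = pci_reset_pri(pdev);         /* drop any stale page requests */
                if (ret)
                        return ret;

                return pci_enable_pri(pdev, EXAMPLE_PRI_REQS);
        }

        static void example_teardown_pri(struct pci_dev *pdev)
        {
                pci_disable_pri(pdev);
        }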
    • PCI: Move ATS implementation into own file · db3c33c6
      Authored by Joerg Roedel
      ATS does not depend on IOV support, so move the code into
      its own file. This file will also include support for the
      PRI and PASID capabilities later.
      Also give ATS its own Kconfig variable to allow selecting it
      without IOV support.
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      db3c33c6
    • PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove · 6af8bef1
      Authored by Prarit Bhargava
      I originally submitted a patch to workaround this by pushing all Ejection
      Requests and Device Checks onto the kacpi_hotplug queue.
      
      http://marc.info/?l=linux-acpi&m=131678270930105&w=2
      
      The patch is still insufficient in that Bus Checks also need to be added.
      
      Rather than add all events, including non-PCI-hotplug events, to the
      hotplug queue, mjg suggested that a better approach would be to modify
      the acpiphp driver so only acpiphp events would be added to the
      kacpi_hotplug queue.
      
      It's a longer patch, but at least we maintain the benefit of having separate
      queues in ACPI.  This, of course, is still only a workaround for the problem.
      As Bjorn and mjg pointed out, we have to refactor a lot of this code to do
      the right thing, but at this point it is better to have this code working.
      
      The acpi core places all events on the kacpi_notify queue.  When the acpiphp
      driver is loaded and a PCI card with a PCI-to-PCI bridge is removed the
      following call sequence occurs:
      
      cleanup_p2p_bridge()
      	    -> cleanup_bridge()
      		    -> acpi_remove_notify_handler()
      			    -> acpi_os_wait_events_complete()
      				    -> flush_workqueue(kacpi_notify_wq)
      
      which is the queue we are currently executing on, so the process hangs.
      
      Move all hotplug acpiphp events onto the kacpi_hotplug workqueue.  In
      handle_hotplug_event_bridge() and handle_hotplug_event_func() we can simply
      push the rest of the work onto the kacpi_hotplug queue and then avoid the
      deadlock.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: mjg@redhat.com
      Cc: bhelgaas@google.com
      Cc: linux-acpi@vger.kernel.org
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      6af8bef1
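      The pattern behind the fix, as a hedged sketch with hypothetical names (the real patch queues acpiphp work on the existing kacpi_hotplug workqueue; this only illustrates deferring work off the queue the notify callback runs on):

        #include <linux/workqueue.h>
        #include <linux/slab.h>

        static struct workqueue_struct *example_hotplug_wq;   /* hypothetical */

        struct example_hp_event {
                struct work_struct work;
                u32 type;
                void *context;
        };

        static void example_hp_worker(struct work_struct *work)
        {
                struct example_hp_event *ev =
                        container_of(work, struct example_hp_event, work);

                /*
                 * Heavy lifting (bridge teardown, acpi_remove_notify_handler())
                 * happens here, on a queue other than the notify queue, so
                 * flushing the notify queue can no longer deadlock.
                 */
                kfree(ev);
        }

        /* Called from the ACPI notify path: package the event and defer it. */
        static void example_hotplug_event(u32 type, void *context)
        {
                struct example_hp_event *ev = kmalloc(sizeof(*ev), GFP_ATOMIC);

                if (!ev)
                        return;
                ev->type = type;
                ev->context = context;
                INIT_WORK(&ev->work, example_hp_worker);
                queue_work(example_hotplug_wq, &ev->work);
        }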
    • PCI / PM: Extend PME polling to all PCI devices · 379021d5
      Authored by Rafael J. Wysocki
      The land of PCI power management is a land of sorrow and ugliness,
      especially in the area of signaling events by devices.  There are
      devices that set their PME Status bits, but don't really bother
      to send a PME message or assert PME#.  There are hardware vendors
      who don't connect PME# lines to the system core logic (they know
      who they are).  There are PCI Express Root Ports that don't bother
      to trigger interrupts when they receive PME messages from the devices
      below.  There are ACPI BIOSes that forget to provide _PRW methods for
      devices capable of signaling wakeup.  Finally, there are BIOSes that
      do provide _PRW methods for such devices, but then don't bother to
      call Notify() for those devices from the corresponding _Lxx/_Exx
      GPE-handling methods.  In all of these cases the kernel doesn't have
      a chance to receive a proper notification that it should wake up a
      device, so devices stay in low-power states forever.  Worse yet, in
      some cases they continuously send PME Messages that are silently
      ignored, because the kernel simply doesn't know that it should clear
      the device's PME Status bit.
      
      This problem was first observed for "parallel" (non-Express) PCI
      devices on add-on cards, and Matthew Garrett addressed it by adding
      kernel code that polls the PME Status bits of such devices if they are
      enabled to signal PME.  Recently, however, it has turned out
      that PCI Express devices are also affected by this issue and that it
      is not limited to add-on devices, so it seems necessary to extend
      the PME polling to all PCI devices, including PCI Express and planar
      ones.  Still, it would be wasteful to poll the PME Status bits of
      devices that are known to receive proper PME notifications, so make
      the kernel (1) poll the PME Status bits of all PCI and PCIe devices
      enabled to signal PME and (2) disable the PME Status polling for
      devices for which correct PME notifications are received.
      Tested-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      379021d5
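      A hedged sketch of what polling a device's PME Status amounts to at the register level; the PCI_PM_CTRL* macros and the pm_cap offset are standard pci_regs.h/struct pci_dev items, but the function itself is illustrative, not the kernel's PME poll-list scan:

        #include <linux/pci.h>

        /* Illustrative only: report and clear a pending PME Status bit. */
        static bool example_check_pme_status(struct pci_dev *dev)
        {
                u16 pmcsr;

                if (!dev->pm_cap)
                        return false;

                pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
                if (!(pmcsr & PCI_PM_CTRL_PME_STATUS))
                        return false;

                /* PME Status is write-1-to-clear; writing the value back clears it. */
                pci_write_config_word(dev, dev->pm_cap + PCI_PM_CTRL, pmcsr);
                return true;
        }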
    • PCI: Make pci_setup_bridge() non-static for use by arch code · e2444273
      Authored by Benjamin Herrenschmidt
      The "powernv" platform of the powerpc architecture needs to assign PCI
      resources using a specific algorithm to fit some HW constraints of
      the IBM "IODA" architecture (related to the ability to create error
      handling domains that encompass specific segments of MMIO space).
      
      For doing so, it wants to call pci_setup_bridge() from architecture
      specific resource management in order to configure bridges after all
      resources have been assigned. So make it non-static.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      e2444273
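      A hedged sketch of how architecture code might use the newly exported function after running its own resource assignment; the recursion and the void pci_setup_bridge(struct pci_bus *) signature are assumptions for illustration:

        #include <linux/pci.h>

        /* Illustrative only: program bridge windows once resources are assigned. */
        static void example_arch_setup_bridges(struct pci_bus *bus)
        {
                struct pci_bus *child;

                list_for_each_entry(child, &bus->children, node) {
                        pci_setup_bridge(child);            /* now non-static */
                        example_arch_setup_bridges(child);  /* descend the tree */
                }
        }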
    • PCI: Add Solarflare vendor ID and SFC4000 device IDs · 937383a5
      Authored by Ben Hutchings
      These will be shared between the sfc driver and a PCI quirk.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      937383a5
  2. 05 October 2011, 1 commit
  3. 30 September 2011, 1 commit
    • posix-cpu-timers: Cure SMP wobbles · d670ec13
      Authored by Peter Zijlstra
      David reported:
      
        Attached below is a watered-down version of rt/tst-cpuclock2.c from
        GLIBC.  Just build it with "gcc -o test test.c -lpthread -lrt" or
        similar.
      
        Run it several times, and you will see cases where the main thread
        will measure a process clock difference before and after the nanosleep
        which is smaller than the cpu-burner thread's individual thread clock
        difference.  This doesn't make any sense since the cpu-burner thread
        is part of the top-level process's thread group.
      
        I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
        64-bit binaries).
      
        For example:
      
        [davem@boricha build-x86_64-linux]$ ./test
        process: before(0.001221967) after(0.498624371) diff(497402404)
        thread:  before(0.000081692) after(0.498316431) diff(498234739)
        self:    before(0.001223521) after(0.001240219) diff(16698)
        [davem@boricha build-x86_64-linux]$ 
      
        The diff of 'process' should always be >= the diff of 'thread'.
      
        I make sure to wrap the 'thread' clock measurements the most tightly
        around the nanosleep() call, and that the 'process' clock measurements
        are the outer-most ones.
      
        ---
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>
        #include <fcntl.h>
        #include <string.h>
        #include <errno.h>
        #include <pthread.h>
      
        static pthread_barrier_t barrier;
      
        static void *chew_cpu(void *arg)
        {
      	  pthread_barrier_wait(&barrier);
      	  while (1)
      		  __asm__ __volatile__("" : : : "memory");
      	  return NULL;
        }
      
        int main(void)
        {
      	  clockid_t process_clock, my_thread_clock, th_clock;
      	  struct timespec process_before, process_after;
      	  struct timespec me_before, me_after;
      	  struct timespec th_before, th_after;
      	  struct timespec sleeptime;
      	  unsigned long diff;
      	  pthread_t th;
      	  int err;
      
      	  err = clock_getcpuclockid(0, &process_clock);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_init(&barrier, NULL, 2);
      	  err = pthread_create(&th, NULL, chew_cpu, NULL);
      	  if (err)
      		  return 1;
      
      	  err = pthread_getcpuclockid(th, &th_clock);
      	  if (err)
      		  return 1;
      
      	  pthread_barrier_wait(&barrier);
      
      	  err = clock_gettime(process_clock, &process_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_before);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(th_clock, &th_before);
      	  if (err)
      		  return 1;
      
      	  sleeptime.tv_sec = 0;
      	  sleeptime.tv_nsec = 500000000;
      	  nanosleep(&sleeptime, NULL);
      
      	  err = clock_gettime(th_clock, &th_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(my_thread_clock, &me_after);
      	  if (err)
      		  return 1;
      
      	  err = clock_gettime(process_clock, &process_after);
      	  if (err)
      		  return 1;
      
      	  diff = process_after.tv_nsec - process_before.tv_nsec;
      	  printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 process_before.tv_sec, process_before.tv_nsec,
      		 process_after.tv_sec, process_after.tv_nsec, diff);
      	  diff = th_after.tv_nsec - th_before.tv_nsec;
      	  printf("thread:  before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 th_before.tv_sec, th_before.tv_nsec,
      		 th_after.tv_sec, th_after.tv_nsec, diff);
      	  diff = me_after.tv_nsec - me_before.tv_nsec;
      	  printf("self:    before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
      		 me_before.tv_sec, me_before.tv_nsec,
      		 me_after.tv_sec, me_after.tv_nsec, diff);
      
      	  return 0;
        }
      
      This is due to us using p->se.sum_exec_runtime in
      thread_group_cputime() where we iterate the thread group and sum all
      data. This does not take time since the last schedule operation (tick
      or otherwise) into account. We can cure this by using
      task_sched_runtime() at the cost of having to take locks.
      
      This also means we can (and must) do away with
      thread_group_sched_runtime() since the modified thread_group_cputime()
      is now more accurate and would deadlock when called from
      thread_group_sched_runtime().
      
      Aside from that, it makes the function safe on 32-bit systems. The old
      code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
      64bit value and could be changed on another cpu at the same time.
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
      Tested-by: David Miller <davem@davemloft.net>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d670ec13
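      A conceptual sketch of the fix described above (locking and the utime/stime handling are omitted; treat the function as illustrative, not the patched thread_group_cputime()):

        #include <linux/sched.h>

        /* Caller must hold tasklist_lock or rcu_read_lock(). */
        static u64 example_thread_group_runtime(struct task_struct *tsk)
        {
                struct task_struct *t = tsk;
                u64 sum = 0;

                do {
                        /* task_sched_runtime() takes the runqueue lock and adds
                         * the time run since the last tick, unlike reading
                         * t->se.sum_exec_runtime directly. */
                        sum += task_sched_runtime(t);
                } while_each_thread(tsk, t);

                return sum;
        }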
  4. 29 September 2011, 1 commit
    • ptp: fix L2 event message recognition · f75159e9
      Authored by Richard Cochran
      The IEEE 1588 standard defines two kinds of messages, event and general
      messages. Event messages require time stamping, and general do not. When
      using UDP transport, two separate ports are used for the two message
      types.
      
      The BPF designed to recognize event messages incorrectly classifies L2
      general messages as event messages. This commit fixes the issue by
      extending the filter to check the message type field for L2 PTP packets.
      Event messages can be distinguished from general messages by testing
      the "general" bit.
      Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
      Cc: <stable@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f75159e9
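      A hedged classic-BPF fragment showing the check being described; offsets assume an untagged Ethernet frame (PTP Ethertype 0x88F7, PTP header at offset 14) and this is an illustration, not the in-kernel filter verbatim:

        #include <linux/filter.h>

        /* messageType is the low nibble of the first PTP header byte; general
         * messages have the 0x8 "general" bit set, event messages do not. */
        static const struct sock_filter example_ptp_l2_event[] = {
                BPF_STMT(BPF_LD  | BPF_H | BPF_ABS, 12),           /* Ethertype        */
                BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x88F7, 0, 3), /* not PTP -> drop  */
                BPF_STMT(BPF_LD  | BPF_B | BPF_ABS, 14),           /* PTP byte 0       */
                BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, 0x08, 1, 0),  /* general bit set? */
                BPF_STMT(BPF_RET | BPF_K, 0xffff),                 /* event: accept    */
                BPF_STMT(BPF_RET | BPF_K, 0),                      /* general: drop    */
        };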
  5. 27 September 2011, 2 commits
    • vfs: remove LOOKUP_NO_AUTOMOUNT flag · b6c8069d
      Authored by Linus Torvalds
      That flag no longer makes sense, since we don't look up automount points
      as eagerly any more.  Additionally, it turns out that the NO_AUTOMOUNT
      handling was buggy to begin with: it would avoid automounting even for
      cases where we really *needed* to do the automount handling, and could
      return ENOENT for autofs entries that hadn't been instantiated yet.
      
      With our new non-eager automount semantics, one discussion has been
      about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
      newfstatat() and fstatat64() system calls), but it's probably not worth
      it: you can always force at least directory automounting by simply
      adding the final '/' to the filename, which works for *all* of the stat
      family system calls, old and new.
      
      So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
      result of our bad default behavior.
      Acked-by: Ian Kent <raven@themaw.net>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b6c8069d
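      A small userspace illustration of the trailing-'/' trick mentioned above; the /net/server path is a hypothetical autofs mount point:

        #include <stdio.h>
        #include <sys/stat.h>

        int main(void)
        {
                struct stat st;

                /* May stat the not-yet-triggered autofs placeholder directory. */
                if (stat("/net/server", &st) == 0)
                        printf("no automount forced: dev %lu\n", (unsigned long)st.st_dev);

                /* The trailing '/' forces the directory automount first. */
                if (stat("/net/server/", &st) == 0)
                        printf("automount forced:    dev %lu\n", (unsigned long)st.st_dev);

                return 0;
        }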
    • vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag · d94c177b
      Authored by Linus Torvalds
      Since we've now turned around and made LOOKUP_FOLLOW *not* force an
      automount, we want to add the ability to force an automount event on
      lookup even if we don't happen to have one of the other flags that force
      it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT, ...).
      
      Most cases will never want to use this, since you'd normally want to
      delay automounting as long as possible, which usually implies
      LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
      the automount any more).
      
      But Trond argued sufficiently forcefully that at a minimum bind mounting
      a file and quotactl will want to force the automount lookup.  Some other
      cases (like nfs_follow_remote_path()) could use it too, although
      LOOKUP_DIRECTORY would work there as well.
      
      This commit just adds the flag and logic, no users yet, though.  It also
      doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
      was made irrelevant by the same change that made us not follow on
      LOOKUP_FOLLOW.
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg KH <gregkh@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d94c177b
  6. 26 September 2011, 1 commit
    • dm crypt: always disable discard_zeroes_data · 983c7db3
      Authored by Milan Broz
      If optional discard support in dm-crypt is enabled, discard requests
      bypass the crypt queue and blocks of the underlying device are discarded.
      On the read path, discarded blocks are handled the same as normal
      ciphertext blocks, and thus decrypted.

      So if the underlying device announces that discarded regions return zeroes,
      dm-crypt must disable this flag, because after decryption there is just
      random noise instead of zeroes.
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      983c7db3
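      A hedged sketch of where such a hint gets cleared; the io_hints hook and the queue_limits field name are assumed from the device-mapper/block APIs of that era, and the function is illustrative rather than the actual dm-crypt change:

        #include <linux/device-mapper.h>
        #include <linux/blkdev.h>

        /* Illustrative only: never advertise "discarded data reads as zeroes". */
        static void example_crypt_io_hints(struct dm_target *ti,
                                           struct queue_limits *limits)
        {
                limits->discard_zeroes_data = 0;
        }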
  7. 20 September 2011, 2 commits
  8. 19 September 2011, 1 commit
  9. 17 September 2011, 3 commits
  10. 16 September 2011, 2 commits
    • net: copy userspace buffers on device forwarding · 48c83012
      Authored by Michael S. Tsirkin
      dev_forward_skb loops an skb back into host networking
      stack which might hang on the memory indefinitely.
      In particular, this can happen in macvtap in bridged mode.
      Copy the userspace fragments to avoid blocking the
      sender in that case.
      
      Since this patch makes skb_copy_ubufs extern, I also added some
      documentation and made skb_copy_ubufs clear the SKBTX_DEV_ZEROCOPY
      flag automatically instead of doing so in all callers. This can be
      split into a separate patch if people feel it's worth it.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      48c83012
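      A hedged sketch of the call pattern this enables for a forwarding path; the flag and helper exist in mainline, but the wrapper itself is illustrative:

        #include <linux/skbuff.h>

        /* Illustrative only: detach userspace buffers before looping the skb
         * back into the local stack, so the sender is never blocked on them. */
        static int example_prepare_forward(struct sk_buff *skb)
        {
                if (skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY)
                        return skb_copy_ubufs(skb, GFP_ATOMIC);
                return 0;
        }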
    • tcp: Change possible SYN flooding messages · 946cedcc
      Authored by Eric Dumazet
      "Possible SYN flooding on port xxxx " messages can fill logs on servers.
      
      Change logic to log the message only once per listener, and add two new
      SNMP counters to track :
      
      TCPReqQFullDoCookies : number of times a SYN cookie was sent in reply to a client
      
      TCPReqQFullDrop : number of times a SYN request was dropped because
      syncookies were not enabled.
      
      Based on a prior patch from Tom Herbert, and suggestions from David.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      946cedcc
  11. 15 September 2011, 2 commits
  12. 09 September 2011, 2 commits
  13. 06 September 2011, 3 commits
  14. 31 August 2011, 2 commits
  15. 29 August 2011, 1 commit
    • perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Authored by Stephane Eranian
      The current cgroup context switch code was incorrect leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase by a significant amount as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
      $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ /pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts for the non-cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8d757ef
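      The core idea as a hedged sketch; all helper names below are hypothetical stand-ins for the perf internals, illustrating only the "skip the switch when both tasks share a cgroup" short cut:

        #include <linux/sched.h>

        struct perf_cgroup;
        extern struct perf_cgroup *example_task_perf_cgroup(struct task_struct *tsk);
        extern void example_cgroup_switch_out(struct task_struct *tsk);
        extern void example_cgroup_switch_in(struct task_struct *tsk);

        static void example_cgroup_sched_switch(struct task_struct *prev,
                                                struct task_struct *next)
        {
                /* Same cgroup: no per-cgroup event switch needed at all. */
                if (example_task_perf_cgroup(prev) == example_task_perf_cgroup(next))
                        return;

                example_cgroup_switch_out(prev);  /* stop prev's cgroup events  */
                example_cgroup_switch_in(next);   /* start next's cgroup events */
        }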
  16. 27 August 2011, 1 commit
  17. 26 August 2011, 5 commits
    • backlight: add a callback 'notify_after' for backlight control · cc7993f6
      Authored by Dilan Lee
      We need a callback that lets platform code do additional work after
      pwm_enable, pwm_disable and pwm_config.
      Signed-off-by: Dilan Lee <dilee@nvidia.com>
      Reviewed-by: Robert Morell <rmorell@nvidia.com>
      Reviewed-by: Arun Murthy <arun.murthy@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc7993f6
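      A hedged board-code sketch of wiring up the new callback through pwm_backlight platform data; the field names follow include/linux/pwm_backlight.h as I understand it, and the GPIO comment is purely hypothetical:

        #include <linux/pwm_backlight.h>

        /* Hypothetical hook: runs after pwm_config()/pwm_enable()/pwm_disable(). */
        static void example_backlight_notify_after(struct device *dev, int brightness)
        {
                /* e.g. raise a backlight-enable GPIO once the PWM is running */
        }

        static struct platform_pwm_backlight_data example_backlight_data = {
                .max_brightness = 255,
                .dft_brightness = 200,
                .pwm_period_ns  = 1000000,
                .notify_after   = example_backlight_notify_after,  /* new callback */
        };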
    • rapidio: fix use of non-compatible registers · 284fb68d
      Authored by Alexandre Bounine
      Replace/remove use of RIO v.1.2 registers/bits that are not
      forward-compatible with newer versions of RapidIO specification.
      
      RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
      Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
      
      Use of register bits removed since RIO v.1.3 affects users of currently
      available 1.3- and 2.x-compliant devices who may be running older kernel
      versions.

      Removing the checks for unsupported bits makes the corresponding routines
      compatible with all versions of the RapidIO specification.  Therefore,
      backporting this also makes stable kernel versions compliant with RIO v.1.3
      and later.
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Thomas Moll <thomas.moll@sysgo.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      284fb68d
    • a8018766
    • lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7
      Authored by Josh Boyer
      Purely in-memory filesystems do not use the inode hash as the dcache
      tells us if an entry already exists.  As a result, they do not call
      unlock_new_inode, and thus directory inodes do not get put into a
      different lockdep class for i_sem.
      
      We need the different lockdep classes, because the locking order for
      i_mutex is different for directory inodes and regular inodes.  Directory
      inodes can do "readdir()", which takes i_mutex *before* possibly taking
      mm->mmap_sem (due to a page fault while copying the directory entry to
      user space).
      
      In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
      before accessing i_mutex.
      
      The two cases can never happen for the same inode, so no real deadlock
      can occur, but without the different lockdep classes, lockdep cannot
      understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
      can lead to false positives from lockdep like below:
      
          find/645 is trying to acquire lock:
           (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
          vfs_readdir+0x5b/0xb4
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
                [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
                [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
                [<ffffffff81111557>] mmap_region+0x258/0x432
                [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
                [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
                [<ffffffff8100c858>] sys_mmap+0x22/0x24
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
          -> #0 (&mm->mmap_sem){++++++}:
                [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff81109541>] might_fault+0x89/0xac
                [<ffffffff81149cff>] filldir+0x6f/0xc7
                [<ffffffff811586ea>] dcache_readdir+0x67/0x205
                [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
                [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
      This patch moves the directory vs file lockdep annotation into a helper
      function that can be called by in-memory filesystems and has hugetlbfs
      call it.
      Signed-off-by: Josh Boyer <jwboyer@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e096d0c7
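      A hedged sketch of how an in-memory filesystem would use the helper (assumed to be lockdep_annotate_inode_mutex_key(), as added to fs/inode.c); the surrounding inode setup is illustrative:

        #include <linux/fs.h>

        static struct inode *example_ramfs_get_inode(struct super_block *sb,
                                                     umode_t mode)
        {
                struct inode *inode = new_inode(sb);

                if (!inode)
                        return NULL;

                inode->i_mode = mode;
                if (S_ISDIR(mode))
                        inode->i_op = &simple_dir_inode_operations;

                /* Never goes through unlock_new_inode(), so annotate by hand. */
                lockdep_annotate_inode_mutex_key(inode);
                return inode;
        }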
    • Add a personality to report 2.6.x version numbers · be27425d
      Authored by Andi Kleen
      I ran into a couple of programs which broke with the new Linux 3.0
      version.  Some of those were binary only.  I tried to use LD_PRELOAD to
      work around it, but it was quite difficult and in one case impossible
      because of a mix of 32bit and 64bit executables.
      
      For example, all kinds of HP management software don't work unless
      we pretend to run a 2.6 kernel.
      
        $ uname -a
        Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
      
        $ hpacucli ctrl all show
      
        Error: No controllers detected.
      
        $ rpm -qf /usr/sbin/hpacucli
        hpacucli-8.75-12.0
      
      Another notable case is that Python now reports "linux3" from
      sys.platform(); which in turn can break things that were checking
      sys.platform() == "linux2":
      
        https://bugzilla.mozilla.org/show_bug.cgi?id=664564
      
      It seems pretty clear to me that this is a bug in the apps that use
      '==' instead of .startswith(), but this allows us to unbreak those
      programs.
      
      This patch adds a UNAME26 personality that makes the kernel report a
      2.6.40+x version number instead.  The x is the x in 3.x.
      
      I know this is somewhat ugly, but I didn't find a better workaround, and
      compatibility to existing programs is important.
      
      Some programs also read /proc/sys/kernel/osrelease.  This can be worked
      around in user space with mount --bind (and a mount namespace).
      
      To use:
      
        wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
        gcc -o uname26 uname26.c
        ./uname26 program
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be27425d
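      For reference, a hedged userspace sketch along the lines of the uname26.c wrapper referenced above; the UNAME26 value matches include/linux/personality.h and is defined locally in case the libc headers lack it:

        #include <stdio.h>
        #include <unistd.h>
        #include <sys/personality.h>

        #ifndef UNAME26
        #define UNAME26 0x0100000
        #endif

        int main(int argc, char **argv)
        {
                if (argc < 2) {
                        fprintf(stderr, "usage: %s program [args...]\n", argv[0]);
                        return 1;
                }

                /* Keep the native personality, just add 2.6.x uname reporting. */
                if (personality(PER_LINUX | UNAME26) < 0) {
                        perror("personality");
                        return 1;
                }

                execvp(argv[1], argv + 1);
                perror("execvp");
                return 1;
        }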
  18. 24 August 2011, 2 commits
    • block: simplify force plug flush code a little bit · 56ebdaf2
      Authored by Shaohua Li
      Clean up the code a little. attempt_plug_merge() traverses the plug list
      anyway, so we can do the request counting there, which also reduces the
      stack size a bit.
      The motivation is that I suspect we might need to count the requests for
      each queue (a task could handle multiple disks at the same time), but my
      test doesn't show it's worth doing. If somebody proves we should do it,
      the change below will make that easier.
      Signed-off-by: Shaohua Li <shli@kernel.org>
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      56ebdaf2
    • TTY: pty, fix pty counting · 24d406a6
      Authored by Jiri Slaby
      tty_operations->remove is normally called like:
      queue_release_one_tty
       ->tty_shutdown
         ->tty_driver_remove_tty
           ->tty_operations->remove
      
      However tty_shutdown() is called from queue_release_one_tty() only if
      tty_operations->shutdown is NULL. But for pty, it is not.
      pty_unix98_shutdown() is used there as ->shutdown.
      
      So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
      called. This results in invalid pty_count. I.e. what can be seen in
      /proc/sys/kernel/pty/nr.
      
      I see this was already reported at:
        https://lkml.org/lkml/2009/11/5/370
      But it was not fixed since then.
      
      This patch is kind of a hackish fix. The problem lies in ->install: there
      we allocate another tty (the so-called tty->link). So ->install is called
      once, but ->remove twice, for both tty and tty->link. The fix here is to
      count both tty and tty->link and divide the count by 2 for the user.
      
      And to have ->remove called, let's make tty_driver_remove_tty() global
      and call that from pty_unix98_shutdown() (tty_operations->shutdown).
      
      While at it, let's document that when ->shutdown is defined,
      tty_shutdown() is not called.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      24d406a6
  19. 23 August 2011, 1 commit