1. 12 6月, 2011 1 次提交
  2. 09 6月, 2011 1 次提交
    • L
      vfs: reorganize 'struct inode' layout a bit · 13e12d14
      Linus Torvalds 提交于
      This tries to make the 'struct inode' accesses denser in the data cache
      by moving a commonly accessed field (i_security) closer to other fields
      that are accessed often.
      
      It also makes 'i_state' just an 'unsigned int' rather than 'unsigned
      long', since we only use a few bits of that field, and moves it next to
      the existing 'i_flags' so that we potentially get better structure
      layout (although depending on config options, i_flags may already have
      packed in the same word as i_lock, so this improves packing only for the
      case of spinlock debugging)
      
      Out 'struct inode' is still way too big, and we should probably move
      some other fields around too (the acl fields in particular) for better
      data cache access density.  Other fields (like the inode hash) are
      likely to be entirely irrelevant under most loads.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13e12d14
  3. 08 6月, 2011 2 次提交
    • G
      Staging: altera: move .h file to proper place · 85ab9ee9
      Greg Kroah-Hartman 提交于
      Staging drivers should be self-contained, without files in the include/
      directories.  So move the altera.h file back to the driver directory for
      now, until it moves out of the staging tree.
      
      Cc: Igor M. Liplianin <liplianin@netup.ru>
      Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      85ab9ee9
    • A
      usb-storage: redo incorrect reads · 21c13a4f
      Alan Stern 提交于
      Some USB mass-storage devices have bugs that cause them not to handle
      the first READ(10) command they receive correctly.  The Corsair
      Padlock v2 returns completely bogus data for its first read (possibly
      it returns the data in encrypted form even though the device is
      supposed to be unlocked).  The Feiya SD/SDHC card reader fails to
      complete the first READ(10) command after it is plugged in or after a
      new card is inserted, returning a status code that indicates it thinks
      the command was invalid, which prevents the kernel from retrying the
      read.
      
      Since the first read of a new device or a new medium is for the
      partition sector, the kernel is unable to retrieve the device's
      partition table.  Users have to manually issue an "hdparm -z" or
      "blockdev --rereadpt" command before they can access the device.
      
      This patch (as1470) works around the problem.  It adds a new quirk
      flag, US_FL_INVALID_READ10, indicating that the first READ(10) should
      always be retried immediately, as should any failing READ(10) commands
      (provided the preceding READ(10) command succeeded, to avoid getting
      stuck in a loop).  The patch also adds appropriate unusual_devs
      entries containing the new flag.
      Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
      Tested-by: NSven Geggus <sven-usbst@geggus.net>
      Tested-by: NPaul Hartman <paul.hartman+linux@gmail.com>
      CC: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
      CC: <stable@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      21c13a4f
  4. 07 6月, 2011 1 次提交
    • F
      swiotlb: Export swioltb_nr_tbl and utilize it as appropiate. · 5f98ecdb
      FUJITA Tomonori 提交于
      By default the io_tlb_nslabs is set to zero, and gets set to
      whatever value is passed in via swiotlb_init_with_tbl function.
      The default value passed in is 64MB. However, if the user provides
      the 'swiotlb=<nslabs>' the default value is ignored and
      the value provided by the user is used... Except when the SWIOTLB
      is used under Xen - there the default value of 64MB is used and
      the Xen-SWIOTLB has no mechanism to get the 'io_tlb_nslabs' filled
      out by setup_io_tlb_npages functions. This patch provides a function
      for the Xen-SWIOTLB to call to see if the io_tlb_nslabs is set
      and if so use that value.
      Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      5f98ecdb
  5. 04 6月, 2011 3 次提交
    • V
      perf: Fix comments in include/linux/perf_event.h · d7ebe75b
      Vince Weaver 提交于
      Fix include/linux/perf_event.h comments to be consistent with
      the actual #define names. This is trivial, but it can be a bit
      confusing when first  reading through the file.
      Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106031757090.29381@cl320.eecs.utk.eduSigned-off-by: NIngo Molnar <mingo@elte.hu>
      d7ebe75b
    • A
      more conservative S_NOSEC handling · 9e1f1de0
      Al Viro 提交于
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another client,
      silently picked by this client on revalidate, all of that *without* clearing
      the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is
      	* new superblock flag; unless set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
      local block ones clear it)
      	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
      it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked from other clients.
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9e1f1de0
    • L
      Revert "tty: make receive_buf() return the amout of bytes received" · 55db4c64
      Linus Torvalds 提交于
      This reverts commit b1c43f82.
      
      It was broken in so many ways, and results in random odd pty issues.
      
      It re-introduced the buggy schedule_work() in flush_to_ldisc() that can
      cause endless work-loops (see commit a5660b41: "tty: fix endless
      work loop when the buffer fills up").
      
      It also used an "unsigned int" return value fo the ->receive_buf()
      function, but then made multiple functions return a negative error code,
      and didn't actually check for the error in the caller.
      
      And it didn't actually work at all.  BenH bisected down odd tty behavior
      to it:
        "It looks like the patch is causing some major malfunctions of the X
         server for me, possibly related to PTYs.  For example, cat'ing a
         large file in a gnome terminal hangs the kernel for -minutes- in a
         loop of what looks like flush_to_ldisc/workqueue code, (some ftrace
         data in the quoted bits further down).
      
         ...
      
         Some more data: It -looks- like what happens is that the
         flush_to_ldisc work queue entry constantly re-queues itself (because
         the PTY is full ?) and the workqueue thread will basically loop
         forver calling it without ever scheduling, thus starving the consumer
         process that could have emptied the PTY."
      
      which is pretty much exactly the problem we fixed in a5660b41.
      
      Milton Miller pointed out the 'unsigned int' issue.
      Reported-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-by: NMilton Miller <miltonm@bga.com>
      Cc: Stefan Bigler <stefan.bigler@keymile.com>
      Cc: Toby Gray <toby.gray@realvnc.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55db4c64
  6. 03 6月, 2011 3 次提交
  7. 02 6月, 2011 3 次提交
  8. 01 6月, 2011 4 次提交
    • L
      [media] v4l: Fix media_entity_to_video_device macro argument name · 6e3ea0e7
      Laurent Pinchart 提交于
      The name 'entity' is used twice in the macro body, once as the macro
      argument, and once as a structure field name. This breaks compilation if
      the macro is called with its argument not named 'entity'.
      
      Fix this by renaming the macro argument '__e'. This should avoid
      namespace clashes.
      Signed-off-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      6e3ea0e7
    • Y
      intel-iommu: Enable super page (2MiB, 1GiB, etc.) support · 6dd9a7c7
      Youquan Song 提交于
      There are no externally-visible changes with this. In the loop in the
      internal __domain_mapping() function, we simply detect if we are mapping:
        - size >= 2MiB, and
        - virtual address aligned to 2MiB, and
        - physical address aligned to 2MiB, and
        - on hardware that supports superpages.
      
      (and likewise for larger superpages).
      
      We automatically use a superpage for such mappings. We never have to
      worry about *breaking* superpages, since we trust that we will always
      *unmap* the same range that was mapped. So all we need to do is ensure
      that dma_pte_clear_range() will also cope with superpages.
      
      Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
      it can return a PTE at the appropriate level rather than always
      extending the page tables all the way down to level 1. Again, this is
      simplified by the fact that we should never encounter existing small
      pages when we're creating a mapping; any old mapping that used the same
      virtual range will have been entirely removed and its obsolete page
      tables freed.
      
      Provide an 'intel_iommu=sp_off' argument on the command line as a
      chicken bit. Not that it should ever be required.
      
      ==
      
      The original commit seen in the iommu-2.6.git was Youquan's
      implementation (and completion) of my own half-baked code which I'd
      typed into an email. Followed by half a dozen subsequent 'fixes'.
      
      I've taken the unusual step of rewriting history and collapsing the
      original commits in order to keep the main history simpler, and make
      life easier for the people who are going to have to backport this to
      older kernels. And also so I can give it a more coherent commit comment
      which (hopefully) gives a better explanation of what's going on.
      
      The original sequence of commits leading to identical code was:
      
      Youquan Song (3):
            intel-iommu: super page support
            intel-iommu: Fix superpage alignment calculation error
            intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
      
      David Woodhouse (4):
            intel-iommu: Precalculate superpage support for dmar_domain
            intel-iommu: Fix hardware_largepage_caps()
            intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
            intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      6dd9a7c7
    • R
      mtd: fix physmap.h warnings · 63da0290
      Randy Dunlap 提交于
      Fix build warnings in physmap.h:
      
      include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
      include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
      include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
      include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      63da0290
    • W
      sctp: stop pending timers and purge queues when peer restart asoc · a000c01e
      Wei Yongjun 提交于
      If the peer restart the asoc, we should not only fail any unsent/unacked
      data, but also stop the T3-rtx, SACK, T4-rto timers, and teardown ASCONF
      queues.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a000c01e
  9. 31 5月, 2011 2 次提交
  10. 30 5月, 2011 11 次提交
  11. 29 5月, 2011 9 次提交
    • M
      dm kcopyd: return client directly and not through a pointer · fa34ce73
      Mikulas Patocka 提交于
      Return client directly from dm_kcopyd_client_create, not through a
      parameter, making it consistent with dm_io_client_create.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fa34ce73
    • M
      dm kcopyd: reserve fewer pages · 5f43ba29
      Mikulas Patocka 提交于
      Reserve just the minimum of pages needed to process one job.
      
      Because we allocate pages from page allocator, we don't need to reserve
      a large number of pages.  The maximum job size is SUB_JOB_SIZE and we
      calculate the number of reserved pages based on this.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5f43ba29
    • M
      dm io: use fixed initial mempool size · bda8efec
      Mikulas Patocka 提交于
      Replace the arbitrary calculation of an initial io struct mempool size
      with a constant.
      
      The code calculated the number of reserved structures based on the request
      size and used a "magic" multiplication constant of 4.  This patch changes
      it to reserve a fixed number - itself still chosen quite arbitrarily.
      Further testing might show if there is a better number to choose.
      
      Note that if there is no memory pressure, we can still allocate an
      arbitrary number of "struct io" structures.  One structure is enough to
      process the whole request.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      bda8efec
    • M
      dm table: allow targets to support discards internally · 4c259327
      Mike Snitzer 提交于
      Permit a target to support discards regardless of whether or not all its
      underlying devices do.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      4c259327
    • H
      [S390] mm: fix storage key handling · a43a9d93
      Heiko Carstens 提交于
      page_get_storage_key() and page_set_storage_key() expect a page address
      and not its page frame number. This got inconsistent with 2d42552d
      "[S390] merge page_test_dirty and page_clear_dirty".
      
      Result is that we read/write storage keys from random pages and do not
      have a working dirty bit tracking at all.
      E.g. SetPageUpdate() doesn't clear the dirty bit of requested pages, which
      for example ext4 doesn't like very much and panics after a while.
      
      Unable to handle kernel paging request at virtual user address (null)
      Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in:
      CPU: 1 Not tainted 2.6.39-07551-g139f37f5-dirty #152
      Process flush-94:0 (pid: 1576, task: 000000003eb34538, ksp: 000000003c287b70)
      Krnl PSW : 0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
      Krnl GPRS: 0000000000000000 0000000000000000 0000000000000000 0700000000000000
                 0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
                 0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
                 0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
      Krnl Code: 0000000000316b04: f0a0000407f4       srp     4(11,%r0),2036,0
                 0000000000316b0a: b9020022           ltgr    %r2,%r2
                 0000000000316b0e: a7740015           brc     7,316b38
                >0000000000316b12: e3d0c0000024       stg     %r13,0(%r12)
                 0000000000316b18: 4120c010           la      %r2,16(%r12)
                 0000000000316b1c: 4130d060           la      %r3,96(%r13)
                 0000000000316b20: e340d0600004       lg      %r4,96(%r13)
                 0000000000316b26: c0e50002b567       brasl   %r14,36d5f4
      Call Trace:
      ([<0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
       [<00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
       [<00000000002daac2>] ext4_da_writepages+0x2da/0x504
       [<00000000002597e8>] writeback_single_inode+0xf8/0x268
       [<0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
       [<000000000025a700>] writeback_inodes_wb+0x80/0x168
       [<000000000025aa92>] wb_writeback+0x2aa/0x324
       [<000000000025abde>] wb_do_writeback+0xd2/0x274
       [<000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
       [<00000000001737be>] kthread+0xa6/0xb0
       [<000000000056c1da>] kernel_thread_starter+0x6/0xc
       [<000000000056c1d4>] kernel_thread_starter+0x0/0xc
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
      Reported-by: NSebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      a43a9d93
    • L
      ACPI: Add D3 cold state · 28c2103d
      Lin Ming 提交于
      _SxW returns an Integer containing the lowest D-state supported in state
      Sx. If OSPM has not indicated that it supports _PR3, then the value “3”
      corresponds to D3.  If it has indicated _PR3 support, the value “3”
      represents D3hot and the value “4” represents D3cold.
      
      Linux does set _OSC._PR3, so we should fix it to expect that _SxW can
      return 4.
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      28c2103d
    • L
      ACPI: processor: fix processor_physically_present in UP kernel · 932df741
      Lin Ming 提交于
      Usually, there are multiple processors defined in ACPI table, for
      example
      
          Scope (_PR)
          {
              Processor (CPU0, 0x00, 0x00000410, 0x06) {}
              Processor (CPU1, 0x01, 0x00000410, 0x06) {}
              Processor (CPU2, 0x02, 0x00000410, 0x06) {}
              Processor (CPU3, 0x03, 0x00000410, 0x06) {}
          }
      
      processor_physically_present(...) will be called to check whether those
      processors are physically present.
      
      Currently we have below codes in processor_physically_present,
      
      cpuid = acpi_get_cpuid(...);
      if ((cpuid == -1) && (num_possible_cpus() > 1))
              return false;
      return true;
      
      In UP kernel, acpi_get_cpuid(...) always return -1 and
      num_possible_cpus() always return 1, so
      processor_physically_present(...) always returns true for all passed in
      processor handles.
      
      This is wrong for UP processor or SMP processor running UP kernel.
      
      This patch removes the !SMP version of acpi_get_cpuid(), so both UP and
      SMP kernel use the same acpi_get_cpuid function.
      
      And for UP kernel, only processor 0 is valid.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=16548
      https://bugzilla.kernel.org/show_bug.cgi?id=16357Tested-by: NAnton Kochkov <anton.kochkov@gmail.com>
      Tested-by: NAmbroz Bizjak <ambrop7@gmail.com>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      932df741
    • T
      idle governor: Avoid lock acquisition to read pm_qos before entering idle · 333c5ae9
      Tim Chen 提交于
      Thanks to the reviews and comments by Rafael, James, Mark and Andi.
      Here's version 2 of the patch incorporating your comments and also some
      update to my previous patch comments.
      
      I noticed that before entering idle state, the menu idle governor will
      look up the current pm_qos target value according to the list of qos
      requests received.  This look up currently needs the acquisition of a
      lock to access the list of qos requests to find the qos target value,
      slowing down the entrance into idle state due to contention by multiple
      cpus to access this list.  The contention is severe when there are a lot
      of cpus waking and going into idle.  For example, for a simple workload
      that has 32 pair of processes ping ponging messages to each other, where
      64 cpu cores are active in test system, I see the following profile with
      37.82% of cpu cycles spent in contention of pm_qos_lock:
      
      -     37.82%          swapper  [kernel.kallsyms]          [k]
      _raw_spin_lock_irqsave
         - _raw_spin_lock_irqsave
            - 95.65% pm_qos_request
                 menu_select
                 cpuidle_idle_call
               - cpu_idle
                    99.98% start_secondary
      
      A better approach will be to cache the updated pm_qos target value so
      reading it does not require lock acquisition as in the patch below.
      With this patch the contention for pm_qos_lock is removed and I saw a
      2.2X increase in throughput for my message passing workload.
      
      cc: stable@kernel.org
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJames Bottomley <James.Bottomley@suse.de>
      Acked-by: Nmark gross <markgross@thegnar.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      333c5ae9
    • E
      ns: Wire up the setns system call · 7b21fddd
      Eric W. Biederman 提交于
      32bit and 64bit on x86 are tested and working.  The rest I have looked
      at closely and I can't find any problems.
      
      setns is an easy system call to wire up.  It just takes two ints so I
      don't expect any weird architecture porting problems.
      
      While doing this I have noticed that we have some architectures that are
      very slow to get new system calls.  cris seems to be the slowest where
      the last system calls wired up were preadv and pwritev.  avr32 is weird
      in that recvmmsg was wired up but never declared in unistd.h.  frv is
      behind with perf_event_open being the last syscall wired up.  On h8300
      the last system call wired up was epoll_wait.  On m32r the last system
      call wired up was fallocate.  mn10300 has recvmmsg as the last system
      call wired up.  The rest seem to at least have syncfs wired up which was
      new in the 2.6.39.
      
      v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
      v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
      v4: Moved wiring up of the system call to another patch
      v5: ported to v2.6.39-rc6
      v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
      v7: ported to Linus's latest post 2.6.39 tree.
      
      >  arch/blackfin/include/asm/unistd.h     |    3 ++-
      >  arch/blackfin/mach-common/entry.S      |    1 +
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      
      Oh - ia64 wiring looks good.
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b21fddd