1. 24 12月, 2010 3 次提交
  2. 23 12月, 2010 2 次提交
    • J
      taskstats: pad taskstats netlink response for aligment issues on ia64 · 4be2c95d
      Jeff Mahoney 提交于
      The taskstats structure is internally aligned on 8 byte boundaries but the
      layout of the aggregrate reply, with two NLA headers and the pid (each 4
      bytes), actually force the entire structure to be unaligned.  This causes
      the kernel to issue unaligned access warnings on some architectures like
      ia64.  Unfortunately, some software out there doesn't properly unroll the
      NLA packet and assumes that the start of the taskstats structure will
      always be 20 bytes from the start of the netlink payload.  Aligning the
      start of the taskstats structure breaks this software, which we don't
      want.  So, for now the alignment only happens on architectures that
      require it and those users will have to update to fixed versions of those
      packages.  Space is reserved in the packet only when needed.  This ifdef
      should be removed in several years e.g.  2012 once we can be confident
      that fixed versions are installed on most systems.  We add the padding
      before the aggregate since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reported-by: NBrian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4be2c95d
    • W
      include/linux/unaligned: pack the whole struct rather than just the field · 4e06fd14
      Will Newton 提交于
      The current packed struct implementation of unaligned access adds the
      packed attribute only to the field within the unaligned struct rather than
      to the struct as a whole.  This is not sufficient to enforce proper
      behaviour on architectures with a default struct alignment of more than
      one byte.
      
      For example, the current implementation of __get_unaligned_cpu16 when
      compiled for arm with gcc -O1 -mstructure-size-boundary=32 assumes the
      struct is on a 4 byte boundary so performs the load of the 16bit packed
      field as if it were on a 4 byte boundary:
      
      __get_unaligned_cpu16:
              ldrh    r0, [r0, #0]
              bx      lr
      
      Moving the packed attribute to the struct rather than the field causes the
      proper unaligned access code to be generated:
      
      __get_unaligned_cpu16:
      	ldrb	r3, [r0, #0]	@ zero_extendqisi2
      	ldrb	r0, [r0, #1]	@ zero_extendqisi2
      	orr	r0, r3, r0, asl #8
      	bx	lr
      Signed-off-by: NWill Newton <will.newton@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4e06fd14
  3. 21 12月, 2010 1 次提交
  4. 18 12月, 2010 3 次提交
  5. 17 12月, 2010 4 次提交
    • M
      block: max hardware sectors limit wrapper · 72d4cd9f
      Mike Snitzer 提交于
      Implement blk_limits_max_hw_sectors() and make
      blk_queue_max_hw_sectors() a wrapper around it.
      
      DM needs this to avoid setting queue_limits' max_hw_sectors and
      max_sectors directly.  dm_set_device_limits() now leverages
      blk_limits_max_hw_sectors() logic to establish the appropriate
      max_hw_sectors minimum (PAGE_SIZE).  Fixes issue where DM was
      incorrectly setting max_sectors rather than max_hw_sectors (which
      caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      72d4cd9f
    • M
      block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead · e692cb66
      Martin K. Petersen 提交于
      When stacking devices, a request_queue is not always available. This
      forced us to have a no_cluster flag in the queue_limits that could be
      used as a carrier until the request_queue had been set up for a
      metadevice.
      
      There were several problems with that approach. First of all it was up
      to the stacking device to remember to set queue flag after stacking had
      completed. Also, the queue flag and the queue limits had to be kept in
      sync at all times. We got that wrong, which could lead to us issuing
      commands that went beyond the max scatterlist limit set by the driver.
      
      The proper fix is to avoid having two flags for tracking the same thing.
      We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
      block layer merging functions. The queue_limit 'no_cluster' is turned
      into 'cluster' to avoid double negatives and to ease stacking.
      Clustering defaults to being enabled as before. The queue flag logic is
      removed from the stacking function, and explicitly setting the cluster
      flag is no longer necessary in DM and MD.
      Reported-by: NEd Lin <ed.lin@promise.com>
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      e692cb66
    • H
      SSB: Fix nvram_get on BCM47xx platform · 3f84622d
      Hauke Mehrtens 提交于
      The nvram_get function was never in the mainline kernel, it only existed in
      an external OpenWrt patch. Use nvram_getenv function, which is in mainline
      and use an include instead of an extra function declaration.  et0macaddr
      contains the mac address in text from like 00:11:22:33:44:55. We have to
      parse it before adding it into macaddr.
      
      nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through
      the MIPS git tree and will be available soon. It will not build now without
      nvram_parse_macaddr, but it hasn't before either.
      Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de>
      To: linux-mips@linux-mips.org
      Cc: mb@bu3sch.de
      Cc: netdev@vger.kernel.org
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: NMichael Buesch <mb@bu3sch.de>
      Patchwork: https://patchwork.linux-mips.org/patch/1849/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      3f84622d
    • R
      PM / Runtime: Fix pm_runtime_suspended() · f08f5a0a
      Rafael J. Wysocki 提交于
      There are some situations (e.g. in __pm_generic_call()), where
      pm_runtime_suspended() is used to decide whether or not to execute
      a device's (system) ->suspend() callback.  The callback is not
      executed if pm_runtime_suspended() returns true, but it does so
      for devices that don't even support runtime PM, because the
      power.disable_depth device field is ignored by it.  This leads to
      problems (i.e. devices are not suspened when they should), so rework
      pm_runtime_suspended() so that it returns false if the device's
      power.disable_depth field is different from zero.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@kernel.org
      f08f5a0a
  6. 16 12月, 2010 1 次提交
  7. 15 12月, 2010 1 次提交
    • D
      Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 · ab4e0192
      Dmitry Torokhov 提交于
      The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while
      extending them to support large scancodes was a mistake. While we tried
      to keep ABI intact (and we succeeded in doing that, programs compiled
      on older kernels will work on newer ones) there is still a problem with
      recompiling existing software with newer kernel headers.
      
      New kernel headers will supply updated ioctl numbers and kernel will
      expect that userspace will use struct input_keymap_entry to set and
      retrieve keymap data. But since the names of ioctls are still the same
      userspace will happily compile even if not adjusted to make use of the
      new structure and will start miraculously fail in the field.
      
      To avoid this issue let's revert EVIOCGKEYCODE/EVIOCSKEYCODE definitions
      and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly
      select the style of ioctls it wants to employ.
      Reviewed-by: NHenrik Rydberg <rydberg@euromail.se>
      Acked-by: NJarod Wilson <jarod@redhat.com>
      Acked-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
      ab4e0192
  8. 14 12月, 2010 1 次提交
  9. 11 12月, 2010 3 次提交
    • L
      ACPI: video: fix build for VIDEO_OUTPUT_CONTROL=n · 3353bebe
      Len Brown 提交于
      drivers/built-in.o: In function `acpi_video_bus_put_devices':
      video.c:(.text+0x79663): undefined reference to
      `video_output_unregister'
      drivers/built-in.o: In function `acpi_video_bus_add':
      video.c:(.text+0x7b0b3): undefined reference to `video_output_register'
      Signed-off-by: NLen Brown <len.brown@intel.com>
      3353bebe
    • L
      acpi: fix _OSI string setup regression · d90aa92c
      Lin Ming 提交于
      commit b0ed7a91(ACPICA/ACPI: Add new host interfaces for _OSI suppor)
      introduced a regression that _OSI string setup fails.
      
      There are 2 paths to setup _OSI string.
      
      DMI:
      acpi_dmi_osi_linux -> set_osi_linux -> acpi_osi_setup -> copy _OSI
      string to osi_setup_string
      
      Boot command line:
      acpi_osi_setup -> copy _OSI string to osi_setup_string
      
      Later, acpi_osi_setup_late will be called to handle osi_setup_string.
      If _OSI string is "Linux" or "!Linux", then the call path is,
      
      acpi_osi_setup_late -> acpi_cmdline_osi_linux -> set_osi_linux ->
      acpi_osi_setup -> copy _OSI string to osi_setup_string
      
      This actually never installs _OSI string(acpi_install_interface not
      called), but just copy the _OSI string to osi_setup_string.
      
      This patch fixes the regression.
      Reported-and-tested-by: NLukas Hejtmanek <xhejtman@ics.muni.cz>
      Signed-off-by: NLin Ming <ming.m.lin@intel.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      d90aa92c
    • D
      atm: correct sysfs 'device' link creation and parent relationships · d9ca676b
      Dan Williams 提交于
      The ATM subsystem was incorrectly creating the 'device' link for ATM
      nodes in sysfs.  This led to incorrect device/parent relationships
      exposed by sysfs and udev.  Instead of rolling the 'device' link by hand
      in the generic ATM code, pass each ATM driver's bus device down to the
      sysfs code and let sysfs do this stuff correctly.
      Signed-off-by: NDan Williams <dcbw@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9ca676b
  10. 09 12月, 2010 3 次提交
  11. 08 12月, 2010 5 次提交
    • T
      nfs: remove extraneous and problematic calls to nfs_clear_request · 2df485a7
      Trond Myklebust 提交于
      When a nfs_page is freed, nfs_free_request is called which also calls
      nfs_clear_request to clean out the lock and open contexts and free the
      pagecache page.
      
      However, a couple of places in the nfs code call nfs_clear_request
      themselves. What happens here if the refcount on the request is still high?
      We'll be releasing contexts and freeing pointers while the request is
      possibly still in use.
      
      Remove those bare calls to nfs_clear_context. That should only be done when
      the request is being freed.
      
      Note that when doing this, we need to watch out for tests of req->wb_page.
      Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked()
      would check the value of req->wb_page to figure out if the page is mapped
      into the nfsi->nfs_page_tree. We now indicate the page is mapped using
      the new bit PG_MAPPED in req->wb_flags .
      Reported-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      2df485a7
    • L
      fanotify: Introduce FAN_NOFD · e9a3854f
      Lino Sanfilippo 提交于
      FAN_NOFD is used in fanotify events that do not provide an open file
      descriptor (like the overflow_event).
      Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      e9a3854f
    • L
      fanotify: on group destroy allow all waiters to bypass permission check · 09e5f14e
      Lino Sanfilippo 提交于
      When fanotify_release() is called, there may still be processes waiting for
      access permission. Currently only processes for which an event has already been
      queued into the groups access list will be woken up.  Processes for which no
      event has been queued will continue to sleep and thus cause a deadlock when
      fsnotify_put_group() is called.
      Furthermore there is a race allowing further processes to be waiting on the
      access wait queue after wake_up (if they arrive before clear_marks_by_group()
      is called).
      This patch corrects this by setting a flag to inform processes that the group
      is about to be destroyed and thus not to wait for access permission.
      
      [additional changelog from eparis]
      Lets think about the 4 relevant code paths from the PoV of the
      'operator' 'listener' 'responder' and 'closer'.  Where operator is the
      process doing an action (like open/read) which could require permission.
      Listener is the task (or in this case thread) slated with reading from
      the fanotify file descriptor.  The 'responder' is the thread responsible
      for responding to access requests.  'Closer' is the thread attempting to
      close the fanotify file descriptor.
      
      The 'operator' is going to end up in:
      fanotify_handle_event()
        get_response_from_access()
          (THIS BLOCKS WAITING ON USERSPACE)
      
      The 'listener' interesting code path
      fanotify_read()
        copy_event_to_user()
          prepare_for_access_response()
            (THIS CREATES AN fanotify_response_event)
      
      The 'responder' code path:
      fanotify_write()
        process_access_response()
          (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')
      
      The 'closer':
      fanotify_release()
        (SUPPOSED TO CLEAN UP THE REST OF THIS MESS)
      
      What we have today is that in the closer we remove all of the
      fanotify_response_events and set a bit so no more response events are
      ever created in prepare_for_access_response().
      
      The bug is that we never wake all of the operators up and tell them to
      move along.  You fix that in fanotify_get_response_from_access().  You
      also fix other operators which haven't gotten there yet.  So I agree
      that's a good fix.
      [/additional changelog from eparis]
      
      [remove additional changes to minimize patch size]
      [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]
      Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      09e5f14e
    • L
      fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called · b1085ba8
      Lino Sanfilippo 提交于
      Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm()
      is called before. If FMODE_NONOTIFY is set fsnotify_perm() will skip permission
      checks, so a user can still disable permission checks by setting this flag
      in an open() call.
      This patch corrects this by unsetting the flag before fsnotify_perm is called.
      Signed-off-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      b1085ba8
    • E
      fanotify: remove packed from access response message · 88d60c32
      Eric Paris 提交于
      Since fanotify has decided to be careful about alignment and packing
      rather than rely on __attribute__((packed)) for multiarch support.
      Since this attribute isn't doing anything on fanotify_response we just
      drop it.  This does not break API/ABI.
      Suggested-by: NTvrtko Ursulin <tvrtko.ursulin@sophos.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      88d60c32
  12. 07 12月, 2010 2 次提交
  13. 06 12月, 2010 1 次提交
  14. 03 12月, 2010 2 次提交
    • K
      mem-hotplug: introduce {un}lock_memory_hotplug() · 20d6c96b
      KOSAKI Motohiro 提交于
      Presently hwpoison is using lock_system_sleep() to prevent a race with
      memory hotplug.  However lock_system_sleep() is a no-op if
      CONFIG_HIBERNATION=n.  Therefore we need a new lock.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Suggested-by: NHugh Dickins <hughd@google.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20d6c96b
    • J
      vmalloc: eagerly clear ptes on vunmap · 64141da5
      Jeremy Fitzhardinge 提交于
      On stock 2.6.37-rc4, running:
      
        # mount lilith:/export /mnt/lilith
        # find  /mnt/lilith/ -type f -print0 | xargs -0 file
      
      crashes the machine fairly quickly under Xen.  Often it results in oops
      messages, but the couple of times I tried just now, it just hung quietly
      and made Xen print some rude messages:
      
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
          3000000000000000) for mfn 1d7058 (pfn 18fa7)
          (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
          1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
          (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04
      
      Which means the domain tried to map a pagetable page RW, which would
      allow it to map arbitrary memory, so Xen stopped it.  This is because
      vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
      finished with them, and those pages got recycled as pagetable pages
      while still having these RW aliases.
      
      Removing those mappings immediately removes the Xen-visible aliases, and
      so it has no problem with those pages being reused as pagetable pages.
      Deferring the TLB flush doesn't upset Xen because it can flush the TLB
      itself as needed to maintain its invariants.
      
      When unmapping a region in the vmalloc space, clear the ptes
      immediately.  There's no point in deferring this because there's no
      amortization benefit.
      
      The TLBs are left dirty, and they are flushed lazily to amortize the
      cost of the IPIs.
      
      This specific motivation for this patch is an oops-causing regression
      since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
      of vm_map_ram() introduced in 56e4ebf8 ("NFS: readdir with vmapped
      pages") .  XFS also uses vm_map_ram() and could cause similar problems.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64141da5
  15. 02 12月, 2010 2 次提交
    • T
      NFS: Fix a memory leak in nfs_readdir · 11de3b11
      Trond Myklebust 提交于
      We need to ensure that the entries in the nfs_cache_array get cleared
      when the page is removed from the page cache. To do so, we use the
      freepage address_space operation.
      
      Change nfs_readdir_clear_array to use kmap_atomic(), so that the
      function can be safely called from all contexts.
      
      Finally, modify the cache_page_release helper to call
      nfs_readdir_clear_array directly, when dealing with an anonymous
      page from 'uncached_readdir'.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      11de3b11
    • L
      Call the filesystem back whenever a page is removed from the page cache · 6072d13c
      Linus Torvalds 提交于
      NFS needs to be able to release objects that are stored in the page
      cache once the page itself is no longer visible from the page cache.
      
      This patch adds a callback to the address space operations that allows
      filesystems to perform page cleanups once the page has been removed
      from the page cache.
      
      Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
      [trondmy: cover the cases of invalidate_inode_pages2() and
                truncate_inode_pages()]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      6072d13c
  16. 01 12月, 2010 4 次提交
  17. 30 11月, 2010 1 次提交
    • J
      TTY: open/hangup race fixup · acfa747b
      Jiri Slaby 提交于
      Like in the "TTY: don't allow reopen when ldisc is changing" patch,
      this one fixes a TTY WARNING as described in the option 1) there:
      1) __tty_hangup from tty_ldisc_hangup to tty_ldisc_enable. During this
      section tty_lock is held. However tty_lock is temporarily dropped in
      the middle of the function by tty_ldisc_hangup.
      
      The fix is to introduce a new flag which we set during the unlocked
      window and check it in tty_reopen too. The flag is TTY_HUPPING and is
      cleared after TTY_HUPPED is set.
      
      While at it, remove duplicate TTY_HUPPED set_bit. The one after
      calling ops->hangup seems to be more correct. But anyway, we hold
      tty_lock, so there should be no difference.
      
      Also document the function it does that kind of crap.
      
      Nicely reproducible with two forked children:
      static void do_work(const char *tty)
      {
      	if (signal(SIGHUP, SIG_IGN) == SIG_ERR) exit(1);
      	setsid();
      	while (1) {
      		int fd = open(tty, O_RDWR|O_NOCTTY);
      		if (fd < 0) continue;
      		if (ioctl(fd, TIOCSCTTY)) continue;
      		if (vhangup()) continue;
      		close(fd);
      	}
      	exit(0);
      }
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Reported-by: <Valdis.Kletnieks@vt.edu>
      Reported-by: NKyle McMartin <kyle@mcmartin.ca>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: stable <stable@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      acfa747b
  18. 29 11月, 2010 1 次提交