1. 15 Jan 2011 (1 commit)
  2. 24 Dec 2010 (6 commits)
  3. 23 Dec 2010 (2 commits)
    • taskstats: pad taskstats netlink response for aligment issues on ia64 · 4be2c95d
      Authored by Jeff Mahoney
      The taskstats structure is internally aligned on 8 byte boundaries but the
      layout of the aggregate reply, with two NLA headers and the pid (each 4
      bytes), actually forces the entire structure to be unaligned.  This causes
      the kernel to issue unaligned access warnings on some architectures like
      ia64.  Unfortunately, some software out there doesn't properly unroll the
      NLA packet and assumes that the start of the taskstats structure will
      always be 20 bytes from the start of the netlink payload.  Aligning the
      start of the taskstats structure breaks this software, which we don't
      want.  So, for now the alignment only happens on architectures that
      require it and those users will have to update to fixed versions of those
      packages.  Space is reserved in the packet only when needed.  This ifdef
      should be removed in several years e.g.  2012 once we can be confident
      that fixed versions are installed on most systems.  We add the padding
      before the aggregate since the aggregate is already a defined type.
      
      Commit 85893120 ("delayacct: align to 8 byte boundary on 64-bit systems")
      previously addressed the alignment issues by padding out the pid field.
      This was supposed to be a compatible change but the circumstances
      described above mean that it wasn't.  This patch backs out that change,
      since it was a hack, and introduces a new NULL attribute type to provide
      the padding.  Padding the response with 4 bytes avoids allocating an
      aligned taskstats structure and copying it back.  Since the structure
      weighs in at 328 bytes, it's too big to do it on the stack.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reported-by: Brian Rogers <brian@xyzw.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Guillaume Chazarain <guichaz@gmail.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4be2c95d
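
      A minimal sketch of the reply-construction side, assuming the padding
      attribute is named TASKSTATS_TYPE_NULL and the arch guard
      TASKSTATS_NEEDS_PADDING; the surrounding reply builder is abbreviated
      and may differ in detail from the final patch:

        static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
        {
                struct nlattr *na, *ret;
                int aggr = (type == TASKSTATS_TYPE_PID) ?
                                TASKSTATS_TYPE_AGGR_PID : TASKSTATS_TYPE_AGGR_TGID;

        #ifdef TASKSTATS_NEEDS_PADDING
                /* A zero-length NULL attribute contributes only its 4-byte NLA
                 * header, which pushes the taskstats payload that follows onto
                 * an 8-byte boundary on the architectures that need it. */
                if (nla_put(skb, TASKSTATS_TYPE_NULL, 0, NULL) < 0)
                        goto err;
        #endif
                na = nla_nest_start(skb, aggr);
                if (!na)
                        goto err;
                if (nla_put(skb, type, sizeof(pid), &pid) < 0)
                        goto err;
                /* reserve space for the stats payload; it is filled in later */
                ret = nla_reserve(skb, TASKSTATS_TYPE_STATS,
                                  sizeof(struct taskstats));
                if (!ret)
                        goto err;
                nla_nest_end(skb, na);
                return nla_data(ret);
        err:
                return NULL;
        }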
    • include/linux/unaligned: pack the whole struct rather than just the field · 4e06fd14
      Authored by Will Newton
      The current packed struct implementation of unaligned access adds the
      packed attribute only to the field within the unaligned struct rather than
      to the struct as a whole.  This is not sufficient to enforce proper
      behaviour on architectures with a default struct alignment of more than
      one byte.
      
      For example, the current implementation of __get_unaligned_cpu16, when
      compiled for arm with gcc -O1 -mstructure-size-boundary=32, assumes the
      struct is on a 4 byte boundary and so performs the load of the 16-bit
      packed field as if it were on a 4 byte boundary:
      
      __get_unaligned_cpu16:
              ldrh    r0, [r0, #0]
              bx      lr
      
      Moving the packed attribute to the struct rather than the field causes the
      proper unaligned access code to be generated:
      
      __get_unaligned_cpu16:
      	ldrb	r3, [r0, #0]	@ zero_extendqisi2
      	ldrb	r0, [r0, #1]	@ zero_extendqisi2
      	orr	r0, r3, r0, asl #8
      	bx	lr
      Signed-off-by: Will Newton <will.newton@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4e06fd14
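
      For reference, the fixed helper looks roughly like this (a sketch of the
      packed-struct idiom described above, not a verbatim copy of
      include/linux/unaligned/packed_struct.h):

        /* Pack the whole wrapper struct, not just the u16 member, so the
         * compiler cannot assume any alignment for the pointed-to data. */
        struct __una_u16 { u16 x; } __attribute__((packed));

        static inline u16 __get_unaligned_cpu16(const void *p)
        {
                const struct __una_u16 *ptr = (const struct __una_u16 *)p;

                /* emitted as byte loads on strict-alignment targets */
                return ptr->x;
        }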
  4. 21 Dec 2010 (1 commit)
  5. 18 Dec 2010 (3 commits)
  6. 17 Dec 2010 (4 commits)
    • block: max hardware sectors limit wrapper · 72d4cd9f
      Authored by Mike Snitzer
      Implement blk_limits_max_hw_sectors() and make
      blk_queue_max_hw_sectors() a wrapper around it.
      
      DM needs this to avoid setting queue_limits' max_hw_sectors and
      max_sectors directly.  dm_set_device_limits() now leverages
      blk_limits_max_hw_sectors() logic to establish the appropriate
      max_hw_sectors minimum (PAGE_SIZE).  Fixes an issue where DM was
      incorrectly setting max_sectors rather than max_hw_sectors (which
      caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      72d4cd9f
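
      A rough sketch of the split; the exact minimum enforcement and warning
      follow the description above and may differ from the final code:

        /* Operates on bare queue_limits, so stacking drivers like DM can use
         * it before a request_queue exists. */
        void blk_limits_max_hw_sectors(struct queue_limits *limits,
                                       unsigned int max_hw_sectors)
        {
                /* never allow less than one page worth of sectors */
                if ((max_hw_sectors << 9) < PAGE_SIZE)
                        max_hw_sectors = 1 << (PAGE_SHIFT - 9);

                limits->max_hw_sectors = max_hw_sectors;
                limits->max_sectors = min_t(unsigned int, max_hw_sectors,
                                            BLK_DEF_MAX_SECTORS);
        }

        void blk_queue_max_hw_sectors(struct request_queue *q,
                                      unsigned int max_hw_sectors)
        {
                blk_limits_max_hw_sectors(&q->limits, max_hw_sectors);
        }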
    • block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead · e692cb66
      Authored by Martin K. Petersen
      When stacking devices, a request_queue is not always available. This
      forced us to have a no_cluster flag in the queue_limits that could be
      used as a carrier until the request_queue had been set up for a
      metadevice.
      
      There were several problems with that approach. First of all it was up
      to the stacking device to remember to set the queue flag after stacking
      had completed. Also, the queue flag and the queue limits had to be kept in
      sync at all times. We got that wrong, which could lead to us issuing
      commands that went beyond the max scatterlist limit set by the driver.
      
      The proper fix is to avoid having two flags for tracking the same thing.
      We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
      block layer merging functions. The queue_limit 'no_cluster' is turned
      into 'cluster' to avoid double negatives and to ease stacking.
      Clustering defaults to being enabled as before. The queue flag logic is
      removed from the stacking function, and explicitly setting the cluster
      flag is no longer necessary in DM and MD.
      Reported-by: Ed Lin <ed.lin@promise.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      e692cb66
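
      The merging code can then ask the limits directly instead of testing a
      queue flag; a sketch of the accessor this change introduces:

        /* Replaces test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags) at the merge
         * sites; the limit stacks naturally via blk_stack_limits(). */
        static inline int blk_queue_cluster(struct request_queue *q)
        {
                return q->limits.cluster;
        }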
    • SSB: Fix nvram_get on BCM47xx platform · 3f84622d
      Authored by Hauke Mehrtens
      The nvram_get function was never in the mainline kernel, it only existed in
      an external OpenWrt patch. Use the nvram_getenv function, which is in
      mainline, and use an include instead of an extra function declaration.
      et0macaddr contains the mac address in text form, like 00:11:22:33:44:55.
      We have to parse it before adding it into macaddr.
      
      nvram_parse_macaddr will be merged into asm/mach-bcm47xx/nvram.h through
      the MIPS git tree and will be available soon. This will not build without
      nvram_parse_macaddr, but it did not build before either.
      Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
      To: linux-mips@linux-mips.org
      Cc: mb@bu3sch.de
      Cc: netdev@vger.kernel.org
      Cc: Hauke Mehrtens <hauke@hauke-m.de>
      Acked-by: Michael Buesch <mb@bu3sch.de>
      Patchwork: https://patchwork.linux-mips.org/patch/1849/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      3f84622d
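
      A hedged sketch of the resulting lookup; the helper name
      ssb_get_bcm47xx_mac below is made up for illustration, while nvram_getenv
      and nvram_parse_macaddr are the interfaces described above:

        #include <asm/mach-bcm47xx/nvram.h>

        /* Illustrative helper: read "00:11:22:33:44:55"-style text from nvram
         * and convert it into a 6-byte MAC address. */
        static void ssb_get_bcm47xx_mac(char *var, u8 *macaddr)
        {
                char buf[20];

                if (nvram_getenv(var, buf, sizeof(buf)) >= 0)
                        nvram_parse_macaddr(buf, macaddr);
        }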
    • PM / Runtime: Fix pm_runtime_suspended() · f08f5a0a
      Authored by Rafael J. Wysocki
      There are some situations (e.g. in __pm_generic_call()), where
      pm_runtime_suspended() is used to decide whether or not to execute
      a device's (system) ->suspend() callback.  The callback is not
      executed if pm_runtime_suspended() returns true, but it does so
      for devices that don't even support runtime PM, because the
      power.disable_depth device field is ignored by it.  This leads to
      problems (i.e. devices are not suspended when they should be), so rework
      pm_runtime_suspended() so that it returns false if the device's
      power.disable_depth field is different from zero.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@kernel.org
      f08f5a0a
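
      The reworked check amounts to something like the following sketch:

        /* True only if the device is runtime-suspended AND runtime PM is not
         * disabled for it; devices with power.disable_depth != 0 now report
         * false, so their system ->suspend() callback is not skipped. */
        static inline bool pm_runtime_suspended(struct device *dev)
        {
                return dev->power.runtime_status == RPM_SUSPENDED
                        && !dev->power.disable_depth;
        }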
  7. 16 Dec 2010 (2 commits)
  8. 15 Dec 2010 (1 commit)
    • Input: define separate EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 · ab4e0192
      Authored by Dmitry Torokhov
      The desire to keep old names for the EVIOCGKEYCODE/EVIOCSKEYCODE while
      extending them to support large scancodes was a mistake. While we tried
      to keep ABI intact (and we succeeded in doing that, programs compiled
      on older kernels will work on newer ones) there is still a problem with
      recompiling existing software with newer kernel headers.
      
      New kernel headers will supply updated ioctl numbers and the kernel will
      expect that userspace will use struct input_keymap_entry to set and
      retrieve keymap data. But since the names of the ioctls are still the same,
      userspace will happily compile even if it has not been adjusted to make use
      of the new structure, and will then mysteriously start failing in the field.
      
      To avoid this issue let's revert the EVIOCGKEYCODE/EVIOCSKEYCODE definitions
      and add EVIOCGKEYCODE_V2/EVIOCSKEYCODE_V2 so that userspace can explicitly
      select the style of ioctls it wants to employ.
      Reviewed-by: Henrik Rydberg <rydberg@euromail.se>
      Acked-by: Jarod Wilson <jarod@redhat.com>
      Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
      ab4e0192
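
      The ioctl numbers end up along these lines (sketched from the description
      above; consult include/linux/input.h for the authoritative definitions):

        /* legacy pair: unsigned int[2] = { scancode, keycode } */
        #define EVIOCGKEYCODE           _IOR('E', 0x04, unsigned int[2])
        #define EVIOCSKEYCODE           _IOW('E', 0x04, unsigned int[2])

        /* new pair: callers must opt in to the larger structure explicitly */
        #define EVIOCGKEYCODE_V2        _IOR('E', 0x04, struct input_keymap_entry)
        #define EVIOCSKEYCODE_V2        _IOW('E', 0x04, struct input_keymap_entry)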
  9. 14 Dec 2010 (1 commit)
  10. 11 Dec 2010 (4 commits)
  11. 09 Dec 2010 (3 commits)
  12. 08 Dec 2010 (5 commits)
    • nfs: remove extraneous and problematic calls to nfs_clear_request · 2df485a7
      Authored by Trond Myklebust
      When a nfs_page is freed, nfs_free_request is called which also calls
      nfs_clear_request to clean out the lock and open contexts and free the
      pagecache page.
      
      However, a couple of places in the nfs code call nfs_clear_request
      themselves. What happens here if the refcount on the request is still high?
      We'll be releasing contexts and freeing pointers while the request is
      possibly still in use.
      
      Remove those bare calls to nfs_clear_request. That should only be done when
      the request is being freed.
      
      Note that when doing this, we need to watch out for tests of req->wb_page.
      Previously, nfs_set_page_tag_locked() and nfs_clear_page_tag_locked()
      would check the value of req->wb_page to figure out if the page is mapped
      into the nfsi->nfs_page_tree. We now indicate that the page is mapped using
      the new bit PG_MAPPED in req->wb_flags.
      Reported-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      2df485a7
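
      Conceptually the check changes like this (a simplified sketch, not the
      literal diff):

        /* before: "is this request in nfsi->nfs_page_tree?" was inferred from
         * the page pointer, which nfs_clear_request() used to NULL out */
        if (req->wb_page != NULL)
                radix_tree_tag_set(&nfsi->nfs_page_tree, req->wb_index,
                                   NFS_PAGE_TAG_LOCKED);

        /* after: membership is tracked explicitly with a flag bit */
        if (test_bit(PG_MAPPED, &req->wb_flags))
                radix_tree_tag_set(&nfsi->nfs_page_tree, req->wb_index,
                                   NFS_PAGE_TAG_LOCKED);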
    • fanotify: Introduce FAN_NOFD · e9a3854f
      Authored by Lino Sanfilippo
      FAN_NOFD is used in fanotify events that do not provide an open file
      descriptor (like the overflow_event).
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      e9a3854f
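
      From userspace, the constant shows up in the fd field of struct
      fanotify_event_metadata and means "no descriptor to read or close".  An
      illustrative listener-side check:

        #include <linux/fanotify.h>
        #include <unistd.h>

        /* illustrative: handle one event from the fanotify read buffer */
        static void handle_event(const struct fanotify_event_metadata *meta)
        {
                if (meta->fd == FAN_NOFD) {
                        /* e.g. the overflow event: no file was opened for us,
                         * so there is nothing to read from or close */
                        return;
                }
                /* normal event: use and then release the descriptor */
                close(meta->fd);
        }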
    • fanotify: on group destroy allow all waiters to bypass permission check · 09e5f14e
      Authored by Lino Sanfilippo
      When fanotify_release() is called, there may still be processes waiting for
      access permission. Currently only processes for which an event has already
      been queued into the group's access list will be woken up.  Processes for
      which no event has been queued will continue to sleep and thus cause a
      deadlock when fsnotify_put_group() is called.
      Furthermore there is a race allowing further processes to be waiting on the
      access wait queue after wake_up (if they arrive before clear_marks_by_group()
      is called).
      This patch corrects this by setting a flag to inform processes that the group
      is about to be destroyed and thus not to wait for access permission.
      
      [additional changelog from eparis]
      Let's think about the 4 relevant code paths from the PoV of the
      'operator', 'listener', 'responder' and 'closer'.  The 'operator' is the
      process doing an action (like open/read) which could require permission.
      The 'listener' is the task (or in this case thread) tasked with reading
      from the fanotify file descriptor.  The 'responder' is the thread
      responsible for responding to access requests.  The 'closer' is the
      thread attempting to close the fanotify file descriptor.
      
      The 'operator' is going to end up in:
      fanotify_handle_event()
        get_response_from_access()
          (THIS BLOCKS WAITING ON USERSPACE)
      
      The 'listener' code path:
      fanotify_read()
        copy_event_to_user()
          prepare_for_access_response()
            (THIS CREATES AN fanotify_response_event)
      
      The 'responder' code path:
      fanotify_write()
        process_access_response()
          (REMOVE A fanotify_response_event, SET RESPONSE, WAKE UP 'operator')
      
      The 'closer':
      fanotify_release()
        (SUPPOSED TO CLEAN UP THE REST OF THIS MESS)
      
      What we have today is that in the closer we remove all of the
      fanotify_response_events and set a bit so no more response events are
      ever created in prepare_for_access_response().
      
      The bug is that we never wake all of the operators up and tell them to
      move along.  You fix that in fanotify_get_response_from_access().  You
      also fix other operators which haven't gotten there yet.  So I agree
      that's a good fix.
      [/additional changelog from eparis]
      
      [remove additional changes to minimize patch size]
      [move initialization so it was inside CONFIG_FANOTIFY_PERMISSION]
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      09e5f14e
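
      A sketch of the flag-based bypass (the field name bypass_perm is taken
      from the patch discussion; the real structure layout may differ):

        /* 'closer': fanotify_release() tells every sleeping 'operator' to stop
         * waiting for a response and then wakes them all up. */
        atomic_inc(&group->fanotify_data.bypass_perm);
        wake_up(&group->fanotify_data.access_waitq);

        /* 'operator': get_response_from_access() no longer sleeps forever once
         * the group is going away. */
        wait_event(group->fanotify_data.access_waitq,
                   event->response ||
                   atomic_read(&group->fanotify_data.bypass_perm));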
    • fanotify: if set by user unset FMODE_NONOTIFY before fsnotify_perm() is called · b1085ba8
      Authored by Lino Sanfilippo
      Unsetting FMODE_NONOTIFY in fsnotify_open() is too late, since fsnotify_perm()
      is called before it. If FMODE_NONOTIFY is set, fsnotify_perm() will skip
      permission checks, so a user can still disable permission checks by setting
      this flag in an open() call.
      This patch corrects this by unsetting the flag before fsnotify_perm() is called.
      Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      b1085ba8
    • fanotify: remove packed from access response message · 88d60c32
      Authored by Eric Paris
      fanotify has decided to be careful about alignment and packing rather
      than rely on __attribute__((packed)) for multiarch support.  Since this
      attribute isn't doing anything for fanotify_response, we just drop it.
      This does not break API/ABI.
      Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      88d60c32
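
      For reference, the message is two naturally aligned 32-bit fields, so the
      attribute was a no-op anyway (definition as described above; see
      include/linux/fanotify.h for the authoritative layout):

        /* 8 bytes, laid out identically on all supported architectures with or
         * without __attribute__((packed)) */
        struct fanotify_response {
                __s32 fd;
                __u32 response;         /* FAN_ALLOW or FAN_DENY */
        };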
  13. 07 Dec 2010 (3 commits)
    • Input: add input driver for polled GPIO buttons · 0e7d0c86
      Authored by Gabor Juhos
      The existing gpio-keys driver is usable only for GPIO lines with
      interrupt support. Several devices have buttons connected to a GPIO
      line which is not capable of generating interrupts. This patch adds a
      new input driver using the generic GPIO layer and the input-polldev
      to support such buttons.
      
      [Ben Gardiner <bengardiner@nanometrics.ca>: fold code to use more
       of the original gpio_keys infrastructure; cleanups and other
       improvements.]
      Signed-off-by: Gabor Juhos <juhosg@openwrt.org>
      Signed-off-by: Ben Gardiner <bengardiner@nanometrics.ca>
      Tested-by: Ben Gardiner <bengardiner@nanometrics.ca>
      Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
      0e7d0c86
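
      Board code would describe such buttons roughly as follows.  This is a
      hedged sketch reusing the gpio_keys platform data plus a poll interval;
      the device name and fields follow the mainline driver as described, but
      check the driver for the exact binding:

        #include <linux/gpio_keys.h>
        #include <linux/input.h>
        #include <linux/platform_device.h>

        static struct gpio_keys_button board_buttons[] = {
                {
                        .desc           = "reset",
                        .code           = KEY_RESTART,
                        .gpio           = 7,    /* line without interrupt support */
                        .active_low     = 1,
                },
        };

        static struct gpio_keys_platform_data board_button_data = {
                .buttons        = board_buttons,
                .nbuttons       = ARRAY_SIZE(board_buttons),
                .poll_interval  = 20,   /* ms between GPIO polls */
        };

        static struct platform_device board_button_device = {
                .name           = "gpio-keys-polled",
                .id             = -1,
                .dev            = {
                        .platform_data  = &board_button_data,
                },
        };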
    • PM / Hibernate: Fix memory corruption related to swap · c9e664f1
      Authored by Rafael J. Wysocki
      There is a problem that swap pages allocated before the creation of
      a hibernation image can be released and used for storing the contents
      of different memory pages while the image is being saved.  Since the
      kernel stored in the image doesn't know of that, it causes memory
      corruption to occur after resume from hibernation, especially on
      systems with relatively small RAM that need to swap often.
      
      This issue can be addressed by keeping the GFP_IOFS bits clear
      in gfp_allowed_mask during the entire hibernation, including the
      saving of the image, until the system is finally turned off or
      the hibernation is aborted.  Unfortunately, for this purpose
      it's necessary to rework the way in which the hibernate and
      suspend code manipulates gfp_allowed_mask.
      
      This change is based on an earlier patch from Hugh Dickins.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reported-by: Ondrej Zary <linux@rainbow-software.org>
      Acked-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: stable@kernel.org
      c9e664f1
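
      The reworked helpers boil down to something like this sketch (the
      save/restore detail may differ from the final code):

        /* mm/page_alloc.c (sketch) */
        static gfp_t saved_gfp_mask;

        void pm_restrict_gfp_mask(void)
        {
                BUG_ON(saved_gfp_mask);
                saved_gfp_mask = gfp_allowed_mask;
                /* no I/O or FS allocations while the image is being written */
                gfp_allowed_mask &= ~GFP_IOFS;
        }

        void pm_restore_gfp_mask(void)
        {
                if (saved_gfp_mask) {
                        gfp_allowed_mask = saved_gfp_mask;
                        saved_gfp_mask = 0;
                }
        }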
    • filter: fix sk_filter rcu handling · 46bcf14f
      Authored by Eric Dumazet
      Pavel Emelyanov tried to fix a race between sk_filter_(de|at)tach and
      sk_clone() in commit 47e958ea.
      
      Problem is we can have several clones sharing a common sk_filter, and
      these clones might want to sk_filter_attach() their own filters at the
      same time, and can overwrite old_filter->rcu, corrupting RCU queues.
      
      We can not use filter->rcu without being sure no other thread could do
      the same thing.
      
      Switch the code to a more conventional ref-counting technique: do the
      atomic decrement immediately and queue one RCU callback when the last
      reference is released.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46bcf14f
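
      The resulting pattern, sketched (whether plain call_rcu() or call_rcu_bh()
      is used is a detail of the networking paths involved):

        /* last reference dropped -> free after a grace period; fp->rcu is
         * only ever queued once, so concurrent attach/detach can no longer
         * corrupt the RCU queues */
        void sk_filter_release_rcu(struct rcu_head *rcu)
        {
                struct sk_filter *fp = container_of(rcu, struct sk_filter, rcu);

                kfree(fp);
        }

        static inline void sk_filter_release(struct sk_filter *fp)
        {
                if (atomic_dec_and_test(&fp->refcnt))
                        call_rcu_bh(&fp->rcu, sk_filter_release_rcu);
        }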
  14. 06 Dec 2010 (1 commit)
  15. 05 Dec 2010 (1 commit)
  16. 03 Dec 2010 (2 commits)
    • mem-hotplug: introduce {un}lock_memory_hotplug() · 20d6c96b
      Authored by KOSAKI Motohiro
      Presently hwpoison is using lock_system_sleep() to prevent a race with
      memory hotplug.  However lock_system_sleep() is a no-op if
      CONFIG_HIBERNATION=n.  Therefore we need a new lock.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Suggested-by: Hugh Dickins <hughd@google.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20d6c96b
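
      A sketch of the new lock pair, which hwpoison and the hotplug paths take
      instead of relying on lock_system_sleep() alone:

        /* mm/memory_hotplug.c (sketch) */
        static DEFINE_MUTEX(mem_hotplug_mutex);

        void lock_memory_hotplug(void)
        {
                mutex_lock(&mem_hotplug_mutex);
                /* still block hibernation too, where it is configured in */
                lock_system_sleep();
        }

        void unlock_memory_hotplug(void)
        {
                unlock_system_sleep();
                mutex_unlock(&mem_hotplug_mutex);
        }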
    • vmalloc: eagerly clear ptes on vunmap · 64141da5
      Authored by Jeremy Fitzhardinge
      On stock 2.6.37-rc4, running:
      
        # mount lilith:/export /mnt/lilith
        # find  /mnt/lilith/ -type f -print0 | xargs -0 file
      
      crashes the machine fairly quickly under Xen.  Often it results in oops
      messages, but the couple of times I tried just now, it just hung quietly
      and made Xen print some rude messages:
      
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp
          3000000000000000) for mfn 1d7058 (pfn 18fa7)
          (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms
          (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp
          1000000000000000) for mfn 1d2e04 (pfn 1d1fb)
          (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04
      
      Which means the domain tried to map a pagetable page RW, which would
      allow it to map arbitrary memory, so Xen stopped it.  This is because
      vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had
      finished with them, and those pages got recycled as pagetable pages
      while still having these RW aliases.
      
      Removing those mappings immediately removes the Xen-visible aliases, and
      so it has no problem with those pages being reused as pagetable pages.
      Deferring the TLB flush doesn't upset Xen because it can flush the TLB
      itself as needed to maintain its invariants.
      
      When unmapping a region in the vmalloc space, clear the ptes
      immediately.  There's no point in deferring this because there's no
      amortization benefit.
      
      The TLBs are left dirty, and they are flushed lazily to amortize the
      cost of the IPIs.
      
      The specific motivation for this patch is an oops-causing regression
      since 2.6.36 when using NFS under Xen, triggered by the NFS client's use
      of vm_map_ram() introduced in 56e4ebf8 ("NFS: readdir with vmapped
      pages").  XFS also uses vm_map_ram() and could cause similar problems.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      64141da5
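
      The core of the change, sketched: clear the ptes when the area is
      unmapped and defer only the TLB flush (function names follow mm/vmalloc.c
      of that era; the exact call sites may differ):

        /* mm/vmalloc.c (sketch) */
        static void free_unmap_vmap_area_noflush(struct vmap_area *va)
        {
                /* remove the ptes now, so no stale RW alias of the pages
                 * survives until the lazy purge... */
                unmap_vmap_area(va);
                /* ...but keep batching the TLB flush to amortize the IPIs */
                free_vmap_area_noflush(va);
        }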