1. 30 Jan, 2008 · 40 commits
    • x86: scale cyc_2_nsec according to CPU frequency · 53d517cd
      Guillaume Chazarain committed
      scale the sched_clock() cyc_2_nsec scaling factor according to
      CPU frequency changes.
      
      [ mingo@elte.hu: simplified it and fixed it for SMP. ]
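      A minimal userspace sketch of per-CPU rescaling. It assumes a fixed-point scheme with a 2^10 shift, so that ns = cyc * scale >> 10 and scale = 10^6 * 2^10 / cpu_khz. The per-CPU array, NR_CPUS_SKETCH and on_cpufreq_change() are hypothetical stand-ins for the kernel's per-CPU data and cpufreq notifier.

```c
#include <assert.h>
#include <stdint.h>

#define CYC2NS_SCALE_FACTOR 10  /* 2^10 fixed-point shift (assumed) */
#define NR_CPUS_SKETCH 4        /* hypothetical CPU count */

/* One scale per CPU, because each CPU may run at its own frequency. */
static unsigned long cyc2ns_scale[NR_CPUS_SKETCH];

/* cpu_khz is cycles per millisecond, so ns per cycle = 10^6 / cpu_khz;
 * the factor is stored pre-shifted by CYC2NS_SCALE_FACTOR bits. */
static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)
{
    assert(cpu_khz != 0 && cpu >= 0 && cpu < NR_CPUS_SKETCH);
    cyc2ns_scale[cpu] = (1000000UL << CYC2NS_SCALE_FACTOR) / cpu_khz;
}

/* Convert a cycle count measured on `cpu` to nanoseconds. */
static uint64_t cycles_2_ns(uint64_t cyc, int cpu)
{
    return (cyc * cyc2ns_scale[cpu]) >> CYC2NS_SCALE_FACTOR;
}

/* Hypothetical cpufreq transition hook: rescale only the CPU whose
 * frequency changed. */
static void on_cpufreq_change(int cpu, unsigned long new_khz)
{
    set_cyc2ns_scale(new_khz, cpu);
}
```

      Halving a CPU's frequency doubles its scale factor, so the same cycle count maps to twice as many nanoseconds on that CPU only.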
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: protect against sigaltstack wraparound · 83bd0102
      Roland McGrath committed
      cf http://lkml.org/lkml/2007/10/3/41
      
      To summarize: on Linux, SA_ONSTACK decides whether you are already on the
      signal stack based on the value of the SP at the time of a signal.  If
      you are not already inside the range, you are not "on the signal stack"
      and so the new signal handler frame starts over at the base of the signal
      stack.
      
      sigaltstack (and sigstack before it) was invented in BSD.  There, the
      SA_ONSTACK behavior has always been different.  It uses a kernel state
      flag to decide, rather than the SP value.  When you first take an
      SA_ONSTACK signal and switch to the alternate signal stack, it sets the
      SS_ONSTACK flag in the thread's sigaltstack state in the kernel.
      Thereafter you are "on the signal stack" and don't switch SP before
      pushing a handler frame no matter what the SP value is.  Only when you
      sigreturn from the original handler context do you clear the SS_ONSTACK
      flag so that a new handler frame will start over at the base of the
      alternate signal stack.
      
      The undesirable effect of the Linux behavior is that an overflow of the
      alternate signal stack can not only go undetected, but also lead to a ring
      buffer effect of clobbering the original handler frame at the base of the
      signal stack for each successive signal that comes just after the
      overflow.  This is what Shi Weihua's test case demonstrates.  Normally
      this does not come up because of the signal mask, but the test case uses
      SA_NODEFER for its SIGSEGV handler.
      
      The other subtle part of the existing Linux semantics is that a simple
      longjmp out of a signal handler serves to take you off the signal stack
      in a safe and reliable fashion without having used sigreturn (nor having
      just returned from the handler normally, which means the same).  After
      the longjmp (or even informal stack switching not via any proper libc or
      kernel interface), the alternate signal stack stands ready to be used
      again.
      
      A paranoid program would allocate a PROT_NONE red zone around its
      alternate signal stack.  Then a small overflow would trigger a SIGSEGV in
      handler setup, and be fatal (core dump) whether or not SIGSEGV is
      blocked.  As with thread stack red zones, that cannot catch all overflows
      (or underflows).  e.g., a local array as large as page size allocated in
      a function called from a handler, but not actually touched before more
      calls push more stack, could cause an overflow that silently pushes into
      some unrelated allocated pages.
      
      The BSD behavior does not do anything in particular about overflow.  But
      it does at least avoid the wraparound or "ring buffer effect", so you'll
      just get a straightforward all-out overflow down your address space past
      the low end of the alternate signal stack.  I don't know what the BSD
      behavior is for longjmp out of an SA_ONSTACK handler.
      
      The POSIX wording relating to sigaltstack is pretty minimal.  I don't
      think it speaks to this issue one way or another.  (The program that
      overflows its stack is clearly in undefined behavior territory of one
      sort or another anyhow.)
      
      Given the longjmp issue and the potential for highly subtle complications
      in existing programs relying on this in arcane ways deep in their code, I
      am very dubious about changing the behavior to the BSD style persistent
      flag.  I think Shi Weihua's patches have a similar effect by tracking the
      SP used in the last handler setup.
      
      I think it would be sensible for the signal handler setup code to detect
      when it would itself be causing a stack overflow.  Maybe something like
      the following patch (untested).  This issue exists in the same way on all
      machines, so ideally they would all do a similar check.
      
      When it's the handler function itself or its callees that cause the
      overflow, rather than the signal handler frame setup alone crossing the
      boundary, this still won't help.  But I don't see any way to distinguish
      that from the valid longjmp case.
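      A simplified userspace model of the proposed check. It is not the kernel's frame setup code. As Linux does, it decides "on the signal stack" from the SP range alone. It then refuses to place a frame that would cross the low end of the alternate stack when SP started out on it. struct alt_stack, place_frame(), the 16-byte alignment and the "0 means overflow" return value are assumptions for illustration. The sketch only reports the overflow; it does not model how it would be turned into a fatal signal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a thread's sigaltstack settings. */
struct alt_stack {
    uintptr_t ss_sp;    /* lowest address of the alternate stack */
    size_t    ss_size;  /* its size in bytes */
};

/* Linux-style test: SP alone decides whether we are "on the signal stack".
 * If sp < ss_sp the unsigned subtraction wraps to a huge value, so one
 * comparison rejects addresses on either side of the range. */
static bool on_sig_stack(const struct alt_stack *as, uintptr_t sp)
{
    return sp - as->ss_sp < as->ss_size;
}

/* Returns the SP at which the handler frame would start, or 0 if pushing
 * the frame would overflow the alternate stack we are already on. */
static uintptr_t place_frame(const struct alt_stack *as, uintptr_t sp,
                             size_t frame_size, bool sa_onstack)
{
    bool was_on = on_sig_stack(as, sp);

    /* Not on the alternate stack yet: start at its base (its top, since
     * x86 stacks grow down). */
    if (sa_onstack && !was_on)
        sp = as->ss_sp + as->ss_size;

    if (sp < frame_size)
        return 0;
    sp = (sp - frame_size) & ~(uintptr_t)15;  /* assumed 16-byte alignment */

    /* Started on the alternate stack but the frame crosses its low end:
     * report the overflow instead of silently running past it. */
    if (was_on && !on_sig_stack(as, sp))
        return 0;
    return sp;
}
```

      Take an alternate stack at 0x10000 of size 0x2000. A signal arriving at SP 0x10080 with a 0x100-byte frame is flagged as an overflow. Without the final check it would yield 0xFF80, below the stack. The next signal would then see an off-stack SP and restart at the base, which is the wraparound described above.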
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: add DMI quirk for io-delay hangs on Compaq Presario V6000 laptops · f9fc5891
      Ingo Molnar committed
      add the DMI strings provided by Islam Amer <pharon@gmail.com>, for
      the Compaq Presario V6000 (Quanta/30B7).
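      A hypothetical sketch of how an io-delay quirk table might be consulted. The only real data is the Quanta/30B7 board string pair from this commit. The struct layout, choose_io_delay() and the substring comparison (assumed to mirror DMI_MATCH semantics) are illustrative. An explicit io_delay= choice on the command line is treated as taking precedence over the DMI-based choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum io_delay_type { IO_DELAY_0X80, IO_DELAY_0XED, IO_DELAY_UDELAY, IO_DELAY_NONE };

/* Hypothetical stand-in for the firmware's DMI board strings. */
struct dmi_board {
    const char *board_vendor;
    const char *board_name;
};

struct io_delay_quirk {
    const char *ident;
    const char *board_vendor;
    const char *board_name;
};

/* Machines that hang on port 0x80 writes and should use port 0xed. */
static const struct io_delay_quirk io_delay_0xed_quirks[] = {
    { "Compaq Presario V6000", "Quanta", "30B7" },
};

/* Substring comparison (assumed DMI_MATCH-like behaviour). */
static bool field_matches(const char *have, const char *want)
{
    return have != NULL && strstr(have, want) != NULL;
}

/* An explicit io_delay= boot option wins over any DMI quirk. */
static enum io_delay_type choose_io_delay(const struct dmi_board *dmi,
                                          bool io_delay_override,
                                          enum io_delay_type current)
{
    size_t n = sizeof(io_delay_0xed_quirks) / sizeof(io_delay_0xed_quirks[0]);

    if (io_delay_override)
        return current;
    for (size_t i = 0; i < n; i++) {
        const struct io_delay_quirk *q = &io_delay_0xed_quirks[i];
        if (field_matches(dmi->board_vendor, q->board_vendor) &&
            field_matches(dmi->board_name, q->board_name))
            return IO_DELAY_0XED;
    }
    return current;
}
```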
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: make io_delay=0xed the default · d0049e71
      Ingo Molnar committed
      make io_delay=0xed the default. This frees up port 0x80, which is
      a debug port on some machines and whose use locks up certain laptops.
      
      Testing only for now. Try the io_delay=0x80 boot option if this does not
      work for you.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: various changes and cleanups to in_p/out_p delay details · 6e7c4025
      Ingo Molnar committed
      various changes to the in_p/out_p delay details:
      
      - add the io_delay=none method
      - make each method selectable from the kernel config
      - simplify the delay code a bit by getting rid of an indirect function call
      - add the /proc/sys/kernel/io_delay_type sysctl
      - change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed
      - make the io delay config not depend on CONFIG_DEBUG_KERNEL
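      The io_delay= values listed above (0x80, 0xed, udelay, none) map naturally onto a small table-driven parser. This is a sketch, not the kernel code. The enum names, the assumed 0x80 default, the -1 error value and io_delay_port() are hypothetical. A real delay would issue an outb() to the selected port rather than return its number.

```c
#include <assert.h>
#include <string.h>

enum io_delay_type {
    IO_DELAY_TYPE_0X80,
    IO_DELAY_TYPE_0XED,
    IO_DELAY_TYPE_UDELAY,
    IO_DELAY_TYPE_NONE,
};

static enum io_delay_type io_delay_type = IO_DELAY_TYPE_0X80; /* assumed default */
static int io_delay_override;  /* set once the user chose explicitly */

/* Parse the value of "io_delay=...". Returns 0 on success, -1 if unknown
 * (in which case the current setting is left alone). */
static int io_delay_param(const char *s)
{
    static const struct {
        const char *name;
        enum io_delay_type type;
    } map[] = {
        { "0x80",   IO_DELAY_TYPE_0X80 },
        { "0xed",   IO_DELAY_TYPE_0XED },
        { "udelay", IO_DELAY_TYPE_UDELAY },
        { "none",   IO_DELAY_TYPE_NONE },
    };

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (strcmp(s, map[i].name) == 0) {
            io_delay_type = map[i].type;
            io_delay_override = 1;
            return 0;
        }
    }
    return -1;
}

/* Port a delay would write to; 0 means no port write (udelay or none). */
static unsigned io_delay_port(void)
{
    switch (io_delay_type) {
    case IO_DELAY_TYPE_0X80: return 0x80;
    case IO_DELAY_TYPE_0XED: return 0xed;
    default:                 return 0;
    }
}
```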
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: "David P. Reed" <dpreed@reed.com>
    • x86: provide a DMI based port 0x80 I/O delay override. · b02aae9c
      Rene Herman committed
      Certain (HP) laptops experience trouble from our port 0x80 I/O delay
      writes. This patch provides for a DMI based switch to the "alternate
      diagnostic port" 0xed (as used by some BIOSes as well) for these.
      
      David P. Reed confirmed that port 0xed works for him and provides a
      proper delay. The symptoms of _not_ working are a hanging machine,
      with "hwclock" use being a direct trigger.
      
      Earlier versions of this attempted to simply use udelay(2), with the
      2 being a value tested to be a nicely conservative upper bound with
      help from many on the linux-kernel mailing list, but that approach has
      two problems.
      
      First, before loops_per_jiffy is calibrated (which happens after PIT
      init, while some implementations of the PIT are themselves among the
      historically problematic devices that need the delay), udelay() isn't
      particularly well-defined. We could initialise loops_per_jiffy
      conservatively (and based on CPU family, so as not to unduly delay old
      machines), which would sort of work, but...
      
      Second, delaying isn't the only effect that a write to port 0x80 has.
      It's also a PCI posting barrier which some devices may be explicitly
      or implicitly relying on. Alan Cox did a survey and found evidence
      that some drivers may additionally be racy on SMP without the
      bus-locking outb.
      
      Switching to an inb() makes the timing too unpredictable; as such,
      this DMI-based switch should be the safest approach for now. Any more
      invasive changes should get more rigorous testing first. Moreover,
      only very few machines have the problem, and a DMI-based hack seems
      to fit that situation.
      
      This also introduces a command-line parameter "io_delay" to override
      the DMI based choice again:
      
      	io_delay=<standard|alternate>
      
      where "standard" means using the standard port 0x80 and "alternate"
      port 0xed.
      
      This retains the udelay method as a config (CONFIG_UDELAY_IO_DELAY) and
      command-line ("io_delay=udelay") choice for testing purposes as well.
      
      This does not change the io_delay() in the boot code, which uses the
      same port 0x80 I/O delay, but those calls do not appear to be a problem:
      David P. Reed reported the problem was already gone after using the
      udelay version. He also reported that booting with "acpi=off" fixed
      things, and since ACPI isn't touched until after this DMI-based I/O
      port switch, I believe it's safe to leave the ones in the boot code
      be.
      
      The DMI strings from David's HP Pavilion dv9000z are in there already
      and we need to get/verify the DMI info from other machines with the
      problem, notably the HP Pavilion dv6000z.
      
      This patch is partly based on earlier patches from Pavel Machek and
      David P. Reed.
      Signed-off-by: Rene Herman <rene.herman@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: fix: s2ram + P4 + tsc = annoyance · 4c6b8b4d
      Mike Galbraith committed
      s2ram recently became useful here, except for the kernel's annoying
      habit of disabling my P4's perfectly good TSC.
      
      [  107.894470] CPU 1 is now offline
      [  107.894474] SMP alternatives: switching to UP code
      [  107.895832] CPU0 attaching sched-domain:
      [  107.895836]  domain 0: span 1
      [  107.895838]   groups: 1
      [  107.896097] CPU1 is down
      [    3.726156] Intel machine check architecture supported.
      [    3.726165] Intel machine check reporting enabled on CPU#0.
      [    3.726167] CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
      [    3.726170] CPU0: Thermal monitoring enabled
      [    3.726175] Back to C!
      [    3.726708] Force enabled HPET at resume
      [    3.726775] Enabling non-boot CPUs ...
      [    3.727049] CPU0 attaching NULL sched-domain.
      [    3.727165] SMP alternatives: switching to SMP code
      [    3.727858] Booting processor 1/1 eip 3000
      [    3.727862] CPU 1 irqstacks, hard=b042f000 soft=b042d000
      [    3.738173] Initializing CPU#1
      [    3.798912] Calibrating delay using timer specific routine.. 5986.12 BogoMIPS (lpj=2993061)
      [    3.798920] CPU: After generic identify, caps: bfebfbff 00000000 00000000 00000000 00004400 00000000 00000000 00000000
      [    3.798931] CPU: Trace cache: 12K uops, L1 D cache: 8K
      [    3.798934] CPU: L2 cache: 512K
      [    3.798936] CPU: Physical Processor ID: 0
      [    3.798938] CPU: After all inits, caps: bfebfbff 00000000 00000000 0000b080 00004400 00000000 00000000 00000000
      [    3.798946] Intel machine check architecture supported.
      [    3.798952] Intel machine check reporting enabled on CPU#1.
      [    3.798955] CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
      [    3.798959] CPU1: Thermal monitoring enabled
      [    3.799161] CPU1: Intel(R) Pentium(R) 4 CPU 3.00GHz stepping 09
      [    3.799187] checking TSC synchronization [CPU#0 -> CPU#1]:
      [    3.819181] Measured 63588552840 cycles TSC warp between CPUs, turning off TSC clock.
      [    3.819184] Marking TSC unstable due to: check_tsc_sync_source failed.
      
      If check_tsc_warp() is called after initial boot, and the TSC has in the
      meantime been set (BIOS, user, silicon, elves) to a value lower than the
      last stored/stale value, we blame the TSC.  Reset to pristine condition
      after every test.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: hibernation: document __save_processor_state() on x86 · 5c9c9bec
      Rafael J. Wysocki committed
      Document the fact that __save_processor_state() has to save all CPU
      registers referred to by the kernel in case a different kernel is
      used to load and restore a hibernation image containing it.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: fix make mrproper · 9484b1eb
      Sam Ravnborg committed
      Michael Opdenacker reported:
      
      For backward compatibility with earlier (< 2.6.24) kernels,
      arch/i386/boot/bzImage or arch/x86_64/boot/bzImage symbolic links to
      arch/x86/boot/bzImage are created when you build an x86 kernel. The
      arch/i386 or arch/x86_64 directories are then created for this sole
      purpose.
      
      Issue: these generated directories and symbolic links are *not cleaned
      up* when you run "make mrproper" (and thus "make distclean"). This
      disturbs the production of patches, because the source tree is left with
      generated files and directories.
      
      Sam has an alternative fix:
      
      The directory is killed during make clean as opposed to make mrproper.
      Reported-by: Michael Opdenacker <michael-lists@free-electrons.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: track accurate idle time with tick_sched.idle_sleeptime · 6378ddb5
      Venki Pallipadi committed
      Current idle time in kstat is based on jiffies and is coarse grained.
      tick_sched.idle_sleeptime is making some attempt to keep track of idle time
      in a fine grained manner.  But, it is not handling the time spent in
      interrupts fully.
      
      Make tick_sched.idle_sleeptime accurate with respect to time spent on
      handling interrupts and also add tick_sched.idle_lastupdate, which keeps
      track of last time when idle_sleeptime was updated.
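      A minimal model of the accounting described above. It assumes the idle stretch is closed on every interrupt entry and reopened when the CPU returns to idle, so handler time is excluded. idle_start() and idle_stop() are hypothetical names. idle_sleeptime and idle_lastupdate follow the commit text; idle_entrytime and idle_active are assumed helper fields.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified per-CPU idle bookkeeping; all times in ns. */
struct idle_stats {
    int      idle_active;      /* currently accounting idle time */
    uint64_t idle_entrytime;   /* start of the current idle stretch */
    uint64_t idle_sleeptime;   /* accumulated idle time */
    uint64_t idle_lastupdate;  /* when idle_sleeptime was last updated */
};

/* Entering idle, or going back to idle after an interrupt. */
static void idle_start(struct idle_stats *ts, uint64_t now)
{
    ts->idle_entrytime = now;
    ts->idle_active = 1;
}

/* Leaving idle, or taking an interrupt while idle: close the stretch so
 * the time spent handling the interrupt is not counted as idle. */
static void idle_stop(struct idle_stats *ts, uint64_t now)
{
    if (!ts->idle_active)
        return;
    ts->idle_sleeptime += now - ts->idle_entrytime;
    ts->idle_lastupdate = now;
    ts->idle_active = 0;
}
```

      For idle from t=100, an interrupt handled between 150 and 160, and a real wakeup at 200, this accumulates 50 + 40 = 90 ns of idle time rather than 100.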
      
      These statistics will be crucial for the cpufreq ondemand governor, which can
      shed some of the conservative guard band that it uses today when setting the
      frequency.  The ondemand changes that use the exact idle time are coming
      soon.
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • NTP: correct inconsistent ntp interval/tick_length usage · bbe4d18a
      john stultz committed
      I recently noticed on one of my boxes that when synched with an NTP
      server, the drift value reported for the system was ~283ppm. While in
      some cases, clock hardware can be that bad, it struck me as unusual as
      the system was using the acpi_pm clocksource, which is one of the more
      trustworthy and accurate clocksources on x86 hardware.
      
      I brought up another system and let it sync to the same NTP server, and
      I noticed a similar drift of 280-some ppm.
      
      In looking at the code, I found that the acpi_pm's constant frequency
      was being computed correctly at boot-up, however once the system was up,
      even without the ntp daemon running, the clocksource's frequency was
      being modified by the clocksource_adjust() function.
      
      Digging deeper, I realized that in the code that keeps track of how much
      the clocksource is skewing from the ntp desired time, we were using
      different lengths to establish how long a time interval was.
      
      The clocksource was being set up with the following interval:
      	NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ
      
      While the ntp code was using the tick_length_base value:
      	tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ)
      					/NTP_INTERVAL_FREQ
      
      The subtle difference is:
      	(tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC
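      The mismatch can be checked with plain integer arithmetic. The sketch below evaluates both per-interval lengths and leaves out ntp.c's fixed-point shift and time_freq term. The helper names and the tick_usec value of 10003 used in the usage example are hypothetical; that value is chosen only to show how a product that is not exactly NSEC_PER_SEC becomes a ppm-sized error.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC  1000000000LL
#define NSEC_PER_USEC 1000LL

/* Interval length as the clocksource setup used it:
 * NTP_INTERVAL_LENGTH = NSEC_PER_SEC / NTP_INTERVAL_FREQ. */
static int64_t clocksource_interval_ns(int64_t ntp_interval_freq)
{
    return NSEC_PER_SEC / ntp_interval_freq;
}

/* Interval length implied by tick_length_base, without the fixed-point
 * shift and the time_freq adjustment. */
static int64_t ntp_interval_ns(int64_t tick_usec, int64_t user_hz,
                               int64_t ntp_interval_freq)
{
    return tick_usec * NSEC_PER_USEC * user_hz / ntp_interval_freq;
}

/* Relative difference between the two lengths, in parts per million. */
static int64_t interval_mismatch_ppm(int64_t tick_usec, int64_t user_hz,
                                     int64_t ntp_interval_freq)
{
    int64_t cs  = clocksource_interval_ns(ntp_interval_freq);
    int64_t ntp = ntp_interval_ns(tick_usec, user_hz, ntp_interval_freq);

    return (ntp - cs) * 1000000LL / cs;
}
```

      With tick_usec = 10000 and USER_HZ = 100 the two lengths agree exactly. The hypothetical tick_usec = 10003 gives a 300 ppm disagreement, whatever NTP_INTERVAL_FREQ is.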
      
      This difference in calculation was causing the clocksource correction
      code to apply a correction factor to the clocksource so that the two
      intervals matched; however, this made the clocksource's actual
      frequency incorrect. I believe this difference would affect all
      clocksources, although to differing degrees depending on the
      clocksource resolution.
      
      The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1,
      so my apologies for the mistake, and for not noticing it until now.
      
      The following patch corrects the clocksource's initialization code so
      it uses the same interval length as the code in ntp.c. After applying
      this patch, the drift value for the same system went from ~283ppm to
      only 2.635ppm.
      
      I believe this patch to be good, however it does affect all arches and
      I've only tested on x86, so some caution is advised. I do think it would
      be a likely candidate for a stable 2.6.24.x release.
      
      Any thoughts or feedback would be appreciated.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: assign IRQs to HPET timers, fix · 37a47db8
      Balaji Rao committed
      Looks like IRQ 31 is assigned to timer 3, even without the patch!
      I wonder who wrote the number 31. But the manual says that it is
      zero by default.
      
      I think we should check whether the timer has been allocated an IRQ before
      proceeding to assign one to it.  Here is a patch that does this.
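      A sketch of "check first, then assign". It assumes each comparator exposes its current IRQ (0 = none) and a bitmask of IRQ lines it can be routed to. struct hpet_timer_sketch and assign_timer_irqs() are hypothetical, not the HPET driver's code. Already-allocated IRQs are reserved in a first pass, so a timer that already has one (like timer 3 with IRQ 31 in the report) keeps it, and nobody else takes it.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified HPET comparator: current IRQ (0 = none) and a bitmask of the
 * IRQ lines it may be routed to. */
struct hpet_timer_sketch {
    unsigned irq;
    uint32_t route_cap;
};

/* Assign IRQs without disturbing timers that already have one. */
static void assign_timer_irqs(struct hpet_timer_sketch *t, int n,
                              uint32_t *irqs_in_use)
{
    /* Pass 1: reserve every IRQ that is already allocated. */
    for (int i = 0; i < n; i++) {
        if (t[i].irq != 0) {
            assert(t[i].irq < 32);
            *irqs_in_use |= UINT32_C(1) << t[i].irq;
        }
    }

    /* Pass 2: give each remaining timer its lowest free routable IRQ. */
    for (int i = 0; i < n; i++) {
        if (t[i].irq != 0)
            continue;
        uint32_t candidates = t[i].route_cap & ~*irqs_in_use;
        for (unsigned irq = 1; irq < 32; irq++) {
            if (candidates & (UINT32_C(1) << irq)) {
                t[i].irq = irq;
                *irqs_in_use |= UINT32_C(1) << irq;
                break;
            }
        }
    }
}
```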
      Signed-off-by: Balaji Rao <balajirrao@gmail.com>
      Tested-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: assign IRQs to HPET timers · e3f37a54
      Balaji Rao committed
      The userspace API for the HPET (see Documentation/hpet.txt) did not work. The
      HPET_IE_ON ioctl was failing as there was no IRQ assigned to the timer
      device. This patch fixes it by allocating IRQs to timer blocks in the HPET.
      
      arch/x86/kernel/hpet.c |   13 +++++--------
      drivers/char/hpet.c    |   45 ++++++++++++++++++++++++++++++++++++++-------
      include/linux/hpet.h   |    2 +-
      3 files changed, 44 insertions(+), 16 deletions(-)
      Signed-off-by: Balaji Rao <balajirrao@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: make clockevents more robust · 45fe4fe1
      Ingo Molnar committed
      Detect zero event-device multipliers: they cause division-by-zero
      crashes if a clockevent device has been initialized incorrectly.
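      A sketch of this kind of guard. It assumes the usual clockevents conversion ns = (latch << shift) / mult, where a zero mult is a division by zero. Falling back to 1 with a warning is an assumed policy. struct clock_event_sketch and delta2ns() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of a clock event device: ns -> ticks is
 * ticks = (ns * mult) >> shift, so ticks -> ns divides by mult. */
struct clock_event_sketch {
    uint32_t mult;
    uint32_t shift;
};

/* Convert a device delta (ticks) to nanoseconds. A zero mult means the
 * device was set up incorrectly: warn and substitute 1 instead of
 * dividing by zero. */
static uint64_t delta2ns(uint64_t latch, struct clock_event_sketch *evt)
{
    if (evt->mult == 0) {
        fprintf(stderr, "clockevent: zero mult, forcing 1\n");
        evt->mult = 1;
    }
    return (latch << evt->shift) / evt->mult;
}
```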
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: unregister PIT clocksource when PIT is disabled · 1a0c009a
      Thomas Gleixner committed
      The following scenario might leave the PIT as a dysfunctional clocksource:
      
          PIT is registered as clocksource
          PM_TIMER is registered as clocksource and enables highres/dyntick mode
          PIT is switched to oneshot mode
          -> now the readout of PIT is bogus, but the user might select PIT
          via the sysfs override, which would break the box as the time
          readout is unusable.
      
      Unregister the PIT clocksource when the PIT clock event device is switched
      into shutdown / oneshot mode.
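      A userspace model of the scenario above. Once the PIT event device leaves periodic mode, it is removed from the clocksource list, so a name-based (sysfs-style) override can no longer find it. The array-based registry, cs_lookup(), the mode enum and the ratings are all illustrative assumptions, not the kernel's clocksource API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct clocksource_sketch {
    const char *name;
    int rating;              /* illustrative values only */
};

#define MAX_CS 8
static struct clocksource_sketch *cs_list[MAX_CS];
static int cs_count;

static void cs_register(struct clocksource_sketch *cs)
{
    if (cs_count < MAX_CS)
        cs_list[cs_count++] = cs;
}

/* Drop cs from the list, so neither automatic selection nor a user
 * override by name can pick it any more. */
static void cs_unregister(struct clocksource_sketch *cs)
{
    for (int i = 0; i < cs_count; i++) {
        if (cs_list[i] == cs) {
            cs_list[i] = cs_list[--cs_count];
            return;
        }
    }
}

/* Name lookup, standing in for the sysfs override path. */
static struct clocksource_sketch *cs_lookup(const char *name)
{
    for (int i = 0; i < cs_count; i++)
        if (strcmp(cs_list[i]->name, name) == 0)
            return cs_list[i];
    return NULL;
}

enum ce_mode { CE_MODE_PERIODIC, CE_MODE_ONESHOT, CE_MODE_SHUTDOWN };

static struct clocksource_sketch pit_cs = { "pit", 110 };

/* PIT clock event mode switch: outside periodic mode the PIT readout is
 * bogus, so stop offering it as a clocksource. */
static void pit_set_mode(enum ce_mode mode)
{
    if (mode == CE_MODE_ONESHOT || mode == CE_MODE_SHUTDOWN)
        cs_unregister(&pit_cs);
}
```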
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • clocksource: add unregister function to disable unusable clocksources · 4713e22c
      Thomas Gleixner committed
      On x86 the PIT might become an unusable clocksource. Add an unregister
      function so that the PIT can be removed from the list of available
      clocksources.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: restrict PIT clocksource usage · 316da3b3
      Thomas Gleixner committed
      The PIT clocksource is registered unconditionally, even when HPET is enabled
      or when the PIT is replaced by the local APIC timer. In both cases the PIT
      cannot be used, as it is stopped and its readout would be stale.
      
      Prevent registering PIT in those cases.
      
      patch depends on:
      
        x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: offer is_hpet_enabled() on !CONFIG_HPET_TIMER too · df619e6b
      Ingo Molnar committed
      offer is_hpet_enabled() on !CONFIG_HPET_TIMER too.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource: make clocksource watchdog cycle through online CPUs · 1ada5cba
      Andi Kleen committed
      This way it checks if the clocks are synchronized between CPUs too.
      This might be able to detect slowly drifting TSCs which only
      go wrong over longer time.
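      A sketch of the CPU rotation, assuming a plain 64-bit online mask in place of the kernel's cpumask API. watchdog_next_cpu() is a hypothetical name. Each watchdog run would be scheduled on the returned CPU, so over time the clocksource is compared against the watchdog from every online CPU.

```c
#include <assert.h>
#include <stdint.h>

/* Next online CPU after `cur`, wrapping around; bit n of online_mask set
 * means CPU n is online. Returns -1 if no CPU is online. */
static int watchdog_next_cpu(uint64_t online_mask, int cur)
{
    for (int step = 1; step <= 64; step++) {
        int cpu = (cur + step) % 64;
        if (online_mask & (UINT64_C(1) << cpu))
            return cpu;
    }
    return -1;
}
```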
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource.c: use init_timer_deferrable for clocksource_watchdog · 1077f5a9
      Parag Warudkar committed
      clocksource_watchdog can use a deferrable timer, which reduces the number
      of wakeups from idle per second.
      Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: fold __get_realtime_clock_ts() into getnstimeofday() · efd9ac86
      Geert Uytterhoeven committed
        - getnstimeofday() was just a wrapper around __get_realtime_clock_ts()
        - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday()
        - Fix bogus reference to get_realtime_clock_ts(), which never existed
      Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • clocksource: make CLOCKSOURCE_MASK bullet-proof · 1d76c262
      Atsushi Nemoto committed
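      This commit carries no description. As a sketch of what "bullet-proof" can mean for such a macro, the version below parenthesizes its argument and avoids the undefined 64-bit shift in the full-width case. cycle_t as uint64_t and the cycles_delta() helper are assumptions for illustration. Which hazard the actual patch addressed is not stated here.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cycle_t;   /* assumed 64-bit cycle counter type */

/* Mask covering the low `bits` bits of a counter. The argument is
 * parenthesized everywhere, and the full-width case avoids 1ULL << 64,
 * which is undefined behaviour in C. */
#define CLOCKSOURCE_MASK(bits) \
    (cycle_t)((bits) < 64 ? ((1ULL << (bits)) - 1) : -1)

/* Elapsed cycles on a counter that wraps at `mask` (hypothetical helper). */
static cycle_t cycles_delta(cycle_t now, cycle_t last, cycle_t mask)
{
    return (now - last) & mask;
}
```

      With an unparenthesized argument, CLOCKSOURCE_MASK(w & 0x3f) would expand to `w & 0x3f < 64 ? ...`. C parses that as `w & (0x3f < 64)` because `<` binds tighter than `&`.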
      Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • timer: clean up tick-broadcast.c · 186e3cb8
      Thomas Gleixner committed
      clean up tick-broadcast.c
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • time: more timer related cleanups · b10db7f0
      Pavel Machek committed
      I was confused by the "FSEC = 10^15 NSEC" statement; this also includes
      small whitespace fixes. Where there is a copyright notice, there should
      be a GPL notice too.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: timer cleanups · 4c9dc641
      Pavel Machek committed
      Small cleanups to tick-related code. A wrong preempt count is followed
      by BUG(), so it is hardly a KERN_WARNING-level event.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • time: clean hungarian notation from timers · a6fa8e5a
      Pavel Machek committed
      Remove Hungarian notation from the timer code.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • kobj: fix threshold_init_device/kobject_uevent_env oops · 213eca7f
      Greg KH committed
      The logic in this function is just crazy.  It's recursive, but we
      can skip the creation of the kobject and of the whole
      threshold_block if some conditions are met.  That's why we see
      allocate_threshold_blocks so many times in the callstack, yet only a few
      kobjects created.
      
      Then we blow up in kobject_uevent_env() on the first debug printk,
      which means that we are just passing in garbage.
      
      Man, this is one time that comments in code would have been very nice to
      have, and it shows why forward gotos into major code blocks are just evil...
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • Merge git://git.linux-nfs.org/pub/linux/nfs-2.6 · 85004cc3
      Linus Torvalds committed
      * git://git.linux-nfs.org/pub/linux/nfs-2.6: (118 commits)
        NFSv4: Iterate through all nfs_clients when the server recalls a delegation
        NFSv4: Deal more correctly with duplicate delegations
        NFS: Fix a potential race between umount and nfs_access_cache_shrinker()
        NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode
        nfs: convert NFS_*(inode) helpers to static inline
        nfs: obliterate NFS_FLAGS macro
        NFS: Address memory leaks in the NFS client mount option parser
        nfs4: allow nfsv4 acls on non-regular-files
        NFS: Optimise away the sigmask code in aio/dio reads and writes
        SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls
        SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create()
        SUNRPC: Clean up block comment preceding rpcb_getport_sync()
        SUNRPC: Use appropriate argument types in rpcb client
        SUNRPC: rpcb_getport_sync() should use built-in hostname generator
        SUNRPC: Clean up functions that free address_strings array
        NFS: NFS version number is unsigned
        NLM: Fix a bogus 'return' in nlmclnt_rpc_release
        NLM: Introduce an arguments structure for nlmclnt_init()
        NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
        NFS: Invoke nlmclnt_init during NFS mount processing
        ...
    • as-iosched: fix double locking bug in as_merged_requests() · 149a051f
      Jens Axboe committed
      If the two requests belong to the same io context, we will attempt
      to lock the same lock twice. But swapping contexts is pointless in
      that case, so just check for rioc == nioc before doing the double
      lock and copy.
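      A toy, single-threaded model of the fix. The lock type only records whether it is already held, so taking it twice shows up as a counted self-deadlock instead of a hang. struct io_context_sketch, its ttime field and the address-ordered double lock are assumptions. They stand in for the real io context state and its double-lock helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy lock that only records state; a real spinlock taken twice by the
 * same CPU would spin forever, so the sketch counts that instead. */
struct toy_lock {
    bool held;
    int  self_deadlocks;
};

static void toy_acquire(struct toy_lock *l)
{
    if (l->held)
        l->self_deadlocks++;
    l->held = true;
}

static void toy_release(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical io context: a lock plus some state copied on merge. */
struct io_context_sketch {
    struct toy_lock lock;
    unsigned long ttime;
};

/* Copy state from nioc to rioc under both locks. If both requests share
 * one io context there is nothing to swap, and locking "both" would take
 * the same lock twice, so return early. */
static void merge_io_contexts(struct io_context_sketch *rioc,
                              struct io_context_sketch *nioc)
{
    if (rioc == nioc)
        return;

    /* Take the two locks in a fixed (address) order to avoid ABBA deadlock. */
    bool r_first = (uintptr_t)rioc < (uintptr_t)nioc;
    struct io_context_sketch *first  = r_first ? rioc : nioc;
    struct io_context_sketch *second = r_first ? nioc : rioc;

    toy_acquire(&first->lock);
    toy_acquire(&second->lock);
    rioc->ttime = nioc->ttime;
    toy_release(&second->lock);
    toy_release(&first->lock);
}
```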
      Tested-by: Olof Johansson <olof@lixom.net>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • NFSv4: Iterate through all nfs_clients when the server recalls a delegation · 3fbd67ad
      Trond Myklebust committed
      The same delegation may have been handed out to more than one nfs_client.
      Ensure that if a recall occurs, we return all instances.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4: Deal more correctly with duplicate delegations · 57bfa891
      Trond Myklebust committed
      If a (broken?) server hands out two different delegations for the same
      file, then we should return one of them.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Fix a potential race between umount and nfs_access_cache_shrinker() · 6f23e387
      Trond Myklebust committed
      Thanks to Yawei Niu for spotting the race.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode · e6f81075
      Trond Myklebust committed
      Otherwise, there is a potential deadlock if the last dput() from an NFSv4
      close() or other asynchronous operation leads to nfs_clear_inode calling
      the synchronous delegreturn.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs: obliterate NFS_FLAGS macro · 3a10c30a
      Benny Halevy committed
      use NFS_I(inode)->flags instead
      Signed-off-by: Benny Halevy <bhalevy@panasas.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Address memory leaks in the NFS client mount option parser · fc601477
      Chuck Lever committed
      David Howells noticed that repeating the same mount option twice during an
      NFS mount request can result in orphaned memory in certain cases.
      
      Only the client_address and mount_server.hostname strings are initialized
      in the mount parsing loop, so those appear to be the only two pointers that
      might be written over by repeating a mount option.  The strings in the
      nfs_server section of the nfs_parsed_mount_data structure are set only once
      after the options are parsed, thus these are not susceptible to being
      overwritten.
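      One way to plug this kind of leak is to release any previously parsed value before storing the new one. struct parsed_mount_sketch, set_option_string() and the allocation counter are hypothetical helpers for illustration, not the NFS client's parser.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_strings;  /* outstanding allocations, for the demo only */

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
        live_strings++;
    }
    return p;
}

static void free_string(char *p)
{
    if (p) {
        free(p);
        live_strings--;
    }
}

/* Hypothetical subset of the parsed mount data: the two strings that the
 * option loop can set more than once. */
struct parsed_mount_sketch {
    char *client_address;
    char *mount_server_hostname;
};

/* Store an option value, freeing whatever an earlier occurrence of the
 * same option stored, so a repeated option does not orphan memory. */
static int set_option_string(char **slot, const char *value)
{
    char *copy = dup_string(value);

    if (!copy)
        return -1;
    free_string(*slot);
    *slot = copy;
    return 0;
}
```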
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • nfs4: allow nfsv4 acls on non-regular-files · 3d1c5508
      J. Bruce Fields committed
      The RFC doesn't give any reason why it shouldn't be possible to set an
      attribute on a non-regular file.  And if the server supports it, then it
      shouldn't be up to us to prevent it.
      
      Thanks to Erez for the report and Trond for further analysis.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Tested-by: Erez Zadok <ezk@cs.sunysb.edu>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFS: Optimise away the sigmask code in aio/dio reads and writes · f3c391e8
      Trond Myklebust committed
      There are no interruptible waits for asynchronous RPC tasks, so we don't
      need to wrap calls to rpc_run_task() with an
      rpc_clnt_sigmask/rpc_clnt_unsigmask pair.
      
      Instead we can wrap the wait_for_completion_interruptible() in
      nfs_direct_wait(). This means that we completely optimise away sigmask
      setting for the case of non-blocking aio/dio.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: Don't bother changing the sigmask for asynchronous RPC calls · 34f5b466
      Trond Myklebust committed
      The caller will never sleep in rpc_execute, so don't bother setting the
      sigmask.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • SUNRPC: rpcb_getport_sync() passes incorrect address size to rpc_create() · afc88112
      Chuck Lever committed
      The variable "sin" is a pointer, so sizeof(sin) is the size of a pointer,
      not the size of the thing that sin points to.
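      A self-contained illustration of this bug pattern, using struct sockaddr_in from <netinet/in.h>. The two helper functions are hypothetical and only contrast the buggy and corrected expressions; they are not the SUNRPC code.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/* Bug pattern: sin is a pointer, so this yields the pointer size
 * (typically 4 or 8 bytes), not the size of the address structure. */
static size_t addr_len_buggy(const struct sockaddr_in *sin)
{
    return sizeof(sin);
}

/* Fix: measure the object the pointer refers to. */
static size_t addr_len_fixed(const struct sockaddr_in *sin)
{
    return sizeof(*sin);
}
```

      sizeof(*sin) is evaluated at compile time and never dereferences the pointer, so it is safe even when sin is NULL.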
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>