提交 · 46151122e0a2e80e5a6b2889f595e371fe2b600d · xiphi1978 / linux

08 5月, 2008 5 次提交

sched: fix weight calculations · 46151122

由 Mike Galbraith 提交于 5月 08, 2008

The conversion between virtual and real time is as follows:

  dvt = rw/w * dt <=> dt = w/rw * dvt

Since we want the fair sleeper granularity to be in real time, we actually
need to do:

  dvt = - rw/w * l

This bug could be related to the regression reported by Yanmin Zhang:

| Comparing with kernel 2.6.25, sysbench+mysql(oltp, readonly) has lots
| of regressions with 2.6.26-rc1:
|
| 1) 8-core stoakley: 28%;
| 2) 16-core tigerton: 20%;
| 3) Itanium Montvale: 50%.
Reported-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: NMike Galbraith <efault@gmx.de>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

46151122

semaphore: fix · bf726eab

由 Ingo Molnar 提交于 5月 08, 2008

Yanmin Zhang reported:

| Comparing with kernel 2.6.25, AIM7 (use tmpfs) has more th
| regression under 2.6.26-rc1 on my 8-core stoakley, 16-core tigerton,
| and Itanium Montecito. Bisect located the patch below:
|
| 64ac24e7 is first bad commit
| commit 64ac24e7
| Author: Matthew Wilcox <matthew@wil.cx>
| Date:   Fri Mar 7 21:55:58 2008 -0500
|
|     Generic semaphore implementation
|
| After I manually reverted the patch against 2.6.26-rc1 while fixing
| lots of conflicts/errors, aim7 regression became less than 2%.

i reproduced the AIM7 workload and can confirm Yanmin's findings that
-.26-rc1 regresses over .25 - by over 67% here.

Looking at the workload i found and fixed what i believe to be the real
bug causing the AIM7 regression: it was inefficient wakeup / scheduling
/ locking behavior of the new generic semaphore code, causing suboptimal
performance.

The problem comes from the following code. The new semaphore code does
this on down():

        spin_lock_irqsave(&sem->lock, flags);
        if (likely(sem->count > 0))
                sem->count--;
        else
                __down(sem);
        spin_unlock_irqrestore(&sem->lock, flags);

and this on up():

        spin_lock_irqsave(&sem->lock, flags);
        if (likely(list_empty(&sem->wait_list)))
                sem->count++;
        else
                __up(sem);
        spin_unlock_irqrestore(&sem->lock, flags);

where __up() does:

        list_del(&waiter->list);
        waiter->up = 1;
        wake_up_process(waiter->task);

and where __down() does this in essence:

        list_add_tail(&waiter.list, &sem->wait_list);
        waiter.task = task;
        waiter.up = 0;
        for (;;) {
                [...]
                spin_unlock_irq(&sem->lock);
                timeout = schedule_timeout(timeout);
                spin_lock_irq(&sem->lock);
                if (waiter.up)
                        return 0;
        }

the fastpath looks good and obvious, but note the following property of
the contended path: if there's a task on the ->wait_list, the up() of
the current owner will "pass over" ownership to that waiting task, in a
wake-one manner, via the waiter->up flag and by removing the waiter from
the wait list.

That is all and fine in principle, but as implemented in
kernel/semaphore.c it also creates a nasty, hidden source of contention!

The contention comes from the following property of the new semaphore
code: the new owner owns the semaphore exclusively, even if it is not
running yet.

So if the old owner, even if just a few instructions later, does a
down() [lock_kernel()] again, it will be blocked and will have to wait
on the new owner to eventually be scheduled (possibly on another CPU)!
Or if another task gets to lock_kernel() sooner than the "new owner"
scheduled, it will be blocked unnecessarily and for a very long time
when there are 2000 tasks running.

I.e. the implementation of the new semaphores code does wake-one and
lock ownership in a very restrictive way - it does not allow
opportunistic re-locking of the lock at all and keeps the scheduler from
picking task order intelligently.

This kind of scheduling, with 2000 AIM7 processes running, creates awful
cross-scheduling between those 2000 tasks, causes reduced parallelism, a
throttled runqueue length and a lot of idle time. With increasing number
of CPUs it causes an exponentially worse behavior in AIM7, as the chance
for a newly woken new-owner task to actually run anytime soon is less
and less likely.

Note that it takes just a tiny bit of contention for the 'new-semaphore
catastrophy' to happen: the wakeup latencies get added to whatever small
contention there is, and quickly snowball out of control!

I believe Yanmin's findings and numbers support this analysis too.

The best fix for this problem is to use the same scheduling logic that
the kernel/mutex.c code uses: keep the wake-one behavior (that is OK and
wanted because we do not want to over-schedule), but also allow
opportunistic locking of the lock even if a wakee is already "in
flight".

The patch below implements this new logic. With this patch applied the
AIM7 regression is largely fixed on my quad testbox:

  # v2.6.25 vanilla:
  ..................
  Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
  2000    56096.4         91      207.5   789.7   0.4675
  2000    55894.4         94      208.2   792.7   0.4658

  # v2.6.26-rc1-166-gc0a18111 vanilla:
  ...................................
  Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
  2000    33230.6         83      350.3   784.5   0.2769
  2000    31778.1         86      366.3   783.6   0.2648

  # v2.6.26-rc1-166-gc0a18111 + semaphore-speedup:
  ...............................................
  Tasks   Jobs/Min        JTI     Real    CPU     Jobs/sec/task
  2000    55707.1         92      209.0   795.6   0.4642
  2000    55704.4         96      209.0   796.0   0.4642

i.e. a 67% speedup. We are now back to within 1% of the v2.6.25
performance levels and have zero idle time during the test, as expected.

Btw., interactivity also improved dramatically with the fix - for
example console-switching became almost instantaneous during this
workload (which after all is running 2000 tasks at once!), without the
patch it was stuck for a minute at times.

There's another nice side-effect of this speedup patch, the new generic
semaphore code got even smaller:

   text    data     bss     dec     hex filename
   1241       0       0    1241     4d9 semaphore.o.before
   1207       0       0    1207     4b7 semaphore.o.after

(because the waiter.up complication got removed.)

Longer-term we should look into using the mutex code for the generic
semaphore code as well - but i's not easy due to legacies and it's
outside of the scope of v2.6.26 and outside the scope of this patch as
well.
Bisected-by: N"Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

bf726eab

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 3de2403e

由 Linus Torvalds 提交于 5月 07, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc: Fix fork/clone/vfork system call restart.
  sparc: Fix mmap VA span checking.

3de2403e

sparc: Fix fork/clone/vfork system call restart. · 1e38c126

由 David S. Miller 提交于 5月 07, 2008

We clobber %i1 as well as %i0 for these system calls,
because they give two return values.

Therefore, on error, we have to restore %i1 properly
or else the restart explodes since it uses the wrong
arguments.

This fixes glibc's nptl/tst-eintr1.c testcase.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e38c126

[MAINTAINERS] New maintainer for Intel ethernet adapters · e0164af6

由 Auke Kok 提交于 5月 07, 2008

I'm handing over maintainership to Jeff Kirsher and moving on
to other Linux/Open Source work within Intel. Good luck to Jeff ;)
Signed-off-by: NAuke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0164af6

07 5月, 2008 13 次提交

sparc: Fix mmap VA span checking. · 58163393

由 David S. Miller 提交于 5月 07, 2008

We should not conditionalize VA range checks on MAP_FIXED.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

58163393

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · bd1d23a8

由 Linus Torvalds 提交于 5月 06, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix initrd regression.
  usb: Sparc build fix, make USB_ISP1760_OF depend on PPC_OF
  sparc64: remove online_page()
  sparc64: use compat_sys_utimes instead of home-grown local copy.
  sbus: Fix bpp driver build.
  sparc video: make blank use proper constant
  Revert "[SPARC64]: Wrap SMP IPIs with irq_enter()/irq_exit()."
  sparc: tcx.c remove unnecessary function

bd1d23a8

Revert "uml: fix gcc problem" · c0a18111

由 Linus Torvalds 提交于 5月 06, 2008

This reverts commit 22eecde2.  Uli
reports that it breaks UML on x86-64 with the Fedora 8 gcc (gcc 4.1.2),
causing a crash on startup. See

	http://marc.info/?l=linux-kernel&m=121011722806093&w=2

for a trace.
Reported-by: NUlrich Drepper <drepper@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0a18111

sparc64: Fix initrd regression. · d45100f7

由 David S. Miller 提交于 5月 06, 2008

We die because we forget to convert initrd_start and
initrd_end to virtual addresses.

Reported by Mikael Pettersson
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d45100f7

usb: Sparc build fix, make USB_ISP1760_OF depend on PPC_OF · 3eb6753e

由 David S. Miller 提交于 5月 06, 2008

Sparc doesn't have some of the OF interfaces this driver
wants to use.
Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3eb6753e

Fix bogus warning in sysdev_driver_register() · db176c6e

由 OGAWA Hirofumi 提交于 5月 07, 2008

        if ((drv->entry.next != drv->entry.prev) ||
            (drv->entry.next != NULL)) {

warns list_empty(&drv->entry).
Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Greg KH <gregkh@suse.de>
Cc: Len Brown <lenb@kernel.org>
[ Version 2 totally redone based on suggestions from Linus & Greg ]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

db176c6e

VFS: fix unused variable warning · 6ce07c7b

由 Linus Torvalds 提交于 5月 06, 2008

Commit 33dcdac2 ("kill ->put_inode")
removed the final use of i_op->put_inode, but left the now totally
unused "op" variable in iput().

Get rid of it.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6ce07c7b

x86: fix PAE pmd_bad bootup warning · aeed5fce

由 Hugh Dickins 提交于 5月 06, 2008

Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.

That came from 9fc34113 x86: debug pmd_bad();
but we understand now that the typecasting was wrong for PAE in the previous
version: pagetable pages above 4GB looked bad and stopped Arjan from booting.

And revert that cded932b x86: fix pmd_bad
and pud_bad to support huge pages. It was the wrong way round: we shouldn't
weaken every pmd_bad and pud_bad check to let huge pages slip through - in
part they check that we _don't_ have a huge page where it's not expected.

Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
junk in the upper word is good; and x86_64 should follow x86_32's stricter
comparison, to stop thinking any subset of required bits is good); but that
should be a later patch.

Fix Hans' good observation that follow_page() will never find pmd_huge()
because that would have already failed the pmd_bad test: test pmd_huge in
between the pmd_none and pmd_bad tests. Tighten x86's pmd_huge() check?
No, once it's a hugepage entry, it can get quite far from a good pmd: for
example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.

However... though follow_page() contains this and another test for huge
pages, so it's nice to keep it working on them, where does it actually get
called on a huge page? get_user_pages() checks is_vm_hugetlb_page(vma) to
to call alternative hugetlb processing, as does unmap_vmas() and others.
Signed-off-by: NHugh Dickins <hugh@veritas.com>
Earlier-version-tested-by: NIngo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jeff Chua <jeff.chua.linux@gmail.com>
Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aeed5fce

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · bb78be83

由 Linus Torvalds 提交于 5月 06, 2008

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  [PATCH] fix SMP ordering hole in fcntl_setlk()
  [PATCH] kill ->put_inode
  [PATCH] fix reservation discarding in affs

bb78be83

[PATCH] fix SMP ordering hole in fcntl_setlk() · 0b2bac2f

由 Al Viro 提交于 5月 06, 2008

fcntl_setlk()/close() race prevention has a subtle hole - we need to
make sure that if we *do* have an fcntl/close race on SMP box, the
access to descriptor table and inode->i_flock won't get reordered.

As it is, we get STORE inode->i_flock, LOAD descriptor table entry vs.
STORE descriptor table entry, LOAD inode->i_flock with not a single
lock in common on both sides. We do have BKL around the first STORE,
but check in locks_remove_posix() is outside of BKL and for a good
reason - we don't want BKL on common path of close(2).

Solution is to hold ->file_lock around fcheck() in there; that orders
us wrt removal from descriptor table that preceded locks_remove_posix()
on close path and we either come first (in which case eviction will be
handled by the close side) or we'll see the effect of close and do
eviction ourselves. Note that even though it's read-only access,
we do need ->file_lock here - rcu_read_lock() won't be enough to
order the things.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0b2bac2f

[PATCH] kill ->put_inode · 33dcdac2

由 Christoph Hellwig 提交于 4月 29, 2008

And with that last patch to affs killing the last put_inode instance we
can finally, after many years of transition kill this racy and awkward
interface.

(It's kinda funny that even the description in
Documentation/filesystems/vfs.txt was entirely wrong..)

Also remove a very misleading comment above the defintion of
struct super_operations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

33dcdac2

[PATCH] fix reservation discarding in affs · dca3c336

由 Roman Zippel 提交于 4月 29, 2008

- remove affs_put_inode, so preallocations aren't discared unnecessarily
  often.
- remove affs_drop_inode, it's called with a spinlock held, so it can't
  use a mutex.
- make i_opencnt atomic
- avoid direct b_count manipulations
- a few allocation failure fixes, so that these are more gracefully
  handled now.
Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

dca3c336

Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 31d9168d

由 Linus Torvalds 提交于 5月 06, 2008

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (27 commits)
  pata_atiixp: Don't disable
  sata_inic162x: update intro comment, up the version and drop EXPERIMENTAL
  sata_inic162x: add cardbus support
  sata_inic162x: kill now unused SFF related stuff
  sata_inic162x: use IDMA for ATAPI commands
  sata_inic162x: use IDMA for non DMA ATA commands
  sata_inic162x: kill now unused bmdma related stuff
  sata_inic162x: use IDMA for ATA_PROT_DMA
  sata_inic162x: update TF read handling
  sata_inic162x: add / update constants
  sata_inic162x: misc clean ups
  sata_mv use hweight16() for bit counting (V2)
  sata_mv NCQ-EH for FIS-based switching
  sata_mv delayed eh handling
  libata: export ata_eh_analyze_ncq_error
  sata_mv new mv_port_intr function
  sata_mv fix mv_host_intr bug for hc_irq_cause
  sata_mv NCQ and SError fixes for mv_err_intr
  sata_mv rearrange mv_config_fbs
  sata_mv errata workaround for sata25 part 1
  ...

31d9168d

06 5月, 2008 22 次提交

pata_atiixp: Don't disable · 05177f17

由 Alan Cox 提交于 5月 02, 2008

A couple of distributions (Fedora, Ubuntu) were having weird problems with the
ATI IXP series PATA controllers being reported as simplex.  At the heart of
the problem is that both distros ignored the recommendations to load pata_acpi
and ata_generic *AFTER* specific host drivers.

The underlying cause however is that if you D3 and then D0 an ATI IXP it
helpfully throws away some configuration and won't let you rewrite it.

Add checks to ata_generic and pata_acpi to pin ATIIXP devices.  Possibly the
real answer here is to quirk them and pin them, but right now we can't do that
before they've been pcim_enable()'d by a driver.

I'm indebted to David Gero for this.  His bug report not only reported the
problem but identified the cause correctly and he had tested the right values
to prove what was going on

[If you backport this for 2.6.24 you will need to pull in the 2.6.25
removal of the bogus WARN_ON() in pcim_enagle]
Signed-off-by: NAlan Cox <alan@redhat.com>
Tested-by: NDavid Gero <davidg@havidave.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

05177f17

sata_inic162x: update intro comment, up the version and drop EXPERIMENTAL · 22bfc6d5

由 Tejun Heo 提交于 4月 30, 2008

sata_inic162x is now ready for production use.  Bump the version,
explain what's working and what's not and drop EXPERIMENTAL.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

22bfc6d5

sata_inic162x: add cardbus support · ba66b242

由 Tejun Heo 提交于 4月 30, 2008

When attached to cardbus, mmio region is at BAR 1.  Other than that,
everything else is the same.  Add support for it.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

ba66b242

sata_inic162x: kill now unused SFF related stuff · f8b0685a

由 Tejun Heo 提交于 4月 30, 2008

sata_inic162x now doesn't use any SFF features.  Remove all SFF
related stuff.

* Mask unsolicited ATA interrupts.  This removes our primary source of
  spurious interrupts and spurious interrupt handling can be tightened
  up.  There's no need to clear ATA interrupts by reading status
  register either.

* Don't dance with IDMA_CTL_ATA_NIEN and simplify accesses to
  IDMA_CTL.

* Inherit from sata_port_ops instead of ata_sff_port_ops.

* Don't initialize or use ioaddr.  There's no need to map BAR0-4
  anymore.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

f8b0685a

sata_inic162x: use IDMA for ATAPI commands · b3f677e5

由 Tejun Heo 提交于 4月 30, 2008

Use IDMA for ATAPI commands.  Write and some misc commands time out
when executed using ATAPI_PROT_DMA but ATAPI_PROT_PIO works fine.  As
PIO is driven by DMA too, it doesn't make any noticeable difference
for native SATA devices.  inic_check_atapi_dma() is implemented to
force PIO for those ATAPI commands.

After this change, sata_inic162x issues all commands using IDMA.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

b3f677e5

sata_inic162x: use IDMA for non DMA ATA commands · 049e8e04

由 Tejun Heo 提交于 4月 30, 2008

Use IDMA for PIO and non-data commands.  This allows sata_inic162x to
safely drive LBA48 devices.  Kill inic_dev_config() which contains
code to reject LBA48 devices.

With this change, status checking in inic_qc_issue() to avoid hard
lock up after hotplug can go away too.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

049e8e04

sata_inic162x: kill now unused bmdma related stuff · ab5b0235

由 Tejun Heo 提交于 4月 30, 2008

sata_inic162x doesn't use BMDMA anymore.  Kill bmdma related stuff.

* prdctl manipulation

* port IRQ mask manipulation

* inherit ATA_BASE_SHT instead of ATA_BMDMA_SHT

* BMDMA methods
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

ab5b0235

sata_inic162x: use IDMA for ATA_PROT_DMA · 3ad400a9

由 Tejun Heo 提交于 4月 30, 2008

The modified driver on initio site has enough clue on how to use IDMA.
Use IDMA for ATA_PROT_DMA.

* LBA48 now works as long as it uses DMA (LBA48 devices still aren't
  allowed as it can destroy data if PIO is used for any reason).

* No need to mask IRQs for read DMAs as IDMA_DONE is properly raised
  after transfer to memory is actually completed.  There will be some
  spurious interrupts but host_intr will handle it correctly and
  manipulating port IRQ mask interacts badly with the other port for
  some reason, so command type dependent port IRQ masking is not used
  anymore.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

3ad400a9

sata_inic162x: update TF read handling · 364fac0e

由 Tejun Heo 提交于 5月 01, 2008

inic162x can't reliably read back TF or at least we don't know how to
do it yet.  The only values which seem reliable are status and error.
This patch updates access to TF.

* implement inic_tf_read() which reads the TF area in mmio area

* implement custom inic_qc_fill_rtf() which only returns true if
  status indicates device error.  it'll be returning bogus addresses
  for device errors but it'll be able to report why it failed at
  least.

* implement custom inic_check_ready() and use ata_wait_after_reset()
  instead of the SFF version.

* use inic_tf_read() for classification.

This is not perfect but it fixes hotplug detection failure and at
least makes the driver report 0's instead of random garbages while
reporting valid status and error for device errors.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

364fac0e

sata_inic162x: add / update constants · b0dd9b8e

由 Tejun Heo 提交于 4月 30, 2008

* add a bunch of constants, most are from the datasheet, a few
  undocumented ones are from initio's modified driver

* HCTL_PWRDWN is bit 12 not 13

This is in preparation of further inic162x updates.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

b0dd9b8e

sata_inic162x: misc clean ups · 36f674d9

由 Tejun Heo 提交于 4月 30, 2008

* use larger indents for structure member definitions

* kill unused variable @addr in inic_scr_write()

* kill unnecessary flushes in inic_freeze/thaw()

* kill buggy explicit kfree() on devres managed port private data

This is in preparation of further inic162x updates.
Signed-off-by: NTejun Heo <htejun@gmail.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

36f674d9

sata_mv use hweight16() for bit counting (V2) · c46938cc

由 Mark Lord 提交于 5月 02, 2008

Some tidying as suggested by Grant Grundler.

Nuke local bit-counting function from sata_mv in favour of using hweight16().
Also add a short explanation for the 15msec timeout used when waiting for empty/idle.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

c46938cc

sata_mv NCQ-EH for FIS-based switching · 4c299ca3

由 Mark Lord 提交于 5月 02, 2008

Convert sata_mv's EH for FIS-based switching (FBS) over to the
sequence recommended by Marvell.  This enables us to catch/analyze
multiple failed links on a port-multiplier when using NCQ.

To do this, we clear the ERR_DEV bit in the EDMA Halt-Conditions register,
so that the EDMA engine doesn't self-disable on the first NCQ error.

Our EH code sets the MV_PP_FLAG_DELAYED_EH flag to prevent new commands
being queued while we await completion of all outstanding NCQ commands
on all links of the failed PM.

The SATA Test Control register tells us which links have failed,
so we must only wait for any other active links to finish up
before we stop the EDMA and run the .error_handler afterward.

The patch also includes skeleton code for handling of non-NCQ FBS operation.
This is more for documentation purposes right now, as that mode is not yet
enabled in sata_mv.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

4c299ca3

sata_mv delayed eh handling · 29d187bb

由 Mark Lord 提交于 5月 02, 2008

Introduce a new "delayed error handling" mechanism in sata_mv,
to enable us to eventually deal with multiple simultaneous NCQ
failures on a single host link when a PM is present.

This involves a port flag (MV_PP_FLAG_DELAYED_EH) to prevent new
commands being queued, and a pmp bitmap to indicate which pmp links
had NCQ errors.

The new mv_pmp_error_handler() uses those values to invoke
ata_eh_analyze_ncq_error() on each failed link, prior to freezing
the port and passing control to sata_pmp_error_handler().

This is based upon a strategy suggested by Tejun.

For now, we just implement the delayed mechanism.
The next patch in this series will add the multiple-NCQ EH code
to take advantage of it.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

29d187bb

libata: export ata_eh_analyze_ncq_error · 10acf3b0

由 Mark Lord 提交于 5月 02, 2008

Export ata_eh_analyze_ncq_error() for subsequent use by sata_mv,
as suggested by Tejun.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

10acf3b0

sata_mv new mv_port_intr function · a9010329

由 Mark Lord 提交于 5月 02, 2008

Separate out the inner loop body of mv_host_intr()
into it's own function called mv_port_intr().

This should help maintainabilty.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

a9010329

sata_mv fix mv_host_intr bug for hc_irq_cause · eabd5eb1

由 Mark Lord 提交于 5月 02, 2008

Remove the unwanted reads of hc_irq_cause from mv_host_intr(),
thereby removing a bug whereby we were not always reading it when needed..
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

eabd5eb1

sata_mv NCQ and SError fixes for mv_err_intr · 37b9046a

由 Mark Lord 提交于 5月 02, 2008

Sigh.  Undo some earlier changes to mv_port_intr(),
so that we now read/clear SError again in all cases.

Arrange the top of the function to be as close as possible
to what we need for a later update (in this series) for ERR_DEV handling.

Fix things so that libata-eh can attempt a READ_LOG_EXT_10H
in response to a failed NCQ command, by just doing a local
mv_eh_freeze() rather than ata_port_freeze().

This will now fully handle NCQ errors much of the time,
but more fixes are needed for FBS/PMP, and for certain chip errata.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

37b9046a

sata_mv rearrange mv_config_fbs · 00f42eab

由 Mark Lord 提交于 5月 02, 2008

Rearrange mv_config_fbs() to more closely follow the (corrected) datasheet
recommendations for NCQ and FIS-based switching (FBS).

Also, maintain a port flag to let us know when FBS is enabled.
We will make more use of that flag later in this patch series.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

00f42eab

sata_mv errata workaround for sata25 part 1 · dd2890f6

由 Mark Lord 提交于 5月 02, 2008

Part 1 of workaround for errata "sata#25" for the 60x1 series
(the second half of this errata workaround is still in development.

Bit22 of the GPIO port has to be set "on" when in NCQ mode.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

dd2890f6

sata_mv new mv_qc_defer method · 3e4a1391

由 Mark Lord 提交于 5月 02, 2008

The EDMA engine cannot tolerate a mix of NCQ/non-NCQ commands,
and cannot be used for PIO at all.  So we need to prevent libata
from trying to feed us such mixtures.

Introduce mv_qc_defer() for this purpose, and use it for all chip versions.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

3e4a1391

sata_mv wait for empty+idle · 9b2c4e0b

由 Mark Lord 提交于 5月 02, 2008

When performing EH, it is recommended to wait for the EDMA engine
to empty out requests-in-progress before disabling EDMA.

Introduce code to poll the EDMA_STATUS register for idle/empty bits
before disabling EDMA.  For non-EH operation, this will normally exit
without delay, other than the register read.

A later series of patches may focus on eliminating this and various
other register reads (when possible) throughout the driver,
but for now we're focussing on solid reliablity.
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJeff Garzik <jgarzik@redhat.com>

9b2c4e0b