- 24 Jul, 2012 4 commits
-
-
Committed by Behdad Esfahbod
Seems to be about what Uniscribe does. Not exactly. But close enough. More consonants will start a new cluster.

A few scripts went way down in failures. In particular:

  - Devanagari failures went down from 490 to 56.
  - Telugu went down from 113 to 49.

Other scripts went down slightly or didn't change. New numbers:

BENGALI: 353908 out of 354285 tests passed. 377 failed (0.106412%)
DEVANAGARI: 693572 out of 693628 tests passed. 56 failed (0.00807349%)
GUJARATI: 366485 out of 366506 tests passed. 21 failed (0.00572978%)
GURMUKHI: 60750 out of 60809 tests passed. 59 failed (0.0970251%)
KANNADA: 950730 out of 951913 tests passed. 1183 failed (0.124276%)
KHMER: 298613 out of 299124 tests passed. 511 failed (0.170832%)
MALAYALAM: 1046881 out of 1048416 tests passed. 1535 failed (0.146411%)
ORIYA: 42320 out of 42329 tests passed. 9 failed (0.021262%)
SINHALA: 271333 out of 271847 tests passed. 514 failed (0.189077%)
TAMIL: 1091837 out of 1091837 tests passed. 0 failed (0%)
TELUGU: 970524 out of 970573 tests passed. 49 failed (0.00504856%)

Some of the remaining Telugu and Devanagari issues seem to be Uniscribe eating Anusvara when placed before a non-joiner. Ouch!
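For readers who want to recompute the percentages in the per-script results, here is a minimal sketch. It assumes each percentage is simply failed / total * 100 printed with six significant digits; the function name `summary_line` is hypothetical and is not part of the test harness used for these commits.

```python
def summary_line(script, passed, total):
    """Format one per-script result line from raw pass/total counts."""
    failed = total - passed
    percent = 100.0 * failed / total
    # The "g" presentation type keeps six significant digits and prints 0.0 as "0".
    return (f"{script}: {passed} out of {total} tests passed. "
            f"{failed} failed ({percent:g}%)")


if __name__ == "__main__":
    print(summary_line("DEVANAGARI", 693572, 693628))
    print(summary_line("TAMIL", 1091837, 1091837))
```

Feeding in the Devanagari and Tamil counts from the message should reproduce the corresponding result lines.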
-
Committed by Behdad Esfahbod
Oops, thinko.
-
Committed by Behdad Esfahbod
Uniscribe reorders U+0E3A to be after U+0E38 and U+0E39. We do that by modifying the ccc for U+0E3A. Fixes the two remaining Thai failures (see previous commit).
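The effect of a modified combining class can be illustrated with a small sketch. In Unicode, U+0E38 and U+0E39 have canonical combining class 103 and U+0E3A has class 9, so a stable sort of marks by class puts U+0E3A first. The sketch below overrides the class of U+0E3A so that it sorts after them. The override value 104 and the names `CCC_OVERRIDES`, `modified_ccc`, and `reorder_marks` are assumptions made for illustration; the value actually used by the commit is not stated here.

```python
import unicodedata

# Hypothetical override: any class above 103 would do for this illustration.
CCC_OVERRIDES = {0x0E3A: 104}


def modified_ccc(ch):
    """Combining class of ch, with the illustrative override applied."""
    return CCC_OVERRIDES.get(ord(ch), unicodedata.combining(ch))


def reorder_marks(text, ccc=modified_ccc):
    """Stable-sort each run of combining marks (class != 0) by class."""
    out, run = [], []
    for ch in text:
        if ccc(ch) == 0:
            out.extend(sorted(run, key=ccc))  # sorted() is stable
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=ccc))
    return "".join(out)


if __name__ == "__main__":
    text = "\u0e01\u0e3a\u0e38"  # KO KAI, U+0E3A, U+0E38
    print([f"U+{ord(c):04X}" for c in reorder_marks(text)])
```

With the unmodified classes (passing `unicodedata.combining` as `ccc`), the same input keeps U+0E3A ahead of U+0E38, which is the ordering the commit moves away from.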
-
Committed by Behdad Esfahbod
Adjust the list of marks before SARA AM that get the reordering treatment. Also adjust cluster formation to match Uniscribe.

With Wikipedia test data, now I see:

  - For Thai, with the Angsana New font from Win7, I see 54 failures out of over 4M tests (0.00129107%). Of the 54, two are legitimate reordering issues (fix coming soon), and the other 52 are simply Uniscribe using a zero-width space char instead of an unknown character for missing glyphs. No idea why. The missing-glyph sequences include one that is a Thai character followed by an Arabic Sokun. Someone confused it with Nikhahit I assume!

  - For Lao, with the Dokchampa font from Win7, 33 tests fail out of 54k (0.0615167%). All seem to be insignificant mark positioning with two marks on a base. Have to investigate.
-
- 23 Jul, 2012 4 commits
-
-
Committed by Behdad Esfahbod
Test case was: <U+0D15,U+0D4D,U+0D15,U+0D4A>.
-
Committed by Behdad Esfahbod
This should address all possible cluster misformations that I had in mind.
-
Committed by Behdad Esfahbod
This should fix any instabilities in cluster formation that we were speculating may happen with surrounding syllables. Or most of it perhaps.
-
Committed by Behdad Esfahbod
Fixes crashes reported with left matra under non-uniscribe-bug-compatibility mode.
-
- 21 Jul, 2012 16 commits
-
-
Committed by Behdad Esfahbod
Improves Bengali and Gurmukhi. Malayalam regressed a bit. We will deal with that later.
-
Committed by Behdad Esfahbod
This is a bunch of hacks for now. Improves Bengali a bit.
-
Committed by Behdad Esfahbod
Gurmukhi failures are halved now. Others changed slightly.
-
Committed by Behdad Esfahbod
Malayalam failures go way down. Other scripts benefitted slightly too. Sinhala had one or two test regressions, but...
-
Committed by Behdad Esfahbod
Fixes 20 out of 48 failing Oriya tests. Failure rate down to 0.066% now.
-
Committed by Behdad Esfahbod
Oriya failures down from 0.65% to 0.20%.
-
Committed by Behdad Esfahbod
Fixes most Malayalam failures. Down from 1.6% to 0.38% now. Fixes a few more in other scripts too.
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
If x is not constant, we cannot ASSERT_STATIC on it.
-
Committed by Behdad Esfahbod
Apparently this was approved in Feb 2012. No font yet.
-
Committed by Behdad Esfahbod
Although IndicMatraCategory.txt classifies it as Top_And_Right matra, it does not have Unicode decomposition, and Uniscribe does not do anything special about it either. Gujarati failures down from 0.672% to 0.0130966%.
-
Committed by Behdad Esfahbod
Now that we break syllables on Halant,ZWNJ, this code can be simplified.
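As a rough illustration of what breaking on Halant,ZWNJ means, the sketch below splits a string right after every Halant followed by ZWNJ (U+200C). It is a deliberately simplified model, not the shaper's syllable machine: it detects Halant as any character with canonical combining class 9 (Virama), which is an assumption of this sketch, and the function name `split_on_halant_zwnj` is hypothetical.

```python
import unicodedata

ZWNJ = "\u200c"
VIRAMA_CCC = 9  # canonical combining class shared by Indic Halant/Virama signs


def split_on_halant_zwnj(text):
    """Split text after each <Halant, ZWNJ> pair; everything else stays together."""
    pieces, start = [], 0
    for i in range(1, len(text)):
        if text[i] == ZWNJ and unicodedata.combining(text[i - 1]) == VIRAMA_CCC:
            pieces.append(text[start:i + 1])
            start = i + 1
    if start < len(text):
        pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    # Devanagari KA, Halant, ZWNJ, SSA: the break falls after the ZWNJ.
    for piece in split_on_halant_zwnj("\u0915\u094d\u200c\u0937"):
        print([f"U+{ord(c):04X}" for c in piece])
```

Without the ZWNJ, the same KA, Halant, SSA sequence comes back as a single piece.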
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
That's really what Uniscribe does, and explains a lot of peculiarities of Halant,ZWNJ before the base. Sent Telugu from 1% failures to 0.03%. Improved Kannada and Malayalam slightly. Fixed half of Bengali, and did NOT break anything!
-
Committed by Behdad Esfahbod
Specifically, don't apply 'init' if previous char is a joiner. Fixes some more of Bengali.
-
Committed by Behdad Esfahbod
Fixes more of Telugu, Kannada, and Oriya. May break things (outside Indic...), but we cannot think of any font relying on this immediately.
-
- 20 Jul, 2012 16 commits
-
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
Not tuned, just copied from step 2. Fixes another 0.5% of Kannada failures. 1% to go.
-
Committed by Behdad Esfahbod
Fixes a few Devanagari, half of remaining Kannada failures, quarter for Telugu, and others slightly improved or unchanged.
-
Committed by Behdad Esfahbod
Fixes 5 Devanagari failures, and no regressions.
-
Committed by Behdad Esfahbod
Brings down failures with Lohit-Telugu from 57% to 1.40%.
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
Kannada failures down from 3.5% to 2.93%.
-
Committed by Behdad Esfahbod
It's not in IndicSyllabicCategory.txt. Fixes most of Gurmukhi failures. Failures down from 7.7% to 0.222%!
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
-
Committed by Behdad Esfahbod
For Khmer, all consonants are subjoining. No need to look in the font. We were looking in the wrong order anyway.
-
Committed by Behdad Esfahbod
Fixes 1.5% more failures for Telugu, 2% for Kannada. Breaks one test in Devanagari.
-
Committed by Behdad Esfahbod
Fixes another 5% of Kannada failures.
-
Committed by Behdad Esfahbod
Fixes most failures of Oriya, and improves others a bit.
-
Committed by Behdad Esfahbod
-