  1. 19 Jan, 2022 3 commits
  2. 18 Jan, 2022 1 commit
  3. 17 Jan, 2022 4 commits
  4. 13 Jan, 2022 2 commits
    • fix: new restcountries url (#10043) · a784b12e
      ColleterVi committed
      The URL extension "eu" and path "rest" are no longer available. Replace them with a working URL.
    • Speed up the StateC::L feature function (#10019) · 28299644
      Daniël de Kok committed
      * Speed up the StateC::L feature function
      
      This function gets the n-th most-recent left-arc with a particular head.
      Before this change, StateC::L would construct a vector of all left-arcs
      with the given head and then pick the n-th most recent from that vector.
      Since the number of left-arcs strongly correlates with the doc length
      and the feature is constructed for every transition, this can make
      transition-parsing quadratic.
      
      With this change StateC::L:
      
      - Searches left-arcs backwards.
      - Stops early when the n-th matching transition is found.
      - Does not construct a vector (reducing memory pressure).
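The difference between the old and new lookup can be sketched in Python. This is a minimal illustration, not the actual `StateC` code: the `Arc` record, the function names, the 1-based `n`, and the `-1` return value for a missing arc are all assumptions made for the example.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    # Hypothetical arc record holding only head and child token indices.
    head: int
    child: int


def nth_left_arc_old(left_arcs: List[Arc], head: int, n: int) -> int:
    """Old approach (sketch): collect every matching arc, then index into the list."""
    if n < 1:
        return -1
    # Builds a full list on every call: time and memory grow with len(left_arcs).
    children = [arc.child for arc in left_arcs if arc.head == head]
    if n > len(children):
        return -1
    # The n-th most recent match is the n-th element from the end.
    return children[-n]


def nth_left_arc_new(left_arcs: List[Arc], head: int, n: int) -> int:
    """New approach (sketch): scan backwards and stop at the n-th match."""
    if n < 1:
        return -1
    seen = 0
    # Assumes the newest arcs are at the end, so walk from the end towards the start.
    for arc in reversed(left_arcs):
        if arc.head == head:
            seen += 1
            if seen == n:
                return arc.child  # early exit, no intermediate list
    # No n-th match: this path still scans every arc.
    return -1
```

For example, with `arcs = [Arc(5, 1), Arc(5, 2), Arc(7, 6), Arc(5, 4)]`, both functions return `4` for `head=5, n=1` and `2` for `head=5, n=2`; the new version returns after inspecting one and three arcs respectively, while the old one always builds the whole list first.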
      
      This change doesn't avoid the linear search when the transition that is
      queried does not occur in the left-arcs. Regardless, performance is
      improved quite a bit with very long docs:
      
      Before:
      
         N  Time
      
       400   3.3
       800   5.4
      1600  11.6
      3200  30.7
      
      After:
      
         N  Time
      
       400   3.2
       800   5.0
      1600   9.5
      3200  23.2
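The two tables can be compared with a little arithmetic. The snippet below only reuses the figures reported above; the commit does not state a time unit, so none is assumed, and the helper names `growth` and `reduction` are illustrative.

```python
# Timings copied from the Before/After tables (unit not stated in the commit).
before = {400: 3.3, 800: 5.4, 1600: 11.6, 3200: 30.7}
after = {400: 3.2, 800: 5.0, 1600: 9.5, 3200: 23.2}


def growth(times, n_small, n_large):
    """Ratio of run time between two document lengths."""
    return times[n_large] / times[n_small]


def reduction(n):
    """Fraction of run time saved at document length n."""
    return 1 - after[n] / before[n]


for n in sorted(before):
    print(f"N={n:5d}  saved {reduction(n):.1%}")
print(f"1600 -> 3200: before x{growth(before, 1600, 3200):.2f}, "
      f"after x{growth(after, 1600, 3200):.2f}")
```

On these numbers the saving grows with document length, from about 3% at N=400 to about 24% at N=3200. Doubling N from 1600 to 3200 multiplies the time by about 2.65 before the change and about 2.44 after it; both are still above the factor of 2 that purely linear scaling would give.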
      
      We can probably do better with more tailored data structures, but I
      first wanted to make a low-impact PR.
      
      Found while investigating #9858.
      
      * StateC::L: simplify loop
  5. 12 Jan, 2022 1 commit
  6. 07 Jan, 2022 1 commit
  7. 05 Jan, 2022 4 commits
  8. 04 Jan, 2022 2 commits
  9. 03 Jan, 2022 1 commit
  10. 29 Dec, 2021 1 commit
  11. 27 Dec, 2021 2 commits
  12. 21 Dec, 2021 1 commit
  13. 20 Dec, 2021 2 commits
  14. 16 Dec, 2021 4 commits
  15. 15 Dec, 2021 2 commits
  16. 07 Dec, 2021 4 commits
  17. 06 Dec, 2021 2 commits
  18. 05 Dec, 2021 1 commit
    • Migrate regression tests into the main test suite (#9655) · 7d508046
      Lj Miranda committed
      * Migrate regressions 1-1000
      
      * Move serialize test to correct file
      
      * Remove tests that won't work in v3
      
      * Migrate regressions 1000-1500
      
      Removed regression test 1250 because v3 doesn't support the old LEX
      scheme anymore.
      
      * Add missing imports in serializer tests
      
      * Migrate tests 1500-2000
      
      * Migrate regressions from 2000-2500
      
      * Migrate regressions from 2501-3000
      
      * Migrate regressions from 3000-3501
      
      * Migrate regressions from 3501-4000
      
      * Migrate regressions from 4001-4500
      
      * Migrate regressions from 4501-5000
      
      * Migrate regressions from 5001-5501
      
      * Migrate regressions from 5501 to 7000
      
      * Migrate regressions from 7001 to 8000
      
      * Migrate remaining regression tests
      
      * Fixing missing imports
      
      * Update docs with new system [ci skip]
      
      * Update CONTRIBUTING.md
      
      - Fix formatting
      - Update wording
      
      * Remove lemmatizer tests in el lang
      
      * Move a few tests into the general tokenizer
      
      * Separate Doc and DocBin tests
  19. 30 Nov, 2021 2 commits
    • morphologizer: avoid recreating label tuple for each token (#9764) · 72f7f4e6
      Daniël de Kok committed
      * morphologizer: avoid recreating label tuple for each token
      
      The `labels` property converts the dictionary key set to a tuple. This
      property was used for every annotated token, recreating the tuple over
      and over again.
      
      Construct the tuple once in the set_annotations function and reuse it.
      
      On a Finnish pipeline that I was experimenting with, this results in a
      speedup of ~15% (~13000 -> ~15000 WPS).
      
      * tagger: avoid recreating label tuple for each token
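The pattern behind this fix can be reproduced in plain Python. The sketch below uses a hypothetical `HypotheticalTagger` class (not spaCy's `Morphologizer` or `Tagger`) whose `labels` property builds a tuple from dictionary keys, mirroring the description above; `set_annotations_slow` and `set_annotations_fast` are illustrative names, not spaCy methods.

```python
from typing import Dict, List, Tuple


class HypotheticalTagger:
    """Toy component; names are hypothetical, not spaCy's actual classes."""

    def __init__(self, label_map: Dict[str, int]) -> None:
        # Labels are stored as dictionary keys, as the commit describes.
        self._label_map = label_map

    @property
    def labels(self) -> Tuple[str, ...]:
        # Builds a brand-new tuple from the key set on every access.
        return tuple(self._label_map.keys())

    def set_annotations_slow(self, guesses: List[int]) -> List[str]:
        # Accessing the property once per token rebuilds the tuple each time.
        return [self.labels[g] for g in guesses]

    def set_annotations_fast(self, guesses: List[int]) -> List[str]:
        # Build the tuple once per call and reuse it for every token.
        labels = self.labels
        return [labels[g] for g in guesses]
```

Hoisting the property access out of the loop turns one tuple construction per token into one per call. Both methods return the same labels, since the label dictionary is not modified while annotations are set.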
    • Switch to latest CI images (#9773) · c19f0c16
      Adriane Boyd committed