Integrated Trainer of Parameter Server (API add `fluid.contrib.layers.sparse_embedding` only) !22957
Created by: seiriosPlus
PR types
New features, Function optimization
PR changes
APIs, OPs, Others
Describe
Fully optimized code for parameter server training.
Introduction
API changes:
- add `fluid.contrib.layers.sparse_embedding` for large sparse embedding.
- other API changes caused by code formatter.
OP changes:
- `recv_save`: add attribute `is_sparse`
- `send`: remove unused attribute `send`
- `checkpoint_notify`: remove attributes `trainer_id/dir/lookup_table/epmap`
- `checkpoint_notify`: add attributes `is_slice/varname/remote_varnames/endpoints/slice_varnames/dirname`
- `distributed_lookup_table`: delete unused attribute `height_sections`
explain:
`recv_save` / `send` / `checkpoint_notify` / `distributed_lookup_table` are all private ops for distributed training, so these changes do not affect user code.
Transpiler
- old: huge methods with if/for loops
- new: passes
- trainer: delete_optimizer_pass -> distributed_ops_pass -> append_send_ops_pass -> fake_init_ops_pass -> init_from_server_pass -> delet_extra_optimizes_pass
- pserver: add_listen_and_serv_pass -> add_rpc_global_flags_pass -> add_optimizer_pass -> large_scale_sparse_pass -> build_pserver_startup_program_pass -> large_scale_sparse_pass
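
The pass-based structure can be pictured as a list of small functions applied in order. The sketch below is a toy model only: three of the pass names are borrowed from the trainer chain, but the program stand-in (a plain list of op names), the op names themselves, and the body of every pass are assumptions made for illustration, not the real implementations.

```python
from functools import reduce

# Toy stand-in for a program: an ordered list of op names.
# This is an assumption for illustration, not the real Program/Block type.

def delete_optimizer_pass(ops):
    # Drop optimizer ops on the trainer side (hypothetical op names).
    return [op for op in ops if op not in ("sgd", "adam")]

def distributed_ops_pass(ops):
    # Swap a local table lookup for its distributed counterpart.
    return ["distributed_lookup_table" if op == "lookup_table" else op
            for op in ops]

def append_send_ops_pass(ops):
    # Append a send op at the end of the program.
    return ops + ["send"]

TRAINER_PASSES = [
    delete_optimizer_pass,
    distributed_ops_pass,
    append_send_ops_pass,
]

def run_passes(ops, passes):
    # Each pass takes a program and returns a new one; order matters.
    return reduce(lambda prog, p: p(prog), passes, list(ops))

trainer_ops = run_passes(["lookup_table", "fc", "fc_grad", "sgd"], TRAINER_PASSES)
print(trainer_ops)
```

Compared with one large method full of branches, each step here can be read, tested, and reordered on its own, which is the point of moving to passes.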
Communicator
reimplement:
- Communicator -> AsyncCommunicator -> GeoCommunicator
- Communicator -> AsyncCommunicator -> HalfAsyncCommunicator -> SyncCommunicator
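
Reading each arrow as "base class -> derived class", the two chains share `AsyncCommunicator` as a common parent. The Python sketch below only mirrors that shape; the `send`/`flush` methods and the `chain` helper are hypothetical placeholders, and nothing here describes the behavior of the actual communicators.

```python
class Communicator:
    """Base class; the queue and methods are illustrative placeholders."""

    def __init__(self):
        self.pending = []

    def send(self, name, grad):
        # Queue a (variable name, gradient) pair for a later flush.
        self.pending.append((name, grad))

    def flush(self):
        # Hand back everything queued so far and reset the queue.
        sent, self.pending = self.pending, []
        return sent


class AsyncCommunicator(Communicator):
    pass


class GeoCommunicator(AsyncCommunicator):
    pass


class HalfAsyncCommunicator(AsyncCommunicator):
    pass


class SyncCommunicator(HalfAsyncCommunicator):
    pass


def chain(cls):
    # Walk the MRO from the base class down to cls, skipping `object`.
    return " -> ".join(c.__name__ for c in reversed(cls.__mro__[:-1]))


print(chain(GeoCommunicator))
print(chain(SyncCommunicator))
```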
Server
add LargeScaleKV implementation
- auto-growing ids
- id in [0, INT64]
- hash by pservers to fix hotspot issues.
- save to SelectedRows/Text
- PServer Save
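
The properties in this list can be illustrated with a small pure-Python table. This is a minimal sketch under stated assumptions, not the LargeScaleKV code: the class name `ShardedSparseTable`, the modulo placement rule, the uniform initializer, the reading of the id range as non-negative int64, and the text layout are all hypothetical.

```python
import random


class ShardedSparseTable:
    """Toy sparse table: rows appear on first access and are spread over shards."""

    def __init__(self, num_shards, dim, seed=0):
        self.num_shards = num_shards
        self.dim = dim
        self.shards = [dict() for _ in range(num_shards)]
        self._rng = random.Random(seed)

    def shard_of(self, key):
        # Modulo placement (an assumed rule) spreads ids across servers
        # so that no single server holds the whole table.
        return key % self.num_shards

    def lookup(self, key):
        # Assumed id range: any non-negative value that fits in int64.
        if not 0 <= key < 2 ** 63:
            raise ValueError("id must be a non-negative int64")
        rows = self.shards[self.shard_of(key)]
        if key not in rows:
            # Auto growth: an unseen id gets a freshly initialized row.
            rows[key] = [self._rng.uniform(-0.1, 0.1) for _ in range(self.dim)]
        return rows[key]

    def save_text(self, shard_id):
        # One "id<TAB>v0,v1,..." line per row of the given shard.
        lines = []
        for key in sorted(self.shards[shard_id]):
            values = ",".join("%.6f" % v for v in self.shards[shard_id][key])
            lines.append("%d\t%s" % (key, values))
        return "\n".join(lines)


table = ShardedSparseTable(num_shards=4, dim=3)
for feature_id in (7, 2 ** 40 + 1, 7, 12):
    table.lookup(feature_id)

# Number of rows held by each shard after the lookups above.
print([len(shard) for shard in table.shards])
```

No table size is declared up front: memory grows only with the ids that are actually seen, and each shard can write out its own rows independently.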
Experiments
CTR LARGE_SCALE VS 1.7.2(8), ASYNC + DATASET

| EPOCH | SPEED lines/sec (22957) | SPEED lines/sec (1.7.2) | TEST AUC (22957) | TEST AUC (1.7.2) |
|---|---|---|---|---|
| 0 | 40382.7114 | 41824.4988 | 0.743417 | 0.747783 |
| 1 | 40852.5408 | 42742.1863 | 0.763407 | 0.765855 |
| 2 | 41757.7742 | 42824.1947 | 0.773173 | 0.775051 |
| 3 | 42245.0450 | 41419.2507 | 0.779447 | 0.780185 |
| 4 | 42255.9334 | 43046.1868 | 0.782906 | 0.783628 |
| 5 | 41638.7534 | 43064.7863 | 0.785447 | 0.785996 |
| 6 | 41625.8496 | 43903.9036 | 0.787759 | 0.787912 |
| 7 | 41826.0234 | 43849.0151 | 0.789067 | 0.789307 |
| 8 | 41644.1003 | 43084.6209 | 0.790005 | 0.790547 |
| 9 | 42006.6102 | 45298.1854 | 0.791332 | 0.791442 |
| 10 | 41520.9987 | 44090.6176 | 0.791979 | 0.792363 |
| 11 | 41644.4256 | 44754.7476 | 0.79242 | 0.792934 |
| 12 | 41600.1201 | 44129.0481 | 0.792934 | 0.793618 |
| 13 | 42242.2510 | 43911.7627 | 0.793653 | 0.793995 |
| 14 | 42493.9051 | 43280.5678 | 0.793839 | 0.794289 |
| 15 | 42483.4592 | 43156.0943 | 0.794197 | 0.794641 |
| 16 | 41973.2283 | 43050.7187 | 0.794247 | 0.794467 |
| 17 | 41648.6511 | 43470.0413 | 0.794318 | 0.795061 |
| 18 | 42001.9489 | 43878.2971 | 0.794401 | 0.794801 |
| 19 | 43204.9721 | 44350.7521 | 0.794796 | 0.794991 |

W2V LARGE_SCALE VS 1.7.2(8), ASYNC + DATASET

| EPOCH | SPEED words/sec (22957) | SPEED words/sec (1.7.2) | TEST ACC (22957) | TEST ACC (1.7.2) |
|---|---|---|---|---|
| 0 | 38710.8308 | 52546.4205 | 0.357 | 0.291 |
| 1 | 31047.3975 | 53228.6121 | 0.488 | 0.421 |
| 2 | 38398.9667 | 53219.9925 | 0.552 | 0.483 |
| 3 | 30892.7404 | 52935.8162 | 0.592 | 0.53 |
| 4 | 38253.3532 | 53083.7943 | 0.609 | 0.57 |
| 5 | 30509.0984 | 52890.2298 | 0.614 | 0.583 |
| 6 | 36201.4896 | 52968.6198 | 0.621 | 0.6 |
| 7 | 31285.0067 | 53509.7486 | 0.628 | 0.608 |
| 8 | 34499.2948 | 52988.9057 | 0.632 | 0.615 |
| 9 | 34003.5555 | 53091.7894 | 0.634 | 0.62 |
| 10 | 32231.5232 | 52983.9373 | 0.637 | 0.623 |
| 11 | 36732.4116 | 52930.2642 | 0.64 | 0.63 |
| 12 | 29601.1020 | 52820.1654 | 0.639 | 0.632 |
| 13 | 47984.6335 | 53022.2126 | 0.64 | 0.634 |
| 14 | 44598.9333 | 53156.1390 | 0.641 | 0.635 |

SIMNET LARGE_SCALE VS 1.7.2(8), ASYNC + DATASET

| EPOCH | SPEED lines/sec (22957) | SPEED lines/sec (1.7.2) | TEST PN (22957) | TEST PN (1.7.2) |
|---|---|---|---|---|
| 0 | 80882.7005 | 69527.7479 | 1.80131 | 1.83487 |
| 2 | 80851.1122 | 69427.2556 | 1.92519 | 1.95693 |
| 4 | 81212.7949 | 69551.4940 | 1.99275 | 2.01557 |
| 6 | 81046.8461 | 69718.6492 | 2.02755 | 2.065 |
| 8 | 81191.8261 | 69810.9468 | 2.0512 | 2.07834 |
| 10 | 81614.7143 | 69974.7245 | 2.08456 | 2.11172 |
| 12 | 80365.6785 | 69782.9580 | 2.06726 | 2.12081 |
| 14 | 80845.8114 | 70085.8516 | 2.1108 | 2.1329 |
| 16 | 81453.6759 | 70270.9776 | 2.12224 | 2.11487 |
| 18 | 80598.0124 | 70201.1548 | 2.14547 | 2.16318 |
| 20 | 80936.1689 | 69586.5245 | 2.12098 | 2.17401 |
| 22 | 80281.5667 | 69935.3969 | 2.16638 | 2.17755 |
| 24 | 80648.5904 | 69712.8900 | 2.17197 | 2.18454 |
| 26 | 80175.3672 | 69874.0455 | 2.17442 | 2.18979 |
| 28 | 80482.3953 | 69670.9059 | 2.1805 | 2.19599 |
| 30 | 80357.8911 | 69693.6987 | 2.18897 | 2.20132 |
| 32 | 80576.7476 | 69591.5745 | 2.19511 | 2.20476 |
| 34 | 81010.0043 | 69194.1689 | 2.19904 | 2.21155 |
| 36 | 80852.6387 | 69869.2369 | 2.19907 | 2.20889 |
| 38 | 80696.7542 | 69573.0540 | 2.2062 | 2.20458 |

Next Work
- full unification of Tensor and LargeScaleKV on PServer.
- speed up Geo/Async.
- more tests in business scenarios.