- 01 Oct, 2020 3 commits
-
-
Committed by simonJJJ
Former-commit-id: af2d89b4dd972c27a5976d7b1fad11b551afb2a6
-
https://github.com/Oneflow-Inc/oneflow
Committed by simonJJJ
Former-commit-id: 2f787462d13393508c9a20b74b3aa51aa0292325
-
Committed by guo ran
* layer_norm_param_grad
* refine
* use tmp_buffer in fp16, because gpu_atomic_add is slow
* testcase
* refine
* refine
* refine
* fp16 test case
* fix
* rm trainable
* fix
Co-authored-by: guoran <guoran@oneflow.org>
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Former-commit-id: 5f5a128c
-
- 30 Sep, 2020 4 commits
-
-
Committed by simonJJJ
Former-commit-id: fc79abfb3760a8a26cab19f36741447a49d17f90
-
Committed by qq_22305325
* support oneflow_api_registry
* fix code format
* Expand folder range
* optimize of_api_registry
* Update of_api_registry.h add a blank line between constructor and member function
* fix code format
* optimize oneflow_pybind_api
* fix code format
* fix oneflow.cmake
* add a blank line
Co-authored-by: oneflow-bot <69100618+oneflow-bot@users.noreply.github.com>
Former-commit-id: 757d2eab
-
Committed by Shenghang Tsai
Former-commit-id: ef593e38
-
- 29 Sep, 2020 1 commit
-
-
Committed by liyurui
Co-authored-by: Juncheng <liujuncheng1022@gmail.com>
Former-commit-id: 9966d1db
-
- 28 Sep, 2020 3 commits
-
-
Committed by daquexian
* update lib name in link flags
* get lib path from imp module
* reformat and replace % with .format
Co-authored-by: oneflow-bot <69100618+oneflow-bot@users.noreply.github.com>
Former-commit-id: 95a85c87
-
Committed by cheng cheng
* Networker interface
* global epoll comm net
* half implement of Networker
* implement of Networker::Send
* Implement of Networker::Recieve
* Implement of Networker::HandlerRecieveSendMsgFromSrcMachine
* Implement of Networker::HandlerRecieveAckMsgFromDstMachine
* refine iterator in Networker
* add networker test exe
* add log and blocking count
* OF_BARRIER for networker test
* add log for debug
* add more log
* fix bug and add check
* fix bug of wrong delete global in callback
* fix bug of double free
* exchange Networker destructor
* moving BlockingCount in networker_test_main
* send_before_recv & recv_before_send; fix lock bug of status access
* Networker -> Transport
* add TODO(chengcheng) and rename interface of Transport handler
* add more test
* fix bug and refine test code
* fix compile err for new change
* Test for correctness
* Test throughput like ibverbs read bandwidth
* Fix BUG: All stat change need be set in the block protected by lock
* test 23 data up to 8388608
* note for debug
* OF_BARRIER_ALL
* ctrl_client clear
* not comment
* test global
* CtrlServer: Clear cq before shutdown
* Fix cq shutdown on loop thread
* add log
* try shutdown grpc server before cq shutdown
* move grpc server to loop thread
* fix rpc loop return condition
* revert change in ctrl server
* fix bug of delete global in runtime
* transport support 1. Receive size > Send size; 2. Local Send/Recv
* add test for Send size < Recv size and local transport
* fix bug when Send size < Recv size
* refine code for review
* Fix bug of Transport UnRegisterMemory
* add Transport user doc
* refine code for review. move memcopy from block protected by mutex
Former-commit-id: 3a54beb8
-
- 27 Sep, 2020 3 commits
-
-
Committed by Shenghang Tsai
Former-commit-id: e210c8cb
-
Committed by cheng cheng
Former-commit-id: 79024ddd
-
Committed by Juncheng
* Add model update user ops
* fix
* extract MakePredicatorIsSafeToDelete/IsUserOpWithTypeName
* Remove useless code
* Dev indexed slices model update user ops (#3561)
* indexed_slices user ops and rewrite pass
* fix lazy adam
* fix
* add testcase
* fix sbp
* rename IndexedSlicesUpdateOpKernelState
* address review
* rename state
* check model_diff dtype
* use TmpBufferManager
* refine
* refine
Co-authored-by: guoran <guoran@oneflow.org>
* Dev adam xla and rm sys op (#3584)
* rm model_update sys op
* mv indexed slice op kernel to model_update op
* adam optimizer
* revert indexed_slices_reduce_sum op
* fix
* refine
* rm mask in lazy_adam testcase
* add cast
Co-authored-by: guoran <guoran@oneflow.org>
* reuse adam optm
* refine
* enable_fuse_model_update_ops
* fix
* pure CPU
* model update ops SetAreaId
* Node is not safe to delete if it has more than one consumer.
* refine
* fix
Co-authored-by: guo ran <360112263@qq.com>
Co-authored-by: guoran <guoran@oneflow.org>
Former-commit-id: b303c687
-
- 25 Sep, 2020 4 commits
-
-
Committed by Shenghang Tsai
Former-commit-id: 2e3ce2e6
-
Committed by Shenghang Tsai
Former-commit-id: 11317a14
-
Committed by liyurui
* add convert_url_to_oss_https_url and DCN flag
* add third_party_mirror flag in ci
* fix the reviewer's suggestions
* recover the include order && skip oss2 if not exists
* fix find oss module
* of_format
* fix the reviewer's suggestions and add some md5 checks
* check for non-empty third-party-mirror variable
* add flag introduction in readme
Former-commit-id: 72a98a3f
-
Committed by cheng cheng
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Former-commit-id: 22e0c80e
-
- 24 Sep, 2020 6 commits
-
-
Committed by Shenghang Tsai
Former-commit-id: e70add24
-
Committed by hsj0429
* add comments for cuda_copy_d2h_stream_type.cpp
* modify comments for cuda_copy_d2h_stream_type.cpp
Co-authored-by: Li Xinqi <lixinqi2010@gmail.com>
Former-commit-id: 5421ca67
- 23 Sep, 2020 3 commits
-
-
Committed by cheng cheng
* Add control test
* fix bug of grpc ctrl server shutdown
* delete call for last cancel item in cq
Former-commit-id: cbed90ae
-
Committed by Juncheng
Co-authored-by: oneflow-bot <69100618+oneflow-bot@users.noreply.github.com>
Former-commit-id: e3623f61
-
- 22 Sep, 2020 3 commits
-
-
Committed by Shenghang Tsai
* check in script
* add ifs
* add ignore
* fix url
* add ifs
* rename log
* fix key
* fix url
* add todo
* add note
* add ci task
* fix oss2
* fix setuptools
* cleanup
* install wheel
Co-authored-by: oneflow-bot <69100618+oneflow-bot@users.noreply.github.com>
Former-commit-id: 35d15f1b
-
Committed by Shenghang Tsai
Former-commit-id: d8498227
-
Committed by Li Xinqi
Co-authored-by: Shenghang Tsai <jackalcooper@gmail.com>
Former-commit-id: e9e4d8e9
-
- 17 Sep, 2020 10 commits
-
-
Committed by Shenghang Tsai
* add instructions on download src from aliyun
* fix typo
* add bash scripts
* fmt
Former-commit-id: 7bc6c283
-
Committed by Shenghang Tsai
Former-commit-id: 19a664a0
-
Committed by Shenghang Tsai
Former-commit-id: 08a871b1
-
Committed by Shenghang Tsai
Former-commit-id: e935d9b4
-
-
Committed by Shenghang Tsai
Former-commit-id: c6e07900
-
Committed by Li Xinqi
* move cluster_control from core/control to core/job
* rename ClusterControl to ClusterInstruction
* explicitly call oneflow.env.init() in 2node_test.py
* refactor WorkerLoop with lazy_runtime_thread
* rename RunLazyJobSet to AsyncRunLazyJobSet
* EagerInstruction
* replace eager_util with eager::Oneflow
* 1) remove unnecessary BarrierClear in cluster_instruction.cpp; 2) refactor BarrierClear to NewSessionBarrier
* 1) rename eager::Oneflow to eager::EagerOneflow; 2) more comments
* OccasionallyClearCtrlKV
* ObsoleteCtrlKeys
* reformat
Former-commit-id: c74a53dd
-
Committed by Shenghang Tsai
Former-commit-id: bd6bb8f8
-
Committed by Shenghang Tsai
Former-commit-id: 6f79c8db
-
Committed by Shenghang Tsai
Former-commit-id: ffabc33e
-