  - Appendix

    This appendix covers the Python code specification, the document specification, and the Pull Request process. Please follow the relevant sections.

    - [Appendix 1: Python Code Specification](#Appendix1)

    - [Appendix 2: Document Specification](#Appendix2)

    - [Appendix 3: Pull Request Description](#Appendix3)

    <a name="Appendix1"></a>

    ## Appendix 1: Python Code Specification

    The Python code of PaddleOCR follows the [PEP8 Specification](https://www.python.org/dev/peps/pep-0008/). Some of the key points are listed below.

    - Spaces

      - Add a space after commas, semicolons, and colons, not before them

        ```python
        # Correct:
        print(x, y)

        # Incorrect:
        print(x , y)
        ```

      - When specifying a keyword argument or a default parameter value in a function, do not put spaces around the `=` sign

        ```python
        # Correct:
        def complex(real, imag=0.0):
            pass

        # Incorrect:
        def complex(real, imag = 0.0):
            pass
        ```

    - Comments

      - Inline comments: An inline comment starts with `#`. Leave two spaces between the code and `#`, and one space between `#` and the comment text, for example

        ```python
        x = x + 1  # Compensate for border
        ```

      - Functions and methods: The definition of each function should include the following:

        - Function description: The purpose, inputs, and outputs of the function

        - Args: Name and description of each parameter
        - Returns: The meaning and type of the return value

        ```python
        def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
            """Fetches rows from a Bigtable.
        
            Retrieves rows pertaining to the given keys from the Table instance
            represented by big_table.  Silly things may happen if
            other_silly_variable is not None.
        
            Args:
                big_table: An open Bigtable Table instance.
                keys: A sequence of strings representing the key of each table row
                    to fetch.
                other_silly_variable: Another optional variable, that has a much
                    longer name than the other args, and which does nothing.
        
            Returns:
                A dict mapping keys to the corresponding table row data
                fetched. Each row is represented as a tuple of strings. For
                example:
        
                {'Serak': ('Rigel VII', 'Preparer'),
                 'Zim': ('Irk', 'Invader'),
                 'Lrrr': ('Omicron Persei 8', 'Emperor')}
        
                If a key from the keys argument is missing from the dictionary,
                then that row was not found in the table.
            """
            pass
        ```
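
    The docstring layout described above can also be checked mechanically. The following is a minimal sketch, not part of PaddleOCR's tooling: the helper name `missing_docstring_sections` is hypothetical, and it only checks that the `Args:` and `Returns:` section headers appear on lines of their own.

    ```python
    import inspect


    def missing_docstring_sections(func, required=("Args:", "Returns:")):
        """Returns the required section headers absent from func's docstring.

        Args:
            func: The function or method whose docstring is inspected.
            required: Section headers that the docstring is expected to contain.

        Returns:
            A list of the headers in required that do not appear on a line of
            their own in the docstring. An empty list means all are present.
        """
        doc = inspect.getdoc(func) or ""
        lines = [line.strip() for line in doc.splitlines()]
        return [header for header in required if header not in lines]


    def undocumented(x):
        return x


    print(missing_docstring_sections(missing_docstring_sections))  # []
    print(missing_docstring_sections(undocumented))  # ['Args:', 'Returns:']
    ```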

    <a name="Appendix2"></a>

    ## Appendix 2: Document Specification

    ### 2.1 Overall Description

    - Document location: If you are documenting a new feature that belongs in an existing Markdown file, add it there; please **do not create** a new file. If you don't know where to add it, you can open a PR with the code first and then ask the official maintainers in the commit.

    - New Markdown document name: Describe the content of the document in English, typically with a combination of lowercase letters and underscores, such as `add_new_algorithm.md`

    - New Markdown document format: Table of contents - Body - FAQ

      > The table of contents can be generated with [this site](https://ecotrust-canada.github.io/markdown-toc/), which extracts it automatically after you paste the Markdown content. Then add `<a name='XXXX'></a>` before each heading of the Markdown file

    - English and Chinese: Any changes or additions to the document need to be made in both Chinese and English documents.
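
    As an illustration of the naming and anchor rules in this list, here is a minimal Python sketch. Both helper names are hypothetical, the naming pattern is one possible reading of "lowercase letters and underscores" (digits are also allowed here as an assumption), and the anchor names are derived from the heading text purely for demonstration.

    ```python
    import re

    # Assumed reading of the naming rule: lowercase letters, digits and
    # underscores, followed by the .md extension.
    DOC_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*\.md")


    def is_valid_doc_name(filename):
        """Returns True if filename follows the assumed naming rule."""
        return DOC_NAME_PATTERN.fullmatch(filename) is not None


    def add_heading_anchors(markdown_text):
        """Inserts an <a name='...'></a> line before each Markdown heading.

        Lines inside fenced code blocks are left untouched, so that shell or
        Python comments starting with '#' are not mistaken for headings.
        """
        output = []
        in_fence = False
        for line in markdown_text.splitlines():
            stripped = line.strip()
            if stripped.startswith("```"):
                in_fence = not in_fence
            match = re.match(r"#{1,6}\s+(.*)", stripped)
            if match and not in_fence:
                # Anchor name: heading text with non-word runs replaced by "_".
                name = re.sub(r"\W+", "_", match.group(1)).strip("_")
                output.append(f"<a name='{name}'></a>")
                output.append("")
            output.append(line)
        return "\n".join(output)


    print(is_valid_doc_name("add_new_algorithm.md"))  # True
    print(is_valid_doc_name("Add New Algorithm.md"))  # False
    print(add_heading_anchors("## 2.1 Overall Description\nBody text"))
    ```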

    ### 2.2 Format Specification

    - Title format: A document title consists of an Arabic numeral and decimal point combination, a space, and the title text (for example, `2.1 XXXX`, `2. XXXX`)

    - Code block: Show code that needs to be run in a code block, and describe the meaning of the command parameters before the code block. For example:

      > Pipeline of detection + direction classification + recognition: Vertical text can be recognized after setting the direction classifier parameter `--use_angle_cls true`.
      >
      > ```
      > paddleocr --image_dir ./imgs/11.jpg --use_angle_cls true
      > ```

    - Variable references: If code variables or command parameters are referenced inline, format them as inline code, as with `--use_angle_cls true` above, with one space before and one space after

    - Uniform naming: e.g. PP-OCRv2, PP-OCR mobile, `paddleocr` whl package, PPOCRLabel, Paddle Lite, etc.

    - Supplementary notes: Write supplementary notes in the quote format `>`.

    - Picture: If a picture is added to a document, give the picture a name that describes its content and add the picture under `doc/`.

    - Title: Capitalize the first letter of each word in the title.
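
    The title rules in this list (number, space, title text, each word capitalized) can be sketched as a small check. This is only an illustration under assumptions: `check_title` is a hypothetical helper, and the pattern assumes that the number may end with a dot.

    ```python
    import re

    # Assumed reading of the rule: dot-separated Arabic numerals (an optional
    # trailing dot is allowed), one space, then the title text.
    TITLE_PATTERN = re.compile(r"(\d+(?:\.\d+)*\.?) (\S.*)")


    def check_title(title):
        """Returns a list of problems found in a heading title (without '#')."""
        match = TITLE_PATTERN.fullmatch(title)
        if match is None:
            return ["expected '<number> <title>', for example '2.1 XXXX'"]
        problems = []
        for word in match.group(2).split():
            # Only words that start with a letter are checked, so inline code
            # such as `paddleocr` is not flagged.
            if word[0].isalpha() and not word[0].isupper():
                problems.append(f"word '{word}' should start with a capital letter")
        return problems


    print(check_title("2.2 Format Specification"))  # []
    print(check_title("2.2 format Specification"))
    print(check_title("2.2Format Specification"))
    ```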

    <a name="Appendix3"></a>

    ## Appendix 3: Pull Request Description

    ### 3.1 PaddleOCR Branch Description

    PaddleOCR will maintain two kinds of branches in the future:

    - release/x.x branches: stable release branches, also the default branch. PaddleOCR creates a new release branch based on feature updates and adapts it to the release version of Paddle. As versions iterate, there will be more and more release/x.x branches; by default, only the latest release branch is maintained.
    - dygraph branch: the development branch, adapted to the dygraph (dynamic graph) version of Paddle and used primarily to develop new functionality. If you need to do secondary development, choose the dygraph branch. To ensure that a release/x.x branch can be cut from the dygraph branch when needed, the code in the dygraph branch may only use APIs that are available in the latest release branch of Paddle. That is, if a new API has been developed in the Paddle dygraph branch but has not yet appeared in the release branch code, do not use it in PaddleOCR. In addition, performance optimization, parameter tuning, and strategy updates that do not involve APIs can be developed normally.

    The historical branches of PaddleOCR will no longer be maintained in the future. Considering that some of you may still be using them, these branches will be kept available:

    - develop branch: This branch was used for the development and testing of static graphs and is currently compatible with Paddle versions >=1.7. If you have special needs, you can still use this branch to work with older versions of Paddle, but apart from bug fixes, its code will no longer be updated.

    PaddleOCR welcomes you to actively contribute code to the repo. Here are some basic processes for contributing code.

    ### 3.2 PaddleOCR Code Submission Process And Specification

    > If you are familiar with Git, you can jump directly to [Some Conventions For Submitting Code in 3.2.10](#Some_conventions_for_submitting_code)

    #### 3.2.1 Create Your `Remote Repo`

    - On the PaddleOCR [GitHub home page](https://github.com/PaddlePaddle/PaddleOCR), click the `Fork` button in the upper right corner to create a `remote repo` under your personal account, such as `https://github.com/{your_name}/PaddleOCR`.

    ![banner](../banner.png)

    - Clone the `remote repo`

    ```
    # clone the dygraph branch
    git clone https://github.com/{your_name}/PaddleOCR.git -b dygraph
    cd PaddleOCR
    ```

    > Clone failures are mostly caused by network problems. Try again later or configure a proxy.

    #### 3.2.2 Login And Connect Using Token

    Start by viewing the information for the current `remote repo`.

    ```
    git remote -v
    # origin    https://github.com/{your_name}/PaddleOCR.git (fetch)
    # origin    https://github.com/{your_name}/PaddleOCR.git (push)
    ```

    At this point only the cloned `remote repo`, i.e. the PaddleOCR repo under your user name, is listed. Because GitHub changed its login method, you need to reconfigure the `remote repo` address using a token. Generate the token as follows:

    1. Find Personal Access Tokens: Click on your avatar in the upper right corner of the GitHub page and choose Settings --> Developer settings --> Personal access tokens.

    2. Click Generate new token: Fill in the token name in Note, such as 'paddle'. In Select scopes, select repo (required), and check others such as admin:repo_hook and delete_repo according to your needs. Then click Generate token and copy the generated token.

    Delete the original origin configuration:
   
    ```
    git remote rm origin
    ```
    
    Set the remote address to `https://oauth2:{token}@github.com/{your_name}/PaddleOCR.git`. For example, if the token value is 12345 and your user name is PPOCR, run the following command:
    
    ```
    git remote add origin https://oauth2:12345@github.com/PPOCR/PaddleOCR.git
    ```
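
    If the remote is configured from a script, the address format above can be assembled programmatically. The following is a minimal sketch: the function name `build_token_remote_url` is hypothetical, and percent-encoding the token is an extra precaution assumed here, not something the steps above require.

    ```python
    from urllib.parse import quote


    def build_token_remote_url(token, user_name, repo="PaddleOCR"):
        """Builds the remote URL that embeds a personal access token.

        Args:
            token: The personal access token generated on GitHub.
            user_name: The GitHub user name that owns the fork.
            repo: The repository name.

        Returns:
            The HTTPS URL in the oauth2 form described above.
        """
        # Percent-encode the token in case it contains reserved characters.
        encoded = quote(token, safe="")
        return f"https://oauth2:{encoded}@github.com/{user_name}/{repo}.git"


    url = build_token_remote_url("12345", "PPOCR")
    print(url)  # https://oauth2:12345@github.com/PPOCR/PaddleOCR.git
    # The same git command as above, as an argument list for subprocess.run:
    print(["git", "remote", "add", "origin", url])
    ```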
    
    This establishes a connection to your own `remote repo`. Next, add the original PaddleOCR repo as a remote named upstream.

    ```
    git remote add upstream https://github.com/PaddlePaddle/PaddleOCR.git
    ```

    Use `git remote -v` to view the current `remote repo` information. The output, shown below, now lists two `remote repo` entries each for origin and upstream.

    ```
    origin    https://github.com/{your_name}/PaddleOCR.git (fetch)
    origin    https://github.com/{your_name}/PaddleOCR.git (push)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (fetch)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (push)
    ```

    This is mainly to keep the local repository up to date when subsequent pull request (PR) submissions are made.
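
    To confirm from a script that both remotes are configured, the `git remote -v` output can be parsed. This is a minimal sketch with a hypothetical helper name, `parse_remotes`; it assumes the three-column layout shown above.

    ```python
    def parse_remotes(remote_output):
        """Parses the text printed by `git remote -v`.

        Returns:
            A dict mapping each remote name to a dict such as
            {"fetch": url, "push": url}.
        """
        remotes = {}
        for line in remote_output.splitlines():
            parts = line.split()
            if len(parts) != 3:
                continue  # Skip blank or unexpected lines.
            name, url, kind = parts
            remotes.setdefault(name, {})[kind.strip("()")] = url
        return remotes


    sample = """
    origin    https://github.com/{your_name}/PaddleOCR.git (fetch)
    origin    https://github.com/{your_name}/PaddleOCR.git (push)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (fetch)
    upstream    https://github.com/PaddlePaddle/PaddleOCR.git (push)
    """
    remotes = parse_remotes(sample)
    print(sorted(remotes))  # ['origin', 'upstream']
    print(remotes["upstream"]["fetch"])  # https://github.com/PaddlePaddle/PaddleOCR.git
    ```

    In a real checkout, the input text could come from `subprocess.run(["git", "remote", "-v"], capture_output=True, text=True).stdout`.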

    #### 3.2.3 Create Local Branch

    First fetch the latest code from upstream, then create a `new_branch` branch based on the dygraph branch of the upstream repo (upstream).
    
    ```
    git fetch upstream
    git checkout -b new_branch upstream/dygraph
    ```
    
    > If, as with a newly forked PaddleOCR project, the user's remote repo (origin) is in sync with the upstream repo (upstream), you can also create a new local branch based on the default branch or a specified branch of the origin repo with the following commands
    >
    > ```
    > # Create new_branch branch on user remote repo (origin) based on develop branch
    > git checkout -b new_branch origin/develop
    > # Create new_branch branch based on upstream remote repo develop branch
    > # If you need to create a new branch from upstream, 
    > # you need to first use git fetch upstream to get upstream code
    > git checkout -b new_branch upstream/develop
    > ```

    After switching to the new branch, output like the following is displayed.

    ```
    Branch new_branch set up to track remote branch develop from upstream.
    Switched to a new branch 'new_branch'
    ```
    
    After switching branches, you can make file changes on this branch.
    
    #### 3.2.4 Use Pre-Commit Hook

    Paddle developers use the pre-commit tool to manage Git pre-commit hooks. It helps us format the source code (C++, Python) and automatically check some basic things (such as having only one EOL per file and not adding large files to Git) before each commit.

    The pre-commit test is part of the unit tests in Travis-CI. A PR that does not satisfy the hooks cannot be submitted to PaddleOCR. First install pre-commit and run it in the current directory:

    ```
    pip install pre-commit
    pre-commit install
    ```

     >  1. Paddle uses clang-format to adjust the C/C++ source code format. Make sure the `clang-format` version is above 3.8.
     >
     >  2. The yapf installed through `pip install pre-commit` is slightly different from the one installed through `conda install -c conda-forge pre-commit`; PaddleOCR developers use `pip install pre-commit`.

    #### 3.2.5 Modify And Submit Code

    If you make some changes to `README.md` in PaddleOCR, you can view the changed files with `git status` and then add them with `git add`.

    ```
    git status # View change files
    git add README.md
    pre-commit
    ```

    Repeat these steps until the pre-commit format check passes without errors, as shown below.

    ![img](../precommit_pass.png)

    Use the following command to complete the submission.

    ```
    git commit -m "your commit info"
    ```

    #### 3.2.6 Keep Local Repo Up To Date

    Get the latest code from upstream and update the current branch. Here upstream is the remote configured in section 3.2.2, `Login And Connect Using Token`.

    ```
    git fetch upstream
    # To submit to another branch, pull from the corresponding upstream branch instead; develop is used here
    git pull upstream develop
    ```

    #### 3.2.7 Push To Remote Repo

    ```
    git push origin new_branch
    ```

    #### 3.2.7 Submit Pull Request

    Click `New pull request` and select the local branch and the target branch, as shown in the following figure. In the PR description, describe what the PR accomplishes. Then wait for review; if changes are requested, update the corresponding branch in origin using the steps above.

    ![banner](../pr.png)

    #### 3.2.8 Sign CLA Agreement And Pass Unit Tests

    - Signing the CLA: When submitting a Pull Request to PaddlePaddle for the first time, you need to sign the CLA (Contributor License Agreement) to ensure that your code can be merged. The steps are as follows:

      1. In the Checks section of the PR, find license/cla and click Details on the right to open the CLA website

      2. On the CLA website, click Sign in with GitHub to agree; after that, you are redirected back to your Pull Request page

    #### 3.2.9 Delete Branch

    - Remove remote branch

      After the PR is merged into the main repo, you can delete the branch of the remote repo from the PR page.
      You can also use `git push origin :branch_name` to delete a remote branch, for example:

      ```
      git push origin :new_branch
      ```

    - Delete local branch

      ```
      # Switch to the develop branch, otherwise the current branch cannot be deleted
      git checkout develop

      # Delete the new_branch branch
      git branch -D new_branch
      ```

    <a name="Some_conventions_for_submitting_code"></a>

    #### 3.2.10 Some Conventions For Submitting Code

    So that the official maintainers can focus on the code itself when reviewing, please follow these conventions each time you submit code:

    1) Please make sure that the unit tests in Travis-CI pass. If they do not, there is a problem with the submitted code, and the official maintainers generally will not review it.

    2) Before submitting a Pull Request:
    
    - Note the number of commits.

      Reason: If you modify only one file but submit more than a dozen commits, each containing only a few changes, the result can be very confusing to the reviewer. The reviewer has to look at each commit individually to see what changed, and changes in different commits may even overwrite one another.

      Suggestion: Keep the number of commits as small as possible, and use `git commit --amend` to add to your last commit. For multiple commits that have already been pushed to the remote repo, you can refer to [squash commits after push](https://stackoverflow.com/questions/5667884/how-to-squash-commits-in-git-after-they-have-been-pushed).

    - Note the name of each commit: it should reflect the content of the current commit and not be too arbitrary.


    3) If you have solved a problem, add `fix #issue_number` in the first comment box of the Pull Request. This will automatically close the corresponding Issue when the Pull Request is merged. The keywords include: close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved. Please choose the appropriate one. For details, see [Closing issues via commit messages](https://help.github.com/articles/closing-issues-via-commit-messages).
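
    Before submitting, you can check whether a PR description contains one of these keywords with a short script. This is a minimal sketch with a hypothetical helper name, `find_closed_issues`; the pattern is a simplified approximation and does not reproduce GitHub's exact matching rules.

    ```python
    import re

    # The keywords listed above, followed by an issue reference such as "#123".
    CLOSING_PATTERN = re.compile(
        r"\b(close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)",
        re.IGNORECASE,
    )


    def find_closed_issues(description):
        """Returns the issue numbers referenced with a closing keyword."""
        return [int(number) for _, number in CLOSING_PATTERN.findall(description)]


    print(find_closed_issues("Fix #1234: handle empty images"))  # [1234]
    print(find_closed_issues("Related to #1234"))  # []
    ```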
    
    In addition, in response to the reviewer's comments, you are requested to abide by the following conventions:
    
    1) Please respond to every review comment from the official maintainers. This helps improve contributions to the open source community.

    - If you agree with a review comment and have made the change, simply reply `Done`.
    - If you disagree with a review comment, please explain your reasons.

    2) If there are many review comments:

    - Please give an overview of the changes.
    - Please reply using `Start a review` rather than replying directly, because each direct reply sends an e-mail message, which can flood inboxes.