wrds++

cb55094f · rictjo · 368fb395 · cb55094f

Showing with 6 additions and 7 deletions

README.md +6 -7

--- a/README.md
+++ b/README.md
@@ -416,11 +416,11 @@ np.array([['HSPA1A','HSPA1B', 'HSPA1L', 'IGFBP7', 'TMSB10', 'TMSB4X', 'RPLP2',
 ```
 which account for the top `64` obesity transcripts. We note that some of these are shared with diabetics. If we study which ones describes the `Marginal` or `Absent` genes we can see that there are some that we might want to exclude for technical reasons. We will leave that excercise for the curious reader.

-# Example 8: Latent data assumptions. Building a Parent-Child list
+# Example 8: Latent grouping assumptions. Building a Parent-Child list

-So you are sitting on a large amount of groupings that you have a significance test for. Testing what you are interested in per analyte symbol/id. Since you will conduct a large amount of tests there is also a large risk that you will technically test the same thing over and over again. In order to remove this effect from your group testing you could employ my `HierarchicalEnrichment` routine, but then you will need a relationship file describing how to build the group DAG Hierarchy. The relationship file contains a `parent id`, a `tab delimiter` and a `child id` on each line. The routine that I demonstrate here uses a divide-and-conquer type approach which means that a subgroup, or child, is only assigned if it is fully contained within the parents definition. You can create redundant assignments by setting `bSingleDescent=False` but it is not the recommended default setting.
+So you are sitting on a large number of groupings that you have a significance test for, testing what you are interested in per analyte symbol/id. Since you will conduct a large number of tests, there is also a large risk that you will technically test the same thing over and over again. To remove this effect from your group testing you could employ my `HierarchicalEnrichment` routine, but then you would also need a relationship file describing how to build the group DAG hierarchy. Each line of that relationship file contains a `parent id`, a `tab delimiter` and a `child id`. The routine that I demonstrate here uses a divide-and-conquer type approach to construct that information, which means that a subgroup, or child, is only assigned if it is fully contained within the parent's definition. You can create redundant assignments by setting `bSingleDescent=False`, but it is not the recommended default setting.

-Construction of the downward node relationships can be done with my `build_pclist_word_hierarchy` routine. Ok, Enough talk. Let us assume that you are sitting on the following data:
+Construction of the downward node relationships can be done with my `build_pclist_word_hierarchy` routine. Ok. Enough talk. Let us assume that you are sitting on the following data:
 ```
    portfolios = { 'PORT001' : ['Anders EQT' ,['AAPL','GOOG','IBM','HOUSE001','OTLY','GOLD','BANANAS'] ],
                   'PORT002' : ['Anna EQT'   ,['AAPL','AMZN','HOUSE001','CAR','BOAT','URANIUM','PLUTONIUM','BOOKS'] ],
@@ -437,7 +437,7 @@ Then you might have noticed that some of the portfolios seem to contain the othe
 ```
 which will return the list you need. You can now save it as a node relationship file and use that in my DAG construction routine.

-Lets instead assume that you want the read latent codings from a [file](https://gist.githubusercontent.com/richardtjornhammar/6780e6d99e701fcc83994cc7a5f77759/raw/2d9cb00540960491e70883cb851ca16e4f254ee9/new_compartment_genes.gmt), then you could issue :
+Let's instead assume that you want to read the analyte groupings from a [file](https://gist.githubusercontent.com/richardtjornhammar/6780e6d99e701fcc83994cc7a5f77759/raw/2d9cb00540960491e70883cb851ca16e4f254ee9/new_compartment_genes.gmt), then you could issue:
 ```
    import os
    os.system('wget https://gist.githubusercontent.com/richardtjornhammar/6780e6d99e701fcc83994cc7a5f77759/raw/2d9cb00540960491e70883cb851ca16e4f254ee9/new_compartment_genes.gmt')
@@ -445,9 +445,8 @@ Lets instead assume that you want the read latent codings from a [file](https://
    filename = 'new_compartment_genes.gmt'
    pcl , pcd = build_pclist_word_hierarchy ( filename = filename , bReturnList=True )
 ```
-If there are latent assumptions for some groupings then you can also read them out by checking what the definition referes to (here we already know that there is one for the mitochondrion definition):
+If there are latent assumptions for some groupings then you can read them out by checking what the definitions refer to (here we already know that there is one for the mitochondrion definition):
 ```
-    pcl , pcd = build_pclist_word_hierarchy ( filename = filename , bReturnList=True )
    for item in pcl :
        if  'mito' in pcd[item[1]][0] or 'mela' in pcd[item[1]][0] :
            print ( pcd[item[0]][0] , ' -> ' , pcd[item[1]][0] )
@@ -461,7 +460,7 @@ melanosome membrane   ->  mitochondrion
 full cell  ->  mitochondrial outer membrane
 full cell  ->  mitochondrial intermembrane space
 ```
-That the definition for the mitochondrion is fully contained within the melanosome membrane definition and so testing for that group should be accounted for when testing the parent. We know that the melanosome membrane is associated with sight and that being diabetic is associated with mitochondrial dysfunction, but also that diabetic retinopathy affects diabetics, so there might be a knowledge based, or True, genetic connection relating these two spatially distinct regions of the cell.
+The definition for the mitochondrion is fully contained within the melanosome membrane definition, so testing the parent group should account for the mitochondrion. This can be done with the `HierarchicalEnrichment` routine exemplified above. We know that the melanosome membrane is associated with sight and that being diabetic is associated with mitochondrial dysfunction, but also that diabetic retinopathy affects diabetics, so there might be a knowledge-based, or True, genetic connection relating these two spatially distinct regions of the cell.

 # Notes