
“DeNovo Origin of Human Protein-Coding Genes” or How Some New Genes Come About.


  • #16
    RESULTS: Expression analysis by RNA-Seq 1


    All cells have the same gene content, but cells in different parts of an organism’s body express different genes, which make different kinds of proteins.



    So what were these new genes doing? Where were they being expressed?



    To an extent, the authors could address those two questions from the databases used to locate the denovo genes. Mostly, however, they used yet another set of databases: RNA-seq databases. These store data collected via a new methodology called “Whole Transcriptome Shotgun Sequencing”, which can take snapshots of the various RNAs (messenger, transfer, ribosomal, and other fragments) present in cells at a given time.

    

    All these RNAs constitute what is called the transcriptome because they are produced by transcription. The transcriptome changes enormously over time as genes switch on and off. And naturally the transcriptome varies from tissue to tissue.

    I couldn’t easily locate a description of RNA-seq which gave me some kind of reasonable understanding of how it works. The technique is very new and very powerful. So rather than attempting to describe it, I’ll simply report on the results of their studies with respect to where those denovo genes were found, and how much they expressed.


    By searching RNA-seq databases for 11 tissues (adipose, whole brain, cerebral cortex, breast, colon, heart, liver, lymph node, skeletal muscle, lung and testes), the scientists were able to determine expression levels in each tissue for 53 of the 60 denovo genes. For the remaining 7 genes, they found the expression data in some of the other databases used previously. A possible reason for not finding all of the genes in the RNA-seq data is that the expression levels of denovo genes are generally very low, and for these 7 specific genes the levels were the lowest.

    What they did find, however, was that the overall expression level of the denovo genes was highest in the testes, with the cerebral cortex next. And the tissue with the highest number of denovo genes expressed was the cerebral cortex, followed by the testes. The expression levels can be seen here. The tissues with the highest numbers of genes expressed can be seen at figures 2b and 2c here.

    
    Only when the tissues were ranked by the number of genes reaching their highest expression level there did the cerebral cortex and testes not stand out. On that measure the adipose, the lung and the breast did better.
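The three rankings above are easy to confuse, so here is a minimal sketch of how they differ. It is only an illustration: the gene names and expression numbers are invented, the function names are my own, and it is not the authors' analysis.

```python
# Toy expression table: gene -> {tissue: expression level}.
# All gene names and numbers are invented for illustration only.
expression = {
    "geneA": {"testes": 9.0, "cerebral cortex": 1.0, "adipose": 0.0},
    "geneB": {"testes": 0.5, "cerebral cortex": 1.0, "adipose": 2.0},
    "geneC": {"testes": 0.0, "cerebral cortex": 0.5, "adipose": 1.5},
}


def total_expression(table):
    """Summed expression level of all genes, per tissue."""
    totals = {}
    for levels in table.values():
        for tissue, level in levels.items():
            totals[tissue] = totals.get(tissue, 0.0) + level
    return totals


def genes_expressed(table, threshold=0.0):
    """Number of genes expressed above the threshold, per tissue."""
    counts = {}
    for levels in table.values():
        for tissue, level in levels.items():
            counts[tissue] = counts.get(tissue, 0) + (1 if level > threshold else 0)
    return counts


def peak_tissue_counts(table):
    """Number of genes whose highest expression falls in each tissue."""
    counts = {}
    for levels in table.values():
        peak = max(levels, key=levels.get)
        counts[peak] = counts.get(peak, 0) + 1
    return counts


def top(counts):
    """Tissue with the largest value in a per-tissue dictionary."""
    return max(counts, key=counts.get)


print(top(total_expression(expression)))    # testes
print(top(genes_expressed(expression)))     # cerebral cortex
print(top(peak_tissue_counts(expression)))  # adipose
```

In this made-up table one strongly expressed gene puts the testes on top for overall level, the cerebral cortex expresses the most genes, and the adipose is where the most genes peak. So the three measures can each point to a different tissue.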



    Two of these denovo genes were intriguing. One, labelled ENSG00000187488, was very highly expressed in the testes, and the other, labelled ENSG00000206028, was very highly expressed in the cerebral cortex. In fact, among the other tissues the latter gene only registered in the brain, lymph node and lung, and there at extremely low levels relative to the cerebral cortex. ENSG00000187488 did register in all other tissues, often markedly so. Nevertheless, its expression level in the testes was twice that for the next highest tissue, the heart.

    Having located these putative denovo genes and determined where they are expressed, the next question the authors tackled was whether or not these genes were under selective pressure.


    To be continued ....
    Last edited by rwatts; 09-17-2014, 03:39 PM.



    • #17
      RESULTS: Evolutionary rates of the new genes on the human lineage 1

      If the new genes were under selective pressure, the authors reasoned that they would likely have acquired new functions. The reason for this is that any mutation which lessened the viability of the function would be weeded out of the gene pool, and any mutation which improved it, would have spread.

      So to see if the new genes were under selective pressure they examined the rate of sequence evolution of the new genes. A lesser rate in the human lineage compared to the chimp lineage would mean selective pressure was involved within the human lineage, the chimp lineage sequences not having evolved to the stage of being functional genes. That is, they could take on a greater mutational load.



      Substitution rates were calculated for the human denovo genes and the orthologous chimp sequences. The average genome-wide substitution rates were also calculated using a very conservative methodology, as described in the materials and methods section of the paper. The human and chimp rates were then compared with the genome-wide rates.



      The results can be seen in figure 3 at the link in the OP.

      The denovo human genes evolved faster than the older, established genes, but slightly slower than the orthologous chimp sequences. The authors think this is because the chimp sequences were not under selective pressure; they were still “junk” sequences. Another thing to note is that there is evidence that new genes are under relaxed purifying selection, and so would show higher substitution rates. Purifying selection is the process by which harmful mutations are weeded out, and if this process is relaxed, then genes are going to show more mutations in a given time compared to those that are under stronger purifying selection.
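To make the comparison concrete, here is a minimal sketch of the idea, not the paper's method (which is described in its materials and methods section). It assumes we already have aligned sequences plus an inferred ancestral sequence, and it simply counts the fraction of sites that changed on each lineage. The sequences and the function name are invented for illustration.

```python
def substitution_rate(seq_a, seq_b):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    return differences / compared


# Invented toy sequences; the "ancestor" is a hypothetical reconstruction.
ancestor = "ATGGCCTTAGGA"
human = "ATGGCCTTCGGA"  # one change from the ancestor
chimp = "ATGACCTTAGTA"  # two changes from the ancestor

human_rate = substitution_rate(ancestor, human)
chimp_rate = substitution_rate(ancestor, chimp)
print(human_rate < chimp_rate)  # True
```

A lower rate on the human lineage than on the chimp lineage is the pattern the authors read as a sign of selective pressure on the human sequence.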

      Having described the methods and the data they used to identify the denovo genes, the authors next turn their attention to discussing the results.

      To be continued ...




      • #18
        DISCUSSION

        The scientists begin their discussion by noting the results of their investigation: 60 denovo genes, of which 59 are shown to be fixed. This number gives a rate of denovo gene generation of around 10 to 12 genes per million years, which is much higher than some previous estimates. And of course, it’s way higher than what was thought some 40 years ago by Jacob and Ohno.
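The arithmetic behind that rate can be sketched as follows. The time span is my assumption, not something stated in this post: taking the human and chimp lineages to have split roughly 5 to 6 million years ago reproduces the quoted range. The function name is made up for the example.

```python
def genes_per_million_years(gene_count, divergence_myr):
    """Average number of new genes per million years on one lineage."""
    return gene_count / divergence_myr


# ASSUMPTION: human-chimp divergence of about 5 to 6 million years.
for divergence_myr in (5, 6):
    rate = genes_per_million_years(60, divergence_myr)
    print(divergence_myr, rate)  # 5 -> 12.0, 6 -> 10.0
```

Using only the 59 fixed genes instead of all 60 gives nearly the same range under the same assumption.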

        This rate is still lower than the estimates for gene generation via duplication, whereby a gene duplicates and the two then diverge. There, the rate estimates can be up to three times higher.

        However, as the authors point out, their estimate may well be too low. They used a very conservative filter to locate the denovo genes. They needed to find open reading frames in the human genome that had clear orthologs in chimps and orangutans, but which did not have reading frames in those primates. The genome analysis for the latter two primates is not as complete as that for the human genome, and so matching sequences may well be missing. Other reasons were provided for the underestimate, and perhaps the most important was the number of protein databases used to show that the purported denovo genes were in fact being translated (i.e. making proteins/polypeptides). Two major databases were used; however, these are limited by the fact that an awful lot of proteins are missing from them. Various proteins are manufactured at different times within a cell, and various cells manufacture different proteins. So what exists in these databases at the moment depends very much on exactly what research has been done to date. As an example of the effect of this limitation, 56 possible denovo genes were eliminated from the study because no associated data could be found in the protein databases. In a few more years' time, a repeat study may well include these additional genes.
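The core of that filter can be sketched in a few lines. This is a toy illustration under my own simplifying assumptions, not the authors' pipeline: it treats a sequence as having an intact reading frame if it starts with ATG, ends with a stop codon, and has no stop codon in between, and it ignores introns, alignment and the protein database check. The sequences and function names are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_intact_orf(seq):
    """True if seq starts with ATG, ends with a stop codon, and has no
    premature in-frame stop codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def looks_de_novo(human, chimp, orangutan):
    """Intact reading frame in human, disrupted in both orthologous sequences."""
    return (
        has_intact_orf(human)
        and not has_intact_orf(chimp)
        and not has_intact_orf(orangutan)
    )


# Invented toy sequences.
human = "ATGGCCTTAGGATAA"      # ATG GCC TTA GGA TAA: intact
chimp = "ATGGCCTAAGGATAA"      # premature stop codon (TAA) in the third position
orangutan = "ATAGCCTTAGGATAA"  # no start codon

print(looks_de_novo(human, chimp, orangutan))  # True
```

A candidate that passes a check like this would still need the further evidence described above, such as support in the protein databases, before being counted.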

        Possible functions for some of these genes were suggested by the RNA-seq data. Many of them, as an example, showed higher expression in the cerebral cortex than in any other tissue. The authors suggest that the results of their study may well feed into theories which attempt to understand how the human brain came to be.

        The researchers note that many new genes, even those created via the other mechanisms such as duplication, exon shuffling and so on, end up having a major function in the testes. In this research, the testes proved crucial as well. They note that there is some speculation that the testes are a crucible for the birth of new genes. A lot can go on in the testes as a source of sperm competition, sexual conflict and reproductive isolation. Furthermore, the natures of certain cells in the testes such as spermatocytes and spermatids are such that the initial expression of new genes is greatly facilitated as are the levels of transcription machinery components, allowing “promiscuous transcription on non functional sequences, including denovo originated genes”. In other words, these cells are such that a lot of transcription is going on, not just of regular and denovo genes, but also of the “junk” regions between the genes. Once bits and pieces of junk regions get transcribed, there is the possibility that some of the transcribed sequences will make it through to translation and the resulting polypeptide (protein) then becomes open to selection. That is, junk sequences can become denovo genes.



        The idea is that some of these denovo genes then evolve to become regular genes, taking on a more complex structure and becoming a firm part of an organism's expressed genome.

        The authors conclude by noting several caveats to their results. They emphasize the limitations of the databases they used, looking forward to larger datasets in the future. Existing databases also have problems with “contamination”. The reference they cite notes:-



        “The study by Knowles and McLysaght (2009) does have some important limitations. First, any gene classified as “known” by Ensembl was assumed to be accurately annotated, even though some of these genes have scant support (one or two cDNA sequences with no other supporting evidence). Gene prediction, even with cDNAs, is an unsolved problem, and the catalogs of “known” genes have been found to contain significant numbers of spurious annotations (Clamp et al. 2007). Single-exon genes and genes that overlap other genes are especially difficult to predict correctly. Knowles and McLysaght's requirement of supporting peptides from proteomics experiments should help to alleviate this problem, but such data have their own limitations, for example, relating to uniqueness of peptides and sample contamination. Indeed, one of the three genes identified in the study, associated with the peptide C22orf45 and called ENSG00000204626 in Ensembl, appears dubious—it is supported by only a single spliced cDNA sequence (AK127211) and is predicted to have an intron within a long 3′ UTR, which is extremely rare in eukaryotic genes (Nagy and Maquat 1998). This gene is not present in the RefSeq, UCSC Genes, Vega, or CCDS gene sets, and it appears to have been recently removed from Ensembl. This gene does have two supporting peptides and may truly be functional, but more supporting evidence would be welcome.”

        Another major caveat noted by the authors is that the expression levels of many of these genes are very low, indicating that these genes may have only weak biological roles. Furthermore, the exact functions of many of these genes are not well established.


        Thus the paper ends. 

The following sections deal with materials and methods, supporting information, acknowledgements, author contributions and references.

The supporting information is always worth a look. And the references section identifies additional relevant research. Often the articles can also be found online.



        The end.


