  1. #241
    Join Date
    Jul 2008
    Location
    London UK
    Posts
    663

    Searching for Individual words

    Here is a procedure I am going to develop for identifying possible WORDS within DNA:

    1. Ideally, get a sample of mRNA or cDNA, because these have a simple START/STOP pattern
    2. Identify and extract the non-coding regions
    3. Convert the non-coding DNA into amino acids
    4. Use a loop to extract every 5-letter sequence in which all the letters are different
    5. Count the frequency of each 5-letter sequence
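Steps 3 to 5 can be sketched in Python roughly as follows. This is an illustration only, not the program used for the results below. It assumes the standard genetic code (vertebrate mitochondria use a slightly different code, in which TGA reads as tryptophan), translates a single forward frame starting at the first base, and marks stop codons with '*'. The names translate and count_distinct_words are made up for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons listed by cycling through the bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA into one-letter amino acids (frame 0); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def count_distinct_words(protein: str, k: int = 5) -> Counter:
    """Count every k-letter window in which all k letters are different (step 4)."""
    counts = Counter()
    for i in range(len(protein) - k + 1):
        word = protein[i:i + k]
        if len(set(word)) == k:
            counts[word] += 1
    return counts
```

For example, translate("ATGGCTTGGAAAGAGTAA") returns "MAWKE*", and count_distinct_words on that string counts the two windows MAWKE and AWKE*.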

    If a 5-letter sequence occurs with high frequency, then it may be a WORD, such as "Elohim".

    The Results

    I took a Jewish mitochondrial DNA sequence, which is 16,566 bases long, and measured the frequency of every possible 5-letter sequence. Here are the results:
    1SSSO 2 Position = 126
    EKPSL 2 Position = 631
    FIAPT 2 Position = 744
    FYSKD 2 Position = 877
    HPSPH 2 Position = 1173
    HRTIP 2 Position = 1213
    ILVQL 2 Position = 1466
    ISTIN 2 Position = 1586
    ITLLT 2 Position = 1613
    KLKIK 2 Position = 1746
    LGLLT 2 Position = 2005
    LITIL 2 Position = 2078
    LITTQ 2 Position = 2081
    LLGLL 2 Position = 2112
    LLPHS 2 Position = 2147
    LNYNI 2 Position = 2216
    LTSTS 2 Position = 2441
    NNYIT 2 Position = 2701
    NPLVN 2 Position = 2721
    NTNYL 2 Position = 2830
    OKNPP 2 Position = 2908
    PHSSP 2 Position = 3091
    PLVNL 2 Position = 3217
    PNTNY 2 Position = 3262
    PSSTP 2 Position = 3456
    PTPLI 2 Position = 3493
    RILVQ 2 Position = 3833
    SNLNY 2 Position = 4280
    SSSTP 2 Position = 4457
    SSTPP 2 Position = 4463
    TILIL 2 Position = 4700
    TPSOP 2 Position = 4888
    TTQLS 2 Position = 5029

    No 5-letter sequence occurs more than twice, which is a bit surprising considering that the DNA is 16,566 bases long.
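For a sense of scale, here is a rough back-of-envelope calculation (my own addition, under a deliberately crude assumption: all 20 amino acids equally likely and independent, which real proteins are not). It estimates how many pairs of identical 5-letter words a random translated sequence of this length would contain. The function name is hypothetical.

```python
from math import comb


def expected_repeat_pairs(n_bases: int, k: int = 5, alphabet: int = 20) -> float:
    """Expected number of pairs of identical k-letter windows in a random sequence.

    Assumes the DNA is translated in one frame (n_bases // 3 letters) and that
    every letter is drawn uniformly and independently from `alphabet` letters.
    Overlapping windows are treated as independent, which is only approximate.
    """
    n_letters = n_bases // 3
    n_windows = n_letters - k + 1
    return comb(n_windows, 2) / alphabet ** k


print(round(expected_repeat_pairs(16566), 2))  # 4.76
```

Under these crude assumptions, only around five repeated pairs would be expected by chance among the 20^5 = 3,200,000 possible 5-letter words; uneven real amino acid frequencies would raise that figure.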

    Here are the results for 6-letter sequences:
    NPLVNL 2 Position = 2720
    PNTNYL 2 Position = 3261
    PSSTPP 2 Position = 3455
    RILVQL 2 Position = 3832

    No 7-, 8- or 9-letter sequence occurs more than once.

    So,
    IF the amino acids do map onto the letters of an alphabet,
    and IF mitochondrial DNA contains language areas, THEN the language has a distinct absence of 7-, 8- and 9-letter words, which is not characteristic of Hebrew.

    Then I split up the DNA into coding and noncoding areas based on the start and stop codons.
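A minimal sketch of such a split follows. The rule used here is an assumption, since the post does not spell it out: wherever an ATG appears, read codon by codon from it until the next in-frame stop codon (TAA, TAG or TGA in the standard code; vertebrate mitochondria use different stops), call that stretch coding, and call everything between such stretches non-coding. split_coding_noncoding is a hypothetical name.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard-code stop codons (an assumption for mtDNA)


def split_coding_noncoding(dna: str) -> tuple[list[str], list[str]]:
    """Split DNA into coding stretches (ATG ... stop, read in frame) and the gaps between them."""
    dna = dna.upper()
    coding, noncoding = [], []
    gap_start = 0
    i = 0
    while i <= len(dna) - 3:
        if dna[i:i + 3] != "ATG":
            i += 1
            continue
        # Found a start codon: read codon by codon until an in-frame stop (or the end).
        j = i
        while j <= len(dna) - 3 and dna[j:j + 3] not in STOPS:
            j += 3
        end = min(j + 3, len(dna))  # include the stop codon if there is one
        if gap_start < i:
            noncoding.append(dna[gap_start:i])
        coding.append(dna[i:end])
        gap_start = i = end
    if gap_start < len(dna):
        noncoding.append(dna[gap_start:])
    return coding, noncoding
```

For example, split_coding_noncoding("CCATGAAATAGGG") gives (["ATGAAATAG"], ["CC", "GG"]).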

    The coding areas of the DNA contained the following 5-letter sequences that occurred more than once:

    FYSKD 2 Position = 423
    HRTIP 2 Position = 596
    ILVQL 2 Position = 733
    LGLLT 2 Position = 1021
    NNYIT 2 Position = 1434
    NPLVN 2 Position = 1447
    NTNYL 2 Position = 1503
    PLVNL 2 Position = 1706
    PNTNY 2 Position = 1732
    PTPLI 2 Position = 1850
    RILVQ 2 Position = 2015
    SSSTP 2 Position = 2369
    TPSOP 2 Position = 2597

    The non-coding areas contained the following 5-letter sequences occurring more than once:

    HPSPH 2 Position = 590
    LLGLL 2 Position = 1032
    LLPHS 2 Position = 1052
    PHSSP 2 Position = 1450
    TTQLS 2 Position = 2342

    So, in the non-coding areas, only five "5-letter words" appear more than once. I do not think that there is a language here.

    Repeating the Procedure with Yeast

    The yeast DNA I used has 227,020 bases. I divided it into coding and non-coding areas based on the start and stop codons. Then I extracted every 5-letter sequence and recorded the frequency of each one. Here are the results:

    Coding areas: the following 5-letter sequences occur more than 5 times:
    ARRTT 8 Position = 3028
    FARRT 10 Position = 8078
    LLLLL 12 Position = 22355
    RFARR 7 Position = 33601
    RTTRF 7 Position = 35949
    SLLLL 6 Position = 38301
    TRFAR 9 Position = 42503
    TTRFA 9 Position = 43029

    Non-coding areas: no 5-letter sequence occurs more than 4 times. The following occur 3 or 4 times:
    CSUCS 3 Position = 2717
    FPLLL 3 Position = 5147
    LFLLL 3 Position = 10771
    LLFLL 3 Position = 11408
    LLLFL 3 Position = 11519
    LLLLL 4 Position = 11545
    LLLLQ 3 Position = 11551
    LLLSL 3 Position = 11575
    LRLLF 3 Position = 12264
    QLLLL 4 Position = 15724
    STSFL 3 Position = 20208
    SUCSU 3 Position = 20273
    UCSUC 3 Position = 21943

    So the non-coding areas of yeast have fewer frequently repeated "5-letter words" than the coding areas!

    I am going to check the computer code that I used to get these results to make sure it is working properly, and then I will make the software available as a download. The software will simply let you count the frequency of each "word" of a chosen length in any sample of DNA. If the DNA contains a language, then the "word" frequencies should indicate this.
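Until that software is available, here is a minimal stand-in sketch (not the author's program) that counts every word of a chosen length and prints lines in the same "WORD count Position = p" style as the tables above. It assumes, since the post does not say, that "Position" is the 0-based start of the first occurrence. The function name word_report is hypothetical.

```python
from collections import Counter


def word_report(seq: str, k: int, min_count: int = 2) -> list[str]:
    """List every k-letter word occurring at least `min_count` times, with its first position."""
    counts = Counter()
    first_pos = {}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        counts[word] += 1
        first_pos.setdefault(word, i)  # remember where the word was first seen
    return [
        f"{word} {n} Position = {first_pos[word]}"
        for word, n in sorted(counts.items())
        if n >= min_count
    ]


for line in word_report("ACGTACGTTACG", 4):
    print(line)
# ACGT 2 Position = 0
# TACG 2 Position = 3
```

The same function works on raw A/C/G/T text or on a translated amino acid string.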
    Last edited by Craig.Paardekooper; 07-21-2012 at 05:20 AM.

  2. #242

    A natural sequence of Aminoacids

    Rather than trying to discover an amino acid alphabet by analysing DNA, it would be much easier to see whether the amino acids fall naturally into a sequence. We can arrange the amino acids according to:

    1. their mass
    2. the order in which their codons appear when you cycle through the genetic code.

    Shcherbak discovered that if all the amino acids are arranged in a circle, then there is a perfect balance of masses (see www.craigdemo.co.uk/circleoflife.pdf). He thought it very odd that all the amino acids should sum in this way, almost as if they were conceived in one go. Shcherbak's pattern may hold the key to finding the amino acid alphabet.

    What is interesting is that all the amino acids together form this perfect balance. The order of the amino acids in the circle is determined solely by cycling through the bases in the order T C A G.

    Cycling through the bases in the order TCAG generates the 64 codons in a particular order, and consequently generates the amino acids in a particular order: an amino acid alphabet.

    There are only 4 x 3 x 2 x 1 = 24 ways of choosing the order in which you cycle through the bases, so only 24 different codon orders are possible, which would give us 24 different amino acid alphabets.

    It is then quite simple to test each alphabet to see which one generates the most meaningful translation.

    So I will create a program that cycles through the bases in each of the 24 possible ways, each time generating the codons in a particular order. This will give me the 20 amino acids in a particular order each time, so I will end up with 24 amino acid alphabets and can then test each one to see if it produces meaningful words.
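The planned program could look something like the following Python sketch. It assumes the standard genetic code, and it assumes that "the order of the amino acids" means the order in which each amino acid first appears (stop codons skipped) as the 64 codons are generated with the first base changing slowest. Whether this matches Shcherbak's circle is not checked here; alphabet_for is a hypothetical name.

```python
from itertools import permutations, product

# Standard genetic code, listed by cycling through the bases in the order T, C, A, G.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), STANDARD)}


def alphabet_for(order: str) -> str:
    """Amino acids in order of first appearance when codons are generated by cycling `order`."""
    seen = []
    for codon in product(order, repeat=3):
        aa = CODE["".join(codon)]
        if aa != "*" and aa not in seen:  # skip stop codons and repeats
            seen.append(aa)
    return "".join(seen)


# One alphabet for each of the 4 x 3 x 2 x 1 = 24 base orders.
alphabets = {"".join(p): alphabet_for("".join(p)) for p in permutations("TCAG")}
print(len(alphabets))     # 24
print(alphabets["TCAG"])  # FLSYCWPHQRIMTNKVADEG
```

Each alphabet could then be used to map amino acids onto letters before searching for meaningful words.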

  3. #243

    Finding words in the Nucleotide Sequence

    The scientists who carried out the experiments to show that there is a language structure in DNA did not convert the DNA into amino acids first; rather, they simply searched for words in the raw DNA sequence of A, C, T and G.

    So I will do this too. If the letters of the language are codons rather than individual nucleotides, then we should find that high frequency words reflect this by being a multiple of three nucleotides long.


    Results

    Here are the results for 5-letter sequences in the non-coding areas of Jewish mitochondrial DNA:

    Total number of 5 letter words = 7703

    Total Number of Different Words = 881

    Average Frequency of Words = 8.74347332576617


    Here are the results for 5-letter sequences in the coding areas of Jewish Mitochondrial DNA

    Total number of 5 letter words = 7706

    Total Number of Different Words = 896

    Average Frequency of Words = 8.60044642857143
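The average frequency is the total number of 5-letter words divided by the number of different words (7703 / 881 = 8.743... and 7706 / 896 = 8.600...). Note that only 4^5 = 1024 different 5-letter DNA words are possible. A minimal sketch of the three statistics, assuming overlapping windows over the raw A/C/G/T string (the post does not say whether the windows overlap); word_stats is a hypothetical name.

```python
def word_stats(dna: str, k: int = 5) -> tuple[int, int, float]:
    """Return (total words, different words, average frequency) for overlapping k-letter windows."""
    words = [dna[i:i + k] for i in range(len(dna) - k + 1)]
    total = len(words)
    different = len(set(words))
    return total, different, total / different


print(word_stats("ACGTACGTAC"))  # (6, 4, 1.5)
print(4 ** 5)                     # 1024 possible 5-letter DNA words
```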


    As you can see, a similar number of different words appears in both coding and non-coding areas, which might suggest that non-coding areas contain just as much information as coding areas.

    It might be interesting to see which words occur in coding areas but not in non-coding areas, and vice versa.
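This comparison is a pair of set differences. A hedged sketch follows, with hypothetical function names. It takes lists of separate regions rather than one concatenated string, so that words spanning the junction between two unrelated regions are not counted.

```python
def kmers(regions: list[str], k: int) -> set[str]:
    """All distinct overlapping k-letter words found inside any single region."""
    return {r[i:i + k] for r in regions for i in range(len(r) - k + 1)}


def exclusive_words(coding: list[str], noncoding: list[str], k: int = 5) -> tuple[set[str], set[str]]:
    """Words found only in coding regions, and words found only in non-coding regions."""
    c, n = kmers(coding, k), kmers(noncoding, k)
    return c - n, n - c


print(exclusive_words(["AAAAAC"], ["AAAAAG"]))  # ({'AAAAC'}, {'AAAAG'})
```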

    Also, what would the results be for sequences 4 to 12 bases long?
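One way to explore this, and to test the earlier idea that high-frequency words should be a multiple of three nucleotides long, is to loop over word lengths and compare the results for lengths 6, 9 and 12 with their neighbours. A hedged sketch using overlapping windows over a single DNA string; summary_by_length is a made-up name.

```python
from collections import Counter


def summary_by_length(dna: str, lengths=range(4, 13)) -> dict[int, tuple[int, int]]:
    """For each word length k, return (number of different words, highest frequency)."""
    result = {}
    for k in lengths:
        counts = Counter(dna[i:i + k] for i in range(len(dna) - k + 1))
        result[k] = (len(counts), max(counts.values(), default=0))
    return result


for k, (different, top) in summary_by_length("ACG" * 10).items():
    print(k, different, top)
```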

    Also, which words occur most frequently? Perez found that the frequencies of all three-letter words follow a mathematical pattern. Perhaps similar patterns will be found for longer sequences?
    Last edited by Craig.Paardekooper; 07-21-2012 at 02:57 PM.

  4. #244

    Update on Search for Alphabet sequence in XChromosome

    So far I have searched 72 million bases of the X chromosome (about half of it) for a sequence containing all 20 standard amino acids. I have analysed both overlapping and non-overlapping codons in all three forward frames, and have not yet found a single instance of the complete sequence of 20 amino acids. The search continues.
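A sketch of this search, under two assumptions the post does not spell out: that "a sequence of all 20 standard amino acids" means 20 consecutive residues in which each amino acid appears exactly once, and that the standard genetic code applies. It covers only the non-overlapping codon reading in the three forward frames, and it is written for clarity rather than speed; find_all20_windows is a hypothetical name.

```python
from itertools import product

# Standard genetic code, listed by cycling through the bases in the order T, C, A, G.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), STANDARD)}


def find_all20_windows(dna: str) -> list[tuple[int, int]]:
    """Return (frame, residue index) for every run of 20 consecutive, all-different amino acids."""
    hits = []
    dna = dna.upper()
    for frame in range(3):
        protein = "".join(CODE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3))
        for start in range(len(protein) - 19):
            window = protein[start:start + 20]
            # 20 distinct letters with no stop ('*') or unknown ('X') means one of each amino acid
            if "*" not in window and "X" not in window and len(set(window)) == 20:
                hits.append((frame, start))
    return hits
```

For a whole chromosome, a sliding count of letters in the current window would avoid rebuilding a set at every position.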

  5. #245

    Odd Numerical Patterns Found in Archaea Microbes - the oldest organisms on earth

    There is a 30-letter sequence that occurs 68 times in the DNA of an archaeon.

    Here is the sequence - GTTGAAATCAGACTAATGTAGGATTGAAAG

    The number on the right is the position of the start of the sequence in the entire genome

    Total = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 85039
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665146
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665215
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665283
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665351
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665418
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665488
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665556
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665624
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665692
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665760
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665827
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665896
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665963
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666031
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666099
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666167
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666235
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666302
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666371
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666439
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666507
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666576
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666642
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666715
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666784
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666852
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666921
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666990
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667057
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667126
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667196
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667263
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667332
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667400
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667469
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667538
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667606
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667675
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667742
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667813
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667882
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667950
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668018
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668086
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668154
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668226
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668295
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668363
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668431
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668499
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668567
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668635
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668703
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668771
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668839
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668908
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668977
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669046
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669114
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669182
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669250
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669318
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669386
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669455
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669523
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669599
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669668

    When I measured the distance from the start of one occurrence to the start of the next, this is what I got:

    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 85039 - difference = 85039
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665146 - difference = 580107
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665215 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665283 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665351 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665418 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665488 - difference = 70
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665556 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665624 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665692 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665760 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665827 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665896 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665963 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666031 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666099 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666167 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666235 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666302 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666371 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666439 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666507 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666576 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666642 - difference = 66
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666715 - difference = 73
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666784 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666852 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666921 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666990 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667057 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667126 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667196 - difference = 70
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667263 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667332 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667400 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667469 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667538 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667606 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667675 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667742 - difference = 67
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667813 - difference = 71
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667882 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667950 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668018 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668086 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668154 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668226 - difference = 72
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668295 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668363 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668431 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668499 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668567 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668635 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668703 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668771 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668839 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668908 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668977 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669046 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669114 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669182 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669250 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669318 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669386 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669455 - difference = 69
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669523 - difference = 68
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669599 - difference = 76
    GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669668 - difference = 69

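    So this 30-letter sequence occurs 68 times, and apart from the isolated first copy at 85039, each copy starts about 68 bases after the previous one (the distances range from 66 to 76 and are most often 68)!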
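The differences can be reproduced with a short sketch like the following (an illustration, not the program actually used). It assumes 0-based positions and, like the listing above, reports the first occurrence's distance from position 0. These are start-to-start distances, so they include the 30 letters of the repeated word itself. find_repeats is a hypothetical name.

```python
def find_repeats(genome: str, word: str) -> list[tuple[int, int]]:
    """Return (start position, distance from the previous start) for every exact occurrence."""
    results = []
    previous = 0
    start = genome.find(word)
    while start != -1:
        results.append((start, start - previous))
        previous = start
        start = genome.find(word, start + 1)  # +1 so overlapping copies are also found
    return results


print(find_repeats("xxABCxxxABCABC", "ABC"))  # [(2, 2), (8, 6), (11, 3)]
```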

    GCGCTATTCTTTTACTTCTCGCAATGCGGTAATCAATGGTTGAAATCAGACTAATGTAGGATTGAAAGAAGCGCTTGGCTGCTTACTGCGTGGCTCAGTACTGCGCGTTGAAATCAGACTAATGTAGGATTGAAAGGAGGTGGTTAGAGACATCTTGAATAAAATGATGGGCAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTATTGACTTCTTCAATTCTCCCTTTTAGATATAGATGTTGAAATCAGACTAATGTAGGATTGAAAGGCATAACAGAGTGCGCCCTCCGTGCCTCCCGTTCCGTGTTGAAATCAGACTAATGTAGGATTGAAAGATGGGGGAACTGGCTCGAACTCGCTGATTTCTTCGAAGAGTTGAAATCAGACTAATGTAGGATTGAAAGCAAATTAGCAAGGGCTACAACGATACCATCATACGGAGTTGAAATCAGACTAATGTAGGATTGAAAGATCAACATCGGTCCGGGACAATATCAGTTCAAGGCGAGGTTGAAATCAGACTAATGTAGGATTGAAAGCTATGAGTACTGTAAAAGGAATTTAAATTTAAAATAGGGTTGAAATCAGACTAATGTAGGATTGAAAGTTCTAATGCCGTGAAGTATGCTCTTTTTGCTTCGGGAGGTTGAAATCAGACTAATGTAGGATTGAAAGGATTGTGGCGGTGATTTGGTGGACTTAGGCAACAATGAGTTGAAATCAGACTAATGTAGGATTGAAAGTACTTCAGACAATACAATATCCCAATCGAACAACACAGTTGAAATCAGACTAATGTAGGATTGAAAGACGGTGAACGTTGATCATCCTTTGATTGTCTTGCTGGTTGTTGAAATCAGACTAATGTAGGATTGAAAGTGTCCATCAAAAGGCCTAACGGAGGATGAGGTTCAAGAGTTGAAATCAGACTAATGTAGGATTGAAAGTTCCCACTTGATGATACCCTGACCTTCATTCTTTTCGAGTTGAAATCAGACTAATGTAGGATTGAAAGTTGTGGTATAATTAATAGTTCTCATCTGCCATTTTTTATGTTGAAATCAGACTAATGTAGGATTGAAAGGCTCGCAAGGAAATGCTGAACGGAAATAGGCACACCGTTGAAATCAGACTAATGTAGGATTGAAAGACTCTTTCACCTCCTAACAGCTTTTTTCTATATACTGTCACTGGTTGAAATCAGACTAATGTAGGATTGAAAGTCTATACCACCTCAAAAAATGTTTAAGGAAGCAGAATCTGTTGAAATCAGACTAATGTAGGATTGAAAGTTTGGGGTAACTGGCTTGAGCTTGCAGACTTCTTAGAGGTTGAAATCAGACTAATGTAGGATTGAAAGGCGAGCATGAACGGGAACTGCACGAATCCGCGGGCTCCCGTTGAAATCAGACTAATGTAGGATTGAAAGATAGCTGCAAAAATCTTTTGCATCCCTGCAAGTCCCCTTGTTGAAATCAGACTAATGTAGGATTGAAAGGCAAATAATCTATAAACTGTCTCGCTAAGTCGAAAGCGTTGAAATCAGACTAATGTAGGATTGAAAGCTCGAGTTCCCTGCTCATCCTGCTATATCTCTCAAAAATGTTGAAATCAGACTAATGTAGGATTGAAAGAGGATTACAGCCGGAATCCTCAGCCAGCCGACGAAGTAGAGTTGAAATCAGACTAATGTAGGATTGAAAGAGATTGCCCCCCTCAGCATGGCGGGGGACATTCGCGAGTTGAAATCAGACTAATGTAGGATTGAAAGTATTGTCAACCTTTTAATTCGCTCTTTCAATGCCCTAAAGTTGAAATCAGACTAATGTAGGATTGAAAGTAGAGGAATGGGAAATTCTGGGGAAGGATGACGAAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTAAGAAGGCTTATAGTTGAGCACAGACACTTGGATTTAGTTGAAATCAGACTAATGTAGGATTGAAAGTAGGGGGGAGGTAGA
TGAGGATTGAAAGAGGGAATCTCAGTTGAAATCAGACTAATGTAGGATTGAAAGGAGAAGAGCTGACGTTCACTGAAGAGGACGGAAACAACGTTGAAATCAGACTAATGTAGGATTGAAAGCTGGGGTAACTTTCCGAAGTTCAAATGCAGGCATGAACTGTTGAAATCAGACTAATGTAGGATTGAAAGGATTCTAAGTTAAAATAGATGGGTTGAATAAAAAAAGGTTGAAATCAGACTAATGTAGGATTGAAAGGGACATGGGGCAATAAGCAGCCATTCAGCCCAGAGCCATCTGTTGAAATCAGACTAATGTAGGATTGAAAGATAGAGTTCTATGTTGTAGTATGTTCCTGCAGAGAGATCGTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTCTTGCTGAGCCATAGCCAAGCCCATGAAGCGGGTTGAAATCAGACTAATGTAGGATTGAAAGACGTAGATTTGACATTCTCAGCGGGCGCAACGAGGCAGGTTGAAATCAGACTAATGTAGGATTGAAAGTTACATATATCTACAAAAGCTGCACAATCTGCAGCATCGTTGAAATCAGACTAATGTAGGATTGAAAGAATAAGAGCGGGCGAGTATGGCGACATGAGCGAGGCAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTCGAATGGGATAAGGGCTTAGAGGATTTGGCCGAATGGGAGTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTGAAAACTACATTGTCTTCAAGGGCGTTTCTTATGTTGAAATCAGACTAATGTAGGATTGAAAGAATCCTGTGACAGTGCAACGTATTACTTTACTATTCATGTTGAAATCAGACTAATGTAGGATTGAAAGGACTCTGTGTGTGAGGTGGTGGTAGAGATGGAGCAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGAGAAAATAAGACGTATAAACCTTTGTAACAATGGTTGAAATCAGACTAATGTAGGATTGAAAGAAATTCCTCGGACGGGATAAGTTCAGCAAAGTTATCGCGTTGAAATCAGACTAATGTAGGATTGAAAGGTTTGATGTGCGAATGTGGATACTCGAAGAGGTATGTTGTTGAAATCAGACTAATGTAGGATTGAAAGTTGAACCATATCTCGTTCTGAGGGAAACTCATCTGAACGTTGAAATCAGACTAATGTAGGATTGAAAGGTTCCAGCCTGTGATAATAAAAAGCACCACAGCACCGAGTTGAAATCAGACTAATGTAGGATTGAAAGACATAAATCATTATCCTGTCGAAGTTTCAGATTATTTGGTTGAAATCAGACTAATGTAGGATTGAAAGGATATACCATCAACTGAATTACCTGAATTGCCATTAACAGTTGAAATCAGACTAATGTAGGATTGAAAGGTTGCGTATGCGCCCAGATATGGTACGAACTGTTCAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGGGACGATAATCAGACTTGTAAACGCTGATGCTACGTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGATACAGCTTCAGCGGAGTCATACGCTTAATCCTGTTGAAATCAGACTAATGTAGGATTGAAAGTTCTATTTCTCGTCCGATATCCCTACTGTAGTGTCTTAGTTGAAATCAGACTAATGTAGGATTGAAAGAATTTATTGGGCCGATGGCAACACGCCCAGCTGAGCTTGTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGTCCCGCTGAGCACGTGCACGCTGCGAATGATTGGTTGAAATCAGACTAATGTAGGATTGAAAGTTCATTCTCTCTTTCAATCTCAGCAATCTTACTCATTAGTTGAAATCAGACTAATGTAGGATTGAAAGTACTTGCGGAGCTACTACGACAAATACGCCAAACTGCTCGTTGAAATCAGACTAATGTAGGATTGAAAGGTAGGGATTTCTTTAATCCAGGTTTCGTC
AATTGCTGCGTTGAAATCAGACTAATGTAGGATTGAAAGACATCTGAATCTTCGAATACCTCAATGCTGTATATGCTCGCTGTGCGTTGAAATCAGACTAATGTAGGATTGAAAGAGTACCCTTTCAAGCACAATCACGGCGATATCGTCTGCAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTTGTACCTCAGCATAGCTCCCACAACACCGAGCAGCCCGGGTTGAAATCAGACTAATGTAAGATTGAGAAACGTACTGTAGTGGAGTTTAGAAGGATCAAAAAATTCTCATCATTATTAACAAAAATATCGCTATTCTTAATAAGTATTTGTTATCAAAAAATCAGCCAATGCCCAAAAGGGTCTCAACAGGAATTCCCGGATTCGATGAACTTTGCGGGGGTGGGCTGCCGCAAGGAGGTACGTATCTTGTTGTAGGAGCTGCAGAATCTGGAAAAACTGTTTTTTCTATGCAGTATCTGGTAAATGGGGCGAGGATGTTTGGGGAAGCAGGAATATTCATCACC


    GCGCTATTCTTTTACTTCTCGCAATGCGGTAATCAATG
    GTTGAAATCAGACTAATGTAGGATTGAAAGAAGCGCTTGGCTGCTTACTGCGTGGCTCAGTACTGCGC
    GTTGAAATCAGACTAATGTAGGATTGAAAGGAGGTGGTTAGAGACATCTTGAATAAAATGATGGGCAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTTATTGACTTCTTCAATTCTCCCTTTTAGATATAGAT
    GTTGAAATCAGACTAATGTAGGATTGAAAGGCATAACAGAGTGCGCCCTCCGTGCCTCCCGTTCCGT
    GTTGAAATCAGACTAATGTAGGATTGAAAGATGGGGGAACTGGCTCGAACTCGCTGATTTCTTCGAAGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCAAATTAGCAAGGGCTACAACGATACCATCATACGGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGATCAACATCGGTCCGGGACAATATCAGTTCAAGGCGAG
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTATGAGTACTGTAAAAGGAATTTAAATTTAAAATAGG
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTCTAATGCCGTGAAGTATGCTCTTTTTGCTTCGGGAG
    GTTGAAATCAGACTAATGTAGGATTGAAAGGATTGTGGCGGTGATTTGGTGGACTTAGGCAACAATGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTACTTCAGACAATACAATATCCCAATCGAACAACACA
    GTTGAAATCAGACTAATGTAGGATTGAAAGACGGTGAACGTTGATCATCCTTTGATTGTCTTGCTGGTT
    GTTGAAATCAGACTAATGTAGGATTGAAAGTGTCCATCAAAAGGCCTAACGGAGGATGAGGTTCAAGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTCCCACTTGATGATACCCTGACCTTCATTCTTTTCGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTGTGGTATAATTAATAGTTCTCATCTGCCATTTTTTAT
    GTTGAAATCAGACTAATGTAGGATTGAAAGGCTCGCAAGGAAATGCTGAACGGAAATAGGCACACC
    GTTGAAATCAGACTAATGTAGGATTGAAAGACTCTTTCACCTCCTAACAGCTTTTTTCTATATACTGTCACTG
    GTTGAAATCAGACTAATGTAGGATTGAAAGTCTATACCACCTCAAAAAATGTTTAAGGAAGCAGAATCT
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTTGGGGTAACTGGCTTGAGCTTGCAGACTTCTTAGAG
    GTTGAAATCAGACTAATGTAGGATTGAAAGGCGAGCATGAACGGGAACTGCACGAATCCGCGGGCTCCC
    GTTGAAATCAGACTAATGTAGGATTGAAAGATAGCTGCAAAAATCTTTTGCATCCCTGCAAGTCCCCTT
    GTTGAAATCAGACTAATGTAGGATTGAAAGGCAAATAATCTATAAACTGTCTCGCTAAGTCGAAAGC
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTCGAGTTCCCTGCTCATCCTGCTATATCTCTCAAAAAT
    GTTGAAATCAGACTAATGTAGGATTGAAAGAGGATTACAGCCGGAATCCTCAGCCAGCCGACGAAGTAGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGAGATTGCCCCCCTCAGCATGGCGGGGGACATTCGCGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTATTGTCAACCTTTTAATTCGCTCTTTCAATGCCCTAAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTAGAGGAATGGGAAATTCTGGGGAAGGATGACGAAAAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTAAGAAGGCTTATAGTTGAGCACAGACACTTGGATTTA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTAGGGGGGAGGTAGATGAGGATTGAAAGAGGGAATCTCA
    GTTGAAATCAGACTAATGTAGGATTGAAAGGAGAAGAGCTGACGTTCACTGAAGAGGACGGAAACAAC
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTGGGGTAACTTTCCGAAGTTCAAATGCAGGCATGAACT
    GTTGAAATCAGACTAATGTAGGATTGAAAGGATTCTAAGTTAAAATAGATGGGTTGAATAAAAAAAG
    GTTGAAATCAGACTAATGTAGGATTGAAAGGGACATGGGGCAATAAGCAGCCATTCAGCCCAGAGCCATCT
    GTTGAAATCAGACTAATGTAGGATTGAAAGATAGAGTTCTATGTTGTAGTATGTTCCTGCAGAGAGATC
    GTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTCTTGCTGAGCCATAGCCAAGCCCATGAAGCGG
    GTTGAAATCAGACTAATGTAGGATTGAAAGACGTAGATTTGACATTCTCAGCGGGCGCAACGAGGCAG
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTACATATATCTACAAAAGCTGCACAATCTGCAGCATC
    GTTGAAATCAGACTAATGTAGGATTGAAAGAATAAGAGCGGGCGAGTATGGCGACATGAGCGAGGCAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTTCGAATGGGATAAGGGCTTAGAGGATTTGGCCGAATGGGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTGAAAACTACATTGTCTTCAAGGGCGTTTCTTAT
    GTTGAAATCAGACTAATGTAGGATTGAAAGAATCCTGTGACAGTGCAACGTATTACTTTACTATTCAT
    GTTGAAATCAGACTAATGTAGGATTGAAAGGACTCTGTGTGTGAGGTGGTGGTAGAGATGGAGCAAAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGAGAAAATAAGACGTATAAACCTTTGTAACAATG
    GTTGAAATCAGACTAATGTAGGATTGAAAGAAATTCCTCGGACGGGATAAGTTCAGCAAAGTTATCGC
    GTTGAAATCAGACTAATGTAGGATTGAAAGGTTTGATGTGCGAATGTGGATACTCGAAGAGGTATGTT
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTGAACCATATCTCGTTCTGAGGGAAACTCATCTGAAC
    GTTGAAATCAGACTAATGTAGGATTGAAAGGTTCCAGCCTGTGATAATAAAAAGCACCACAGCACCGA
    GTTGAAATCAGACTAATGTAGGATTGAAAGACATAAATCATTATCCTGTCGAAGTTTCAGATTATTTG
    GTTGAAATCAGACTAATGTAGGATTGAAAGGATATACCATCAACTGAATTACCTGAATTGCCATTAACA
    GTTGAAATCAGACTAATGTAGGATTGAAAGGTTGCGTATGCGCCCAGATATGGTACGAACTGTTCAAAA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGGGACGATAATCAGACTTGTAAACGCTGATGCTAC
    GTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGATACAGCTTCAGCGGAGTCATACGCTTAATCCT
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTCTATTTCTCGTCCGATATCCCTACTGTAGTGTCTTA
    GTTGAAATCAGACTAATGTAGGATTGAAAGAATTTATTGGGCCGATGGCAACACGCCCAGCTGAGCTT
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGTCCCGCTGAGCACGTGCACGCTGCGAATGATTG
    GTTGAAATCAGACTAATGTAGGATTGAAAGTTCATTCTCTCTTTCAATCTCAGCAATCTTACTCATTA
    GTTGAAATCAGACTAATGTAGGATTGAAAGTACTTGCGGAGCTACTACGACAAATACGCCAAACTGCTC
    GTTGAAATCAGACTAATGTAGGATTGAAAGGTAGGGATTTCTTTAATCCAGGTTTCGTCAATTGCTGC
    GTTGAAATCAGACTAATGTAGGATTGAAAGACATCTGAATCTTCGAATACCTCAATGCTGTATATGCTCGCTGTGC
    GTTGAAATCAGACTAATGTAGGATTGAAAGAGTACCCTTTCAAGCACAATCACGGCGATATCGTCTGCA
    GTTGAAATCAGACTAATGTAGGATTGAAAGCTTTGTACCTCAGCATAGCTCCCACAACACCGAGCAGCCCGG
    GTTGAAATCAGACTAATGTAAGATTGAGAAACGTACTGTAGTGGAGTTTAGAAGGATCAAAAAATTCTCATCATTATTAACAAAAATATCGCTATTCTTAATAAGTATTTGTTATCAAAAAATCAGCCAATGCCCAAAAGGGTCTCAACAGGAATTCCCGGATTCGATGAACTTTGCGGGGGTGGGCTGCCGCAAGGAGGTACGTATCTTGTTGTAGGAGCTGCAGAATCTGGAAAAACTGTTTTTTCTATGCAGTATCTGGTAAATGGGGCGAGGATGTTTGGGGAAGCAGGAATATTCATCACC

    These findings have been confirmed here - http://crispr.u-psud.fr/crispr/crisp...5D=NC_015320_3

    There is another area in this archaeon where an almost identical pattern occurs. The word differs by only one letter (the 16th letter is T instead of A), and the start-to-start spacing is again about 68 bases (between 66 and 75).
    This word occurs 20 times:

    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83682 - difference = 83682
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83749 - difference = 67
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83824 - difference = 75
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83892 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83960 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84027 - difference = 67
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84094 - difference = 67
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84161 - difference = 67
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84229 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84295 - difference = 66
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84363 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84429 - difference = 66
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84497 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84565 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84631 - difference = 66
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84699 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84768 - difference = 69
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84836 - difference = 68
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84903 - difference = 67
    GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84971 - difference = 68
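Both variants can be caught in a single pass by allowing one mismatch. A minimal brute-force sketch follows (slow on a whole genome; the function names are hypothetical). It also confirms that the two words differ at exactly one position.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_near_repeats(genome: str, word: str, max_mismatches: int = 1) -> list[tuple[int, int]]:
    """Return (start, mismatches) for every window within max_mismatches of word."""
    k = len(word)
    hits = []
    for i in range(len(genome) - k + 1):
        d = hamming(genome[i:i + k], word)
        if d <= max_mismatches:
            hits.append((i, d))
    return hits


print(hamming("GTTGAAATCAGACTAATGTAGGATTGAAAG", "GTTGAAATCAGACTATTGTAGGATTGAAAG"))  # 1
```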
    Last edited by Craig.Paardekooper; 07-23-2012 at 04:44 PM.

  6. #246

    Pattern found in Elusimicrobium minutum

    Here is a pattern that I found in the bacterium Elusimicrobium minutum.

    The longest repeating sequence is 36 bases long (6 x 6).
    Each repeat starts 66 bases after the previous one, except for one gap of 65 and one of 132 (2 x 66).
    There are 13 repeats altogether.

    There are 12 spacers between these 13 repeats.

    Total length of the 12 spacers = 13 x 30 - 1
    Total length of the 13 repeats = 13 x 6 x 6
    Total length of whole sequence = 13 x 66 - 1

    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC


    Total = 13
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266139 - difference = 266139
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266204 - difference = 65
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266270 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266336 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266402 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266468 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266534 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266600 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266666 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266732 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266798 - difference = 66
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266930 - difference = 132
    ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266996 - difference = 66
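The totals above can be checked from the listed positions. The small sketch below assumes that "whole sequence" is measured from the start of the first repeat to the start of the last one, which is the reading that reproduces 13 x 66 - 1 = 857 here (and 3 x 7 x 66 = 1386 for the Termite Bacterium list below). Measured to the end of the last repeat, each figure would be 36 bases larger. repeat_arithmetic is a hypothetical name.

```python
def repeat_arithmetic(starts: list[int], repeat_len: int) -> dict[str, int]:
    """Reproduce the post's totals from the listed start positions.

    'span' runs from the start of the first repeat to the start of the last,
    the reading that matches the figures quoted in the post.
    """
    span = starts[-1] - starts[0]
    repeats_total = len(starts) * repeat_len
    return {"span": span, "repeats": repeats_total, "spacers": span - repeats_total}


elusimicrobium = [266139, 266204, 266270, 266336, 266402, 266468, 266534,
                  266600, 266666, 266732, 266798, 266930, 266996]
print(repeat_arithmetic(elusimicrobium, 36))
# {'span': 857, 'repeats': 468, 'spacers': 389}
```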


    You might compare this with the Termite Bacterium. Here the sequence is also 36 (6 x 6) bases long, and it repeats 21 times, with each copy usually starting 66 bases after the previous one.

    Total length of the 20 spacers = 3 x 7 x 30
    Total length of the 21 repeats = 3 x 7 x 6 x 6
    Total length of whole sequence = 3 x 7 x 66


    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 21


    Total = 21
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343558 - difference = 343558
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343624 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343755 - difference = 131
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343821 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343888 - difference = 67
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343954 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344020 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344086 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344152 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344218 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344284 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344350 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344416 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344482 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344548 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344614 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344680 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344746 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344812 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344878 - difference = 66
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344944 - difference = 66

    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATATAATAGCGATAGCAGTGATATTCTTGATG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAATATTTTCAAGCCCAGTGCTGTCTCCAAGTTTAGTTTCCTTCCTCTCTCAGATGTGCTATAATATTACCATTAGCATCTAATCCAAATGATTT
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATACCGTATGCTTCTATAGTGTTTAATCTACA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTGTATTCAGCTTCAATGGCGATTTTTGGTTC
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTTACGAACCGATTAAACAAACATGGGACGC
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCCAGAGATACGACGAACGTCACGGATTGAA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCATTATATCAGCGATTGAGAGCATAAAACC
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTTGATTCGCTGAGAGCCTTTACGGCCTGTG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAAAGTAAATACCATACGGGAACCACGTAG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGTACTAAAGGGTGCTACTACAGTAAAGCCC
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAATTGTACGGAATATATTAACTTTCTTAC
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGTAAGGTGCATAGCGTACTCCGGTAGCTGG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTGGATATCATATTACCTGCATACGCTGATA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTAACCAAAGCAGGCATATCGTATACATTGG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATACATTGTTAAAAAATATAGAGAGAATTACG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAGGGATTATTACGTCCTCGCCTGCTTTTA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGCAAAGTAACTTAATCTAAACATTTTTACA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATATGACGAATATAAAACTATGGCTGATATAG
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGAATTCAAGTACGACGATATCAGAGATGGT
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCATGCAAAGCACTTTTATCACACTTACGTA
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT

    Termite Bacterium also displays longer sequences -

    Here the sequence is 2 x 19 bases long, and repeats 6 times
    The 5 intervals between the starts of consecutive repeats add up to 19 x 66 = 1254 bases

    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 343624 - difference = 343624
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344020 - difference = 396
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344152 - difference = 132
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344284 - difference = 132
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344614 - difference = 330
    GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344878 - difference = 264

    Once again the intervals are multiples of 66 (6, 2, 2, 5 and 4 times 66).
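    You can check mechanically whether every interval is an exact multiple of a given period. In this Python sketch, `as_multiples` is a hypothetical helper name. It splits each start-to-start interval into a multiple of the period plus a remainder, so a remainder of 0 everywhere confirms the pattern.

```python
# Start positions of the 38-base sequence GTTATAG...GCTATAATCA, from the listing above.
STARTS_38 = [343624, 344020, 344152, 344284, 344614, 344878]


def intervals(starts):
    """Distance from the start of each occurrence to the start of the next."""
    return [b - a for a, b in zip(starts, starts[1:])]


def as_multiples(starts, period):
    """Return (multiple, remainder) of `period` for every start-to-start interval."""
    return [divmod(d, period) for d in intervals(starts)]


print(intervals(STARTS_38))                       # the five intervals
print(as_multiples(STARTS_38, 66))                # remainder 0 means an exact multiple
print(divmod(STARTS_38[-1] - STARTS_38[0], 66))   # whole span as a multiple of 66
```

    Calling the same helper with a period of 21 and the yeast positions from post #247 would check the multiples-of-21 claim there in the same way.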

    And here we have a sequence of 2 x 20 bases, occurring 2 x 2 times, with each occurrence starting a multiple of 22 bases after the previous one.

    GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3559 - difference = 3559
    GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3581 - difference = 22
    GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3625 - difference = 22 x 2
    GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3647 - difference = 22

    And here we have a sequence of 43 bases

    GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3556 - difference = 3556
    GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3578 - difference = 22
    GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3644 - difference = 66
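    The 40-base sequence recurs every 22 bases, so consecutive occurrences overlap by 18 bases. A search that resumes after the end of each match would therefore miss most of them. Below is a minimal Python sketch of an overlapping search that prints results in roughly the layout used in this thread. The positions are 0-based string indices, which is an assumption because the thread does not say how its positions are counted. The middle "0" column of the original listings is left out because the thread does not explain its meaning.

```python
def find_all(sequence, pattern):
    """Every 0-based start index of `pattern` in `sequence`, overlaps included."""
    starts = []
    i = sequence.find(pattern)
    while i != -1:
        starts.append(i)
        i = sequence.find(pattern, i + 1)  # advance by one base, not by len(pattern)
    return starts


def report(sequence, pattern):
    """Print each occurrence with its distance from the previous one."""
    previous = 0
    for pos in find_all(sequence, pattern):
        print(f"{pattern} - {pos} - difference = {pos - previous}")
        previous = pos


# Toy input: a hypothetical stretch made of four copies of a 22-base unit
# taken from the 40-base sequence above.
unit = "GTTTGGTTTTTATGTGTTAGTA"
report(unit * 4, "GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT")
```

    As in the listings, the first "difference" is simply the first position, because the previous position starts at 0.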

    Repeats can be thought of as cycles when they occur at regular intervals. A cycle of 66 reminds me of Richard's Bible Wheel, with its 66 books divided into 3 cycles of 22 books each. In the Termite Bacterium we have 66 bases in each cycle = 22 codons.

    It is interesting to find cycles within DNA, since they probably have a nested structure, with super-cycles encompassing smaller ones. Perhaps a grand scheme will emerge. Vernon found that when the values of the words in Genesis 1 v 1 are laid out as blocks of height 37 units, then each word turns out to be a multiple of 37 - a multiple of 6.

    In the same way, if repeats define the cyclic nature of DNA, then laying out DNA according to the cycle length may reveal some striking patterns. One way to detect this would be to write a program that automatically highlights all repeats within a sequence. This would need a rich text box that allows font manipulation. Once the repeats are highlighted, a slider could adjust the width (in bases) of the DNA display to show at which widths the repeats line up.
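    The highlight-and-slider idea can be prototyped in plain text before building a rich text box. The approach is to mark the bases covered by a repeat in upper case, put everything else in lower case, and re-wrap the sequence at different widths. This is a Python sketch. `highlight` and `layout` are hypothetical names, the toy sequence is made up, and changing the `width` argument stands in for moving the slider.

```python
def find_all(sequence, pattern):
    """Every start index of `pattern` in `sequence`, overlaps included."""
    starts, i = [], sequence.find(pattern)
    while i != -1:
        starts.append(i)
        i = sequence.find(pattern, i + 1)
    return starts


def highlight(sequence, pattern):
    """Upper-case bases covered by an occurrence of `pattern`; lower-case the rest."""
    covered = [False] * len(sequence)
    for start in find_all(sequence, pattern):
        for j in range(start, start + len(pattern)):
            covered[j] = True
    return "".join(base.upper() if hit else base.lower()
                   for base, hit in zip(sequence, covered))


def layout(sequence, width):
    """Cut the sequence into rows of `width` bases (the 'slider' setting)."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


# Toy example: a 7-base "repeat" followed by a 2-base spacer, three times.
toy = ("GATTACA" + "CC") * 3
for width in (8, 9):  # only width 9 puts every copy in the same columns
    print(f"width {width}:")
    for row in layout(highlight(toy, "GATTACA"), width):
        print("   ", row)
```

    Scanning `width` over a range and looking for the setting where the upper-case blocks stack vertically is the text-mode equivalent of dragging the slider.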

    It would be interesting to see what part prime numbers play in these repeats. I understand that there are some ready-made online programs that I can use to help with this task - BLAST is one of them. I will read up on them and post a link to the resources.

    What lies between the repeats is also of interest. It is here that I would expect smaller cycles to exist.

    Approaching the topic statistically, it would be interesting to find out which numbers stand out as the most common lengths of repeats, and the most common distances between repeats. The lengths could be plotted as a histogram, with the highest peaks marking the most common values.
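    Python's `collections.Counter` makes a histogram of start-to-start intervals straightforward. This sketch prints a text bar chart instead of a graph. As sample input it uses the 21 Termite Bacterium positions from the "Total = 21" listing earlier in this post. Repeat lengths could be counted the same way once they are collected into a list.

```python
from collections import Counter

# Start positions of the 36-base Termite Bacterium repeat (from the listing in this post).
STARTS = [
    343558, 343624, 343755, 343821, 343888, 343954, 344020,
    344086, 344152, 344218, 344284, 344350, 344416, 344482,
    344548, 344614, 344680, 344746, 344812, 344878, 344944,
]


def interval_histogram(starts):
    """Count how often each start-to-start interval occurs."""
    return Counter(b - a for a, b in zip(starts, starts[1:]))


def print_histogram(counts):
    """One text bar per interval length, shortest interval first."""
    for length in sorted(counts):
        n = counts[length]
        print(f"{length:>5} | {'#' * n} {n}")


hist = interval_histogram(STARTS)
print_histogram(hist)
print("most common:", hist.most_common(1))
```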
    Last edited by Craig.Paardekooper; 08-02-2012 at 04:29 AM.

  7. #247
    Join Date
    Jul 2008
    Location
    London UK
    Posts
    663

    Pattern Found in Yeast

    I think this is an interesting repeating pattern, found in common yeast. The word is 20 bases long and occurs 8 times, each occurrence starting a multiple of 21 bases after the previous one.

    GACCACTCGATTCGCGCGCA - 0 - 129413 - difference = 129413
    GACCACTCGATTCGCGCGCA - 0 - 129455 - difference = 42
    GACCACTCGATTCGCGCGCA - 0 - 129476 - difference = 21
    GACCACTCGATTCGCGCGCA - 0 - 129497 - difference = 21
    GACCACTCGATTCGCGCGCA - 0 - 129518 - difference = 21
    GACCACTCGATTCGCGCGCA - 0 - 129539 - difference = 21
    GACCACTCGATTCGCGCGCA - 0 - 129623 - difference = 84
    GACCACTCGATTCGCGCGCA - 0 - 129644 - difference = 21

    This sequence also occurs

    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129402 - difference = 129402
    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129465 - difference = 63
    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129486 - difference = 21
    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129507 - difference = 21
    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129612 - difference = 105
    TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129633 - difference = 21

    Each sequence is also separated by a multiple of 21 bases

    This sequence also occurs

    TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129465 - difference = 129465
    TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129486 - difference = 21
    TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129612 - difference = 126

    Once again each sequence is separated by a multiple of 21
    Last edited by Craig.Paardekooper; 08-02-2012 at 04:05 AM.

  8. #248
    Join Date
    Jun 2007
    Location
    Yakima, Wa
    Posts
    14,703
    Quote Originally Posted by Craig.Paardekooper
    I think that this is an interesting repeating pattern - found in common yeast. The word length is 20 bases and the word occurs 8 times - every time separated by a multiple of 21 bases

    That is very curious. Any idea what it might mean? Is there anything in common amongst the sequences of length 21n that divide the repeated sequences?
    • Skepticism is the antiseptic of the mind.
    • Remember why we debate. We have nothing to lose but the errors we hold. Who but a stubborn fool would hold to errors once they have been exposed?


  9. #249
    Join Date
    Jul 2008
    Location
    London UK
    Posts
    663

    Reply to Richard

    Here is the full sequence, so you can see what lies between the repeats -

    GACCACTCGATTCGCGCGCAGGACCACTCGGTTCGCGCGCAAGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAAGACCACTCGATTCGCGCGCAAGACCACCTGATTCGCGCGCAGGACCACCTGATTCGCGCGCAGGACCATCCGGTTCGCGCGCAGGACCACTCGATTCGCGCGCAG
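    One way to explore Richard's question about what lies between the repeats is to print the stretch above 21 bases per row, so that each copy sits directly under the previous one. This is a minimal Python sketch. `rows_of` and `mismatches` are hypothetical helper names, the row width of 21 comes from the spacing reported in post #247, and positions within a row are 0-based.

```python
# The stretch posted above, pasted unchanged.
YEAST_SEQ = "GACCACTCGATTCGCGCGCAGGACCACTCGGTTCGCGCGCAAGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAAGACCACTCGATTCGCGCGCAAGACCACCTGATTCGCGCGCAGGACCACCTGATTCGCGCGCAGGACCATCCGGTTCGCGCGCAGGACCACTCGATTCGCGCGCAG"
ROW = 21


def rows_of(sequence, width):
    """Cut the sequence into consecutive rows of `width` bases."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def mismatches(row, reference):
    """0-based columns where `row` differs from `reference`."""
    return [i for i, (a, b) in enumerate(zip(row, reference)) if a != b]


for n, row in enumerate(rows_of(YEAST_SEQ, ROW), start=1):
    print(f"{n:>2} {row}  differs from row 1 at {mismatches(row, YEAST_SEQ[:ROW])}")
```

    Laid out this way, the stretch breaks into eleven 21-base rows that all begin with GACC. Most rows differ from the first in no more than a few columns.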

  10. #250
    Join Date
    Jul 2008
    Location
    London UK
    Posts
    663

    Pattern in Thermococcus Archaea

    Here are the results for Thermococcus Archaea

    The sequence has 29 letters and repeats itself 39 times
    The distance between the beginnings of consecutive sequences is approximately 66 letters (it ranges from 63 to 70).


    Total = 39
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100515 - difference = 100515
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100582 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100648 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100716 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100782 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100850 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100917 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100982 - difference = 65
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101049 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101113 - difference = 64
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101180 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101245 - difference = 65
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101312 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101380 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101447 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101513 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101578 - difference = 65
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101646 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101713 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101781 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101849 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101915 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101980 - difference = 65
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102047 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102114 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102182 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102248 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102314 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102381 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102444 - difference = 63
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102512 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102578 - difference = 66
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102648 - difference = 70
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102715 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102782 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102849 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102917 - difference = 68
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102984 - difference = 67
    TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 103050 - difference = 66
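    The "approx 66" spacing can be summarised from the listed positions. This short Python sketch uses only the standard library and assumes the 39 positions are copied correctly from the listing above.

```python
from collections import Counter

# Start positions of TTTCAATTCTCCCAGAGTCTTATTGCAAC, copied from the listing above.
STARTS = [
    100515, 100582, 100648, 100716, 100782, 100850, 100917, 100982,
    101049, 101113, 101180, 101245, 101312, 101380, 101447, 101513,
    101578, 101646, 101713, 101781, 101849, 101915, 101980, 102047,
    102114, 102182, 102248, 102314, 102381, 102444, 102512, 102578,
    102648, 102715, 102782, 102849, 102917, 102984, 103050,
]

diffs = [b - a for a, b in zip(STARTS, STARTS[1:])]  # start-to-start distances
print("intervals:", len(diffs))
print("range:", min(diffs), "to", max(diffs))
print("mean:", round(sum(diffs) / len(diffs), 2))
print("most common:", Counter(diffs).most_common(3))
```

    With these positions there are 38 intervals ranging from 63 to 70 letters, with a mean of about 66.7. The most frequent value is 67, which occurs 14 times. The same lines work for the 30-letter sequence below once its positions are pasted in.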

    Another sequence is 30 letters long, and repeats 37 times. The distance between the beginnings of consecutive sequences varies between 64 and 70 letters.

    Total = 37
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347446 - difference = 347446
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347510 - difference = 64
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347579 - difference = 69
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347648 - difference = 69
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347716 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347783 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347849 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347915 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347983 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348050 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348117 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348185 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348252 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348319 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348387 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348454 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348523 - difference = 69
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348589 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348657 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348724 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348791 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348857 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348923 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348989 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349057 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349125 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349192 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349259 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349326 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349393 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349461 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349528 - difference = 67
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349596 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349662 - difference = 66
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349730 - difference = 68
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349800 - difference = 70
    GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349867 - difference = 67

    Here is a 16 letter sequence = 4 x 4 that repeats 14 times
    The distance between the beginning of one sequence and the next is 16 = 4 x 4

    Total = 14
    TAAGGAGGTGATATAG - 0 - 909684 - difference = 909684
    TAAGGAGGTGATATAG - 0 - 909700 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909716 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909732 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909748 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909764 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909780 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909796 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909812 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909828 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909844 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909860 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909876 - difference = 16
    TAAGGAGGTGATATAG - 0 - 909892 - difference = 16


    If we take off the last letter, then it becomes a 64 letter sequence = 4 x 4 x 4
    Each sequence itself consists of 4 distinct parts of 4 letters each (TAAG GAGG TGAT ATAG), i.e. 4 x 4
    And each sequence begins 4 x 4 after the beginning of the previous one.

    Here is the actual DNA sequence so you can see what is going on -

    TAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAG
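    A back-to-back (tandem) run like this can be counted directly, without searching for each copy separately, and the unit can be cut into its 4-letter parts. This is a Python sketch. `tandem_copies` is a hypothetical helper, and the demo input is built by repeating the unit rather than pasted from the genome.

```python
def tandem_copies(sequence, unit, start=0):
    """Number of consecutive copies of `unit` beginning at index `start`."""
    if not unit:
        raise ValueError("unit must not be empty")
    k, n = len(unit), 0
    while sequence[start + n * k:start + (n + 1) * k] == unit:
        n += 1
    return n


def parts(unit, size):
    """Split `unit` into consecutive pieces of `size` letters."""
    return [unit[i:i + size] for i in range(0, len(unit), size)]


UNIT = "TAAGGAGGTGATATAG"
demo = "CC" + UNIT * 14 + "GG"  # made-up flanks around 14 copies
first = demo.find(UNIT)
print(first, tandem_copies(demo, UNIT, first))
print(parts(UNIT, 4))
```

    If you paste the posted sequence in place of `demo`, you get a count to compare with the positions listed above.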
    Last edited by Craig.Paardekooper; 07-25-2012 at 08:46 AM.
