# Thread: Has God's Signature been found in the Genetic Code

## Searching for Individual words

Here is a procedure that I am going to develop for identifying possible WORDS within DNA:

1. Get a sample of DNA, ideally mRNA or cDNA, because these have a simple START/STOP pattern
2. Identify and extract the non-coding regions
3. Convert the non-coding DNA into amino acids
4. Use a loop to extract all 5-letter sequences where all letters are different
5. Count the frequency of each 5-letter sequence

If a 5 letter sequence occurs with high frequency then it may be a WORD, such as "Elohim"
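Steps 4 and 5 can be sketched roughly as follows. This is only a minimal illustration, assuming the amino-acid string is already in hand; the function name `count_words` and the toy string are made up for the example and are not the program that produced the results below. The "all letters different" filter is left optional, because the listed results include words such as LLGLL.

```python
from collections import Counter

def count_words(seq, k=5, distinct_letters=False):
    """Count every overlapping k-letter window in seq.

    Returns a dict: word -> (count, 0-based position of first occurrence).
    """
    counts = Counter()
    first_seen = {}
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if distinct_letters and len(set(word)) != k:
            continue  # step 4: keep only words whose letters all differ
        counts[word] += 1
        first_seen.setdefault(word, i)
    return {w: (n, first_seen[w]) for w, n in counts.items()}

# Toy amino-acid string (invented for illustration, not real data)
toy = "MKLVNPLVNLQRNPLVNLA"
repeated = {w: v for w, v in count_words(toy, 5).items() if v[0] > 1}
for word, (n, pos) in sorted(repeated.items()):
    print(word, n, "Position =", pos)
```

Note that positions here are counted from 0; a program that counts from 1 would report each position one higher.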

The Results

I took the Jewish mitochondrial DNA, which is 16566 bases long, and measured the frequency of every possible 5-letter sequence. Here are the sequences that occur more than once -
1SSSO 2 Position = 126
EKPSL 2 Position = 631
FIAPT 2 Position = 744
FYSKD 2 Position = 877
HPSPH 2 Position = 1173
HRTIP 2 Position = 1213
ILVQL 2 Position = 1466
ISTIN 2 Position = 1586
ITLLT 2 Position = 1613
KLKIK 2 Position = 1746
LGLLT 2 Position = 2005
LITIL 2 Position = 2078
LITTQ 2 Position = 2081
LLGLL 2 Position = 2112
LLPHS 2 Position = 2147
LNYNI 2 Position = 2216
LTSTS 2 Position = 2441
NNYIT 2 Position = 2701
NPLVN 2 Position = 2721
NTNYL 2 Position = 2830
OKNPP 2 Position = 2908
PHSSP 2 Position = 3091
PLVNL 2 Position = 3217
PNTNY 2 Position = 3262
PSSTP 2 Position = 3456
PTPLI 2 Position = 3493
RILVQ 2 Position = 3833
SNLNY 2 Position = 4280
SSSTP 2 Position = 4457
SSTPP 2 Position = 4463
TILIL 2 Position = 4700
TPSOP 2 Position = 4888
TTQLS 2 Position = 5029

No 5-letter sequence occurs more than 2 times, which is a bit surprising considering that the DNA is 16566 bases long.

Here are the results for 6 letter sequences
NPLVNL 2 Position = 2720
PNTNYL 2 Position = 3261
PSSTPP 2 Position = 3455
RILVQL 2 Position = 3832

There are no 7, 8 or 9 letter sequences that occur more than once

So, IF the amino acids do map onto the letters of an alphabet, and IF mitochondrial DNA contains language areas, THEN the language has a distinct absence of repeated 7, 8 and 9 letter words - which is not characteristic of Hebrew.

Then I split up the DNA into coding and noncoding areas based on the start and stop codons.
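One way such a split could be done is sketched below. It is only an assumption about the method: a coding segment is taken to run from an ATG to the first in-frame stop codon, using the standard stop codons, and everything else is treated as noncoding. Mitochondria use a slightly different genetic code, so the stop set would need adjusting for real mitochondrial data. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code; an assumption

def split_coding_noncoding(dna):
    """Split dna into (coding, noncoding) lists of segments.

    A coding segment runs from an ATG to the first in-frame stop codon
    (inclusive). Everything else is treated as noncoding.
    """
    coding, noncoding = [], []
    i = 0
    last_end = 0
    n = len(dna)
    while i <= n - 3:
        if dna[i:i + 3] == "ATG":
            j = i + 3
            while j <= n - 3 and dna[j:j + 3] not in STOP_CODONS:
                j += 3
            if j <= n - 3:  # found an in-frame stop codon
                if i > last_end:
                    noncoding.append(dna[last_end:i])
                coding.append(dna[i:j + 3])
                i = last_end = j + 3
                continue
        i += 1
    if last_end < n:
        noncoding.append(dna[last_end:])
    return coding, noncoding

# Toy sequence, invented for illustration
coding, noncoding = split_coding_noncoding("CCATGAAATTTTAGGGCATGCCCTGATT")
print(coding)     # segments from ATG to stop
print(noncoding)  # everything in between
```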

The Coding areas of the DNA contained the following 5 letter sequences that occurred more than once -

FYSKD 2 Position = 423
HRTIP 2 Position = 596
ILVQL 2 Position = 733
LGLLT 2 Position = 1021
NNYIT 2 Position = 1434
NPLVN 2 Position = 1447
NTNYL 2 Position = 1503
PLVNL 2 Position = 1706
PNTNY 2 Position = 1732
PTPLI 2 Position = 1850
RILVQ 2 Position = 2015
SSSTP 2 Position = 2369
TPSOP 2 Position = 2597

The noncoding areas contained the following 5 letter sequences occurring more than once -

HPSPH 2 Position = 590
LLGLL 2 Position = 1032
LLPHS 2 Position = 1052
PHSSP 2 Position = 1450
TTQLS 2 Position = 2342

So, in the noncoding areas only 5 "5-letter-words" appear more than once. I do not think that there is a language here.

Repeating the Procedure with Yeast

Yeast has 227020 bases. I divided up the yeast DNA into non-coding and coding areas based on the Start and Stop codons. Then I extracted every 5 letter sequence and recorded the frequency of each one. Here are the results -

Coding Area: The following 5 letter sequences occur more than 5 times -
ARRTT 8 Position = 3028
FARRT 10 Position = 8078
LLLLL 12 Position = 22355
RFARR 7 Position = 33601
RTTRF 7 Position = 35949
SLLLL 6 Position = 38301
TRFAR 9 Position = 42503
TTRFA 9 Position = 43029

Non Coding Areas: There are no 5 letter sequences occurring more than 4 times.
CSUCS 3 Position = 2717
FPLLL 3 Position = 5147
LFLLL 3 Position = 10771
LLFLL 3 Position = 11408
LLLFL 3 Position = 11519
LLLLL 4 Position = 11545
LLLLQ 3 Position = 11551
LLLSL 3 Position = 11575
LRLLF 3 Position = 12264
QLLLL 4 Position = 15724
STSFL 3 Position = 20208
SUCSU 3 Position = 20273
UCSUC 3 Position = 21943

So the noncoding areas of yeast have fewer "5-letter-words" than the coding areas!

I am going to check over the computer code that I used to get these results, to make sure it is working properly, then I will make the software available as a download. The software will simply enable you to count the frequency of each "word" of a chosen length in any sample of DNA. If the DNA contains a language, then the "word" frequencies should indicate this.
Last edited by Craig.Paardekooper; 07-21-2012 at 04:20 AM.

## A natural sequence of Amino Acids

Rather than trying to discover an amino acid alphabet by analysing DNA, it would be much easier to see if the amino acids fall naturally into a sequence. We can arrange the amino acids according to -

1. their mass
2. the order in which their codons appear when you cycle through the genetic code.

Shcherbak discovered that if all the amino acids are arranged in a circle, then there is a perfect balance of masses - see here - www.craigdemo.co.uk/circleoflife.pdf. He thought that it was very odd that all the amino acids should sum in this way - almost as if they were conceived in one go. Shcherbak's pattern may hold the key to finding the amino acid alphabet.

What is interesting is that all the amino acids are together and form this perfect balance. The order of the amino acids in the circle is determined solely by cycling through the bases, in the order T C A G.

Cycling through the bases in the order TCAG generates the 64 codons in a particular order - consequently it generates the amino acids in a particular order - an amino acid alphabet.

There are only 4 x 3 x 2 x 1 = 24 ways of choosing the order in which you cycle through the bases, so there can be only 24 different possible orders of the codons produced - which would give us 24 different amino acid alphabets.

It is then quite simple to test each alphabet to see which one generates the most meaningful translation.

So I will create a program that cycles through the bases in each of the 24 possible ways, each time generating the codons in a particular order. This will give me the 20 amino acids in a particular order each time. So I will end up with 24 amino acid alphabets - and can then test each one to see if it produces meaningful words.
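A rough sketch of how the 24 alphabets could be generated is given below. It assumes the standard genetic code, the same base order at all three codon positions, and that an amino acid takes its place in the alphabet at the first codon that codes for it (stop codons are skipped). Those choices, and the name `alphabet_for`, are assumptions made for illustration.

```python
from itertools import permutations, product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def alphabet_for(order):
    """Amino acids in order of first appearance when cycling bases in `order`."""
    seen = []
    for codon in product(order, repeat=3):
        aa = CODON_TABLE["".join(codon)]
        if aa != "*" and aa not in seen:  # skip stop codons and repeats
            seen.append(aa)
    return "".join(seen)

# One alphabet for each of the 24 base orders
alphabets = {"".join(p): alphabet_for(p) for p in permutations(BASES)}
print(len(alphabets))
print(alphabets["TCAG"])
```

Under these assumptions the TCAG order gives the alphabet FLSYCWPHQRIMTNKVADEG. Some of the 24 base orders may produce the same alphabet, so the number of distinct alphabets could be smaller than 24.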

## Finding words in the Nucleotide Sequence

The scientists who carried out the experiments to show that there is a language structure in DNA did not convert the DNA into amino acids first - rather they simply searched for words in the raw DNA sequence, consisting of A, C, T and G.

So I will do this too. If the letters of the language are codons rather than individual nucleotides, then we should find that high frequency words reflect this by being a multiple of three nucleotides long.

Results

Here are the results for 5-letter sequences in the non-coding areas of Jewish Mitochondrial DNA

Total number of 5 letter words = 7703

Total Number of Different Words = 881

Average Frequency of Words = 8.74347332576617

Here are the results for 5-letter sequences in the coding areas of Jewish Mitochondrial DNA

Total number of 5 letter words = 7706

Total Number of Different Words = 896

Average Frequency of Words = 8.60044642857143
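The three figures reported for each region are related by average = total / distinct. A minimal sketch of that calculation follows; the function name `word_stats` and the toy sequence are illustrative only. It is also worth keeping in mind that only 4^5 = 1024 different 5-letter nucleotide words are possible, so both regions already contain most of them.

```python
from collections import Counter

def word_stats(seq, k=5):
    """Return (total words, distinct words, average frequency) for k-letter windows."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(words)
    total, distinct = len(words), len(counts)
    return total, distinct, total / distinct

print(word_stats("ACGTACGTAC", 5))   # toy sequence, not real data

# Check the averages quoted above against total / distinct
print(7703 / 881)   # noncoding areas
print(7706 / 896)   # coding areas
print(4 ** 5)       # number of possible 5-letter nucleotide words
```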

As you can see, there are a similar number of unique words appearing in both coding and noncoding areas, which might suggest that non-coding areas contain just as much information as coding areas.

What might be interesting would be to see which words are found in coding areas but not in noncoding areas, and vice versa.

Also, what would be the results for sequences of length 4 to 12 bases long?

Also, what are the most frequently occurring words? Perez found that the frequency of all three letter words follows a mathematical pattern. Perhaps similar patterns will be found for larger sequences?
Last edited by Craig.Paardekooper; 07-21-2012 at 01:57 PM.

## Update on Search for Alphabet sequence in XChromosome

So far I have searched 72 million bases of the X Chromosome (which is about half) in search of a sequence of all 20 standard amino acids. I have analysed both overlapping codons and nonoverlapping codons in all three forward frames, and have not yet found a single instance of the complete sequence of 20 amino acids. The search continues.
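A search of this kind could be sketched as below. This is only an illustration under stated assumptions: it uses the standard genetic code, covers non-overlapping codons in the three forward frames, and treats a "complete sequence" as a window of 20 consecutive amino acids that are all different, with no stop codon inside. The function names and the toy DNA are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna, frame=0):
    """Translate non-overlapping codons starting at offset `frame` (0, 1 or 2)."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
                   for i in range(frame, len(dna) - 2, 3))

def complete_alphabet_windows(protein, size=20):
    """Start indices of windows holding `size` different amino acids, no stops."""
    hits = []
    for i in range(len(protein) - size + 1):
        window = protein[i:i + size]
        if "*" not in window and "X" not in window and len(set(window)) == size:
            hits.append(i)
    return hits

# Toy DNA: one codon for each of the 20 amino acids, shifted by two bases
first_codon = {}
for codon, aa in CODON_TABLE.items():
    first_codon.setdefault(aa, codon)
toy = "GG" + "".join(first_codon[aa] for aa in "ACDEFGHIKLMNPQRSTVWY")

for frame in range(3):
    print(frame, complete_alphabet_windows(translate(toy, frame)))
```

In the toy example the complete set is found in frame 2, because the two extra bases at the front shift the reading frame.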

## Odd Numerical Patterns Found in Archaea Microbes - the oldest organisms on earth

There is a 30 letter sequence that occurs 68 times in the DNA of Archaea

Here is the sequence - GTTGAAATCAGACTAATGTAGGATTGAAAG

The number on the right is the position of the start of the sequence in the entire genome

Total = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 85039
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665146
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665215
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665283
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665351
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665418
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665488
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665556
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665624
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665692
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665760
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665827
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665896
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665963
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666031
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666099
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666167
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666235
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666302
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666371
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666439
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666507
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666576
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666642
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666715
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666784
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666852
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666921
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666990
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667057
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667126
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667196
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667263
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667332
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667400
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667469
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667538
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667606
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667675
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667742
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667813
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667882
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667950
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668018
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668086
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668154
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668226
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668295
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668363
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668431
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668499
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668567
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668635
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668703
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668771
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668839
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668908
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668977
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669046
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669114
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669182
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669250
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669318
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669386
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669455
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669523
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669599
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669668

When I measured the difference between the beginning of one sequence and the beginning of the next - this is what I got -

GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 85039 - difference = 85039
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665146 - difference = 580107
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665215 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665283 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665351 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665418 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665488 - difference = 70
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665556 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665624 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665692 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665760 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665827 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665896 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 665963 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666031 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666099 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666167 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666235 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666302 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666371 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666439 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666507 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666576 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666642 - difference = 66
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666715 - difference = 73
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666784 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666852 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666921 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 666990 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667057 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667126 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667196 - difference = 70
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667263 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667332 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667400 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667469 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667538 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667606 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667675 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667742 - difference = 67
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667813 - difference = 71
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667882 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 667950 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668018 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668086 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668154 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668226 - difference = 72
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668295 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668363 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668431 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668499 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668567 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668635 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668703 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668771 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668839 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668908 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 668977 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669046 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669114 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669182 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669250 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669318 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669386 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669455 - difference = 69
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669523 - difference = 68
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669599 - difference = 76
GTTGAAATCAGACTAATGTAGGATTGAAAG - 0 - 669668 - difference = 69

So this 30 letter sequence occurs 68 times, and within the main cluster each copy starts about 68 bases after the start of the previous one (the differences range from 66 to 76)!
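The position and difference columns above can be reproduced with a few lines of code. The sketch below is a minimal version, assuming the genome is held as one plain string; the function names and the toy genome are invented for the example. Since the differences are measured from start to start, subtracting the 30-base word length gives the number of bases lying between two copies.

```python
def find_all(genome, word):
    """Return the 0-based start position of every occurrence of word."""
    positions = []
    i = genome.find(word)
    while i != -1:
        positions.append(i)
        i = genome.find(word, i + 1)  # step by 1 so overlapping hits count too
    return positions

def start_differences(positions):
    """Distance from the start of each occurrence to the start of the next."""
    return [b - a for a, b in zip(positions, positions[1:])]

# Toy genome, invented for illustration
word = "GATTACA"
genome = "CC" + word + "TTT" + word + "GGGG" + word
print(find_all(genome, word))
print(start_differences(find_all(genome, word)))

# First few start positions from the table above
starts = [665146, 665215, 665283, 665351, 665418, 665488]
diffs = start_differences(starts)
print(diffs)
print([d - 30 for d in diffs])  # bases between the end of one copy and the next
```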

GCGCTATTCTTTTACTTCTCGCAATGCGGTAATCAATGGTTGAAATCAGACTAATGTAGGATTGAAAGAAGCGCTTGGCTGCTTACTGCGTGGCTCAGTACTGCGCGTTGAAATCAGACTAATGTAGGATTGAAAGGAGGTGGTTAGAGACATCTTGAATAAAATGATGGGCAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTATTGACTTCTTCAATTCTCCCTTTTAGATATAGATGTTGAAATCAGACTAATGTAGGATTGAAAGGCATAACAGAGTGCGCCCTCCGTGCCTCCCGTTCCGTGTTGAAATCAGACTAATGTAGGATTGAAAGATGGGGGAACTGGCTCGAACTCGCTGATTTCTTCGAAGAGTTGAAATCAGACTAATGTAGGATTGAAAGCAAATTAGCAAGGGCTACAACGATACCATCATACGGAGTTGAAATCAGACTAATGTAGGATTGAAAGATCAACATCGGTCCGGGACAATATCAGTTCAAGGCGAGGTTGAAATCAGACTAATGTAGGATTGAAAGCTATGAGTACTGTAAAAGGAATTTAAATTTAAAATAGGGTTGAAATCAGACTAATGTAGGATTGAAAGTTCTAATGCCGTGAAGTATGCTCTTTTTGCTTCGGGAGGTTGAAATCAGACTAATGTAGGATTGAAAGGATTGTGGCGGTGATTTGGTGGACTTAGGCAACAATGAGTTGAAATCAGACTAATGTAGGATTGAAAGTACTTCAGACAATACAATATCCCAATCGAACAACACAGTTGAAATCAGACTAATGTAGGATTGAAAGACGGTGAACGTTGATCATCCTTTGATTGTCTTGCTGGTTGTTGAAATCAGACTAATGTAGGATTGAAAGTGTCCATCAAAAGGCCTAACGGAGGATGAGGTTCAAGAGTTGAAATCAGACTAATGTAGGATTGAAAGTTCCCACTTGATGATACCCTGACCTTCATTCTTTTCGAGTTGAAATCAGACTAATGTAGGATTGAAAGTTGTGGTATAATTAATAGTTCTCATCTGCCATTTTTTATGTTGAAATCAGACTAATGTAGGATTGAAAGGCTCGCAAGGAAATGCTGAACGGAAATAGGCACACCGTTGAAATCAGACTAATGTAGGATTGAAAGACTCTTTCACCTCCTAACAGCTTTTTTCTATATACTGTCACTGGTTGAAATCAGACTAATGTAGGATTGAAAGTCTATACCACCTCAAAAAATGTTTAAGGAAGCAGAATCTGTTGAAATCAGACTAATGTAGGATTGAAAGTTTGGGGTAACTGGCTTGAGCTTGCAGACTTCTTAGAGGTTGAAATCAGACTAATGTAGGATTGAAAGGCGAGCATGAACGGGAACTGCACGAATCCGCGGGCTCCCGTTGAAATCAGACTAATGTAGGATTGAAAGATAGCTGCAAAAATCTTTTGCATCCCTGCAAGTCCCCTTGTTGAAATCAGACTAATGTAGGATTGAAAGGCAAATAATCTATAAACTGTCTCGCTAAGTCGAAAGCGTTGAAATCAGACTAATGTAGGATTGAAAGCTCGAGTTCCCTGCTCATCCTGCTATATCTCTCAAAAATGTTGAAATCAGACTAATGTAGGATTGAAAGAGGATTACAGCCGGAATCCTCAGCCAGCCGACGAAGTAGAGTTGAAATCAGACTAATGTAGGATTGAAAGAGATTGCCCCCCTCAGCATGGCGGGGGACATTCGCGAGTTGAAATCAGACTAATGTAGGATTGAAAGTATTGTCAACCTTTTAATTCGCTCTTTCAATGCCCTAAAGTTGAAATCAGACTAATGTAGGATTGAAAGTAGAGGAATGGGAAATTCTGGGGAAGGATGACGAAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTAAGAAGGCTTATAGTTGAGCACAGACACTTGGATTTAGTTGAAATCAGACTAATGTAGGATTGAAAGTAGGGGGGAGGTAGATGAG
GATTGAAAGAGGGAATCTCAGTTGAAATCAGACTAATGTAGGATTGAAAGGAGAAGAGCTGACGTTCACTGAAGAGGACGGAAACAACGTTGAAATCAGACTAATGTAGGATTGAAAGCTGGGGTAACTTTCCGAAGTTCAAATGCAGGCATGAACTGTTGAAATCAGACTAATGTAGGATTGAAAGGATTCTAAGTTAAAATAGATGGGTTGAATAAAAAAAGGTTGAAATCAGACTAATGTAGGATTGAAAGGGACATGGGGCAATAAGCAGCCATTCAGCCCAGAGCCATCTGTTGAAATCAGACTAATGTAGGATTGAAAGATAGAGTTCTATGTTGTAGTATGTTCCTGCAGAGAGATCGTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTCTTGCTGAGCCATAGCCAAGCCCATGAAGCGGGTTGAAATCAGACTAATGTAGGATTGAAAGACGTAGATTTGACATTCTCAGCGGGCGCAACGAGGCAGGTTGAAATCAGACTAATGTAGGATTGAAAGTTACATATATCTACAAAAGCTGCACAATCTGCAGCATCGTTGAAATCAGACTAATGTAGGATTGAAAGAATAAGAGCGGGCGAGTATGGCGACATGAGCGAGGCAAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTCGAATGGGATAAGGGCTTAGAGGATTTGGCCGAATGGGAGTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTGAAAACTACATTGTCTTCAAGGGCGTTTCTTATGTTGAAATCAGACTAATGTAGGATTGAAAGAATCCTGTGACAGTGCAACGTATTACTTTACTATTCATGTTGAAATCAGACTAATGTAGGATTGAAAGGACTCTGTGTGTGAGGTGGTGGTAGAGATGGAGCAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGAGAAAATAAGACGTATAAACCTTTGTAACAATGGTTGAAATCAGACTAATGTAGGATTGAAAGAAATTCCTCGGACGGGATAAGTTCAGCAAAGTTATCGCGTTGAAATCAGACTAATGTAGGATTGAAAGGTTTGATGTGCGAATGTGGATACTCGAAGAGGTATGTTGTTGAAATCAGACTAATGTAGGATTGAAAGTTGAACCATATCTCGTTCTGAGGGAAACTCATCTGAACGTTGAAATCAGACTAATGTAGGATTGAAAGGTTCCAGCCTGTGATAATAAAAAGCACCACAGCACCGAGTTGAAATCAGACTAATGTAGGATTGAAAGACATAAATCATTATCCTGTCGAAGTTTCAGATTATTTGGTTGAAATCAGACTAATGTAGGATTGAAAGGATATACCATCAACTGAATTACCTGAATTGCCATTAACAGTTGAAATCAGACTAATGTAGGATTGAAAGGTTGCGTATGCGCCCAGATATGGTACGAACTGTTCAAAAGTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGGGACGATAATCAGACTTGTAAACGCTGATGCTACGTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGATACAGCTTCAGCGGAGTCATACGCTTAATCCTGTTGAAATCAGACTAATGTAGGATTGAAAGTTCTATTTCTCGTCCGATATCCCTACTGTAGTGTCTTAGTTGAAATCAGACTAATGTAGGATTGAAAGAATTTATTGGGCCGATGGCAACACGCCCAGCTGAGCTTGTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGTCCCGCTGAGCACGTGCACGCTGCGAATGATTGGTTGAAATCAGACTAATGTAGGATTGAAAGTTCATTCTCTCTTTCAATCTCAGCAATCTTACTCATTAGTTGAAATCAGACTAATGTAGGATTGAAAGTACTTGCGGAGCTACTACGACAAATACGCCAAACTGCTCGTTGAAATCAGACTAATGTAGGATTGAAAGGTAGGGATTTCTTTAATCCAGGTTTCGTCAATT
GCTGCGTTGAAATCAGACTAATGTAGGATTGAAAGACATCTGAATCTTCGAATACCTCAATGCTGTATATGCTCGCTGTGCGTTGAAATCAGACTAATGTAGGATTGAAAGAGTACCCTTTCAAGCACAATCACGGCGATATCGTCTGCAGTTGAAATCAGACTAATGTAGGATTGAAAGCTTTGTACCTCAGCATAGCTCCCACAACACCGAGCAGCCCGGGTTGAAAT CAGACTAATGTAAGATTGAGAAACGTACTGTAGTGGAGTTTAGAAGGATC AAAAAATTCTCATCATTATTAACAAAAATATCGCTATTCTTAATAAGTAT TTGTTATCAAAAAATCAGCCAATGCCCAAAAGGGTCTCAACAGGAATTCC CGGATTCGATGAACTTTGCGGGGGTGGGCTGCCGCAAGGAGGTACGTATC TTGTTGTAGGAGCTGCAGAATCTGGAAAAACTGTTTTTTCTATGCAGTAT CTGGTAAATGGGGCGAGGATGTTTGGGGAAGCAGGAATATTCATCACC

GCGCTATTCTTTTACTTCTCGCAATGCGGTAATCAATG
GTTGAAATCAGACTAATGTAGGATTGAAAGAAGCGCTTGGCTGCTTACTGCGTGGCTCAGTACTGCGC
GTTGAAATCAGACTAATGTAGGATTGAAAGGAGGTGGTTAGAGACATCTTGAATAAAATGATGGGCAA
GTTGAAATCAGACTAATGTAGGATTGAAAGCTTATTGACTTCTTCAATTCTCCCTTTTAGATATAGAT
GTTGAAATCAGACTAATGTAGGATTGAAAGGCATAACAGAGTGCGCCCTCCGTGCCTCCCGTTCCGT
GTTGAAATCAGACTAATGTAGGATTGAAAGATGGGGGAACTGGCTCGAACTCGCTGATTTCTTCGAAGA
GTTGAAATCAGACTAATGTAGGATTGAAAGCAAATTAGCAAGGGCTACAACGATACCATCATACGGA
GTTGAAATCAGACTAATGTAGGATTGAAAGATCAACATCGGTCCGGGACAATATCAGTTCAAGGCGAG
GTTGAAATCAGACTAATGTAGGATTGAAAGCTATGAGTACTGTAAAAGGAATTTAAATTTAAAATAGG
GTTGAAATCAGACTAATGTAGGATTGAAAGTTCTAATGCCGTGAAGTATGCTCTTTTTGCTTCGGGAG
GTTGAAATCAGACTAATGTAGGATTGAAAGGATTGTGGCGGTGATTTGGTGGACTTAGGCAACAATGA
GTTGAAATCAGACTAATGTAGGATTGAAAGTACTTCAGACAATACAATATCCCAATCGAACAACACA
GTTGAAATCAGACTAATGTAGGATTGAAAGACGGTGAACGTTGATCATCCTTTGATTGTCTTGCTGGTT
GTTGAAATCAGACTAATGTAGGATTGAAAGTGTCCATCAAAAGGCCTAACGGAGGATGAGGTTCAAGA
GTTGAAATCAGACTAATGTAGGATTGAAAGTTCCCACTTGATGATACCCTGACCTTCATTCTTTTCGA
GTTGAAATCAGACTAATGTAGGATTGAAAGTTGTGGTATAATTAATAGTTCTCATCTGCCATTTTTTAT
GTTGAAATCAGACTAATGTAGGATTGAAAGGCTCGCAAGGAAATGCTGAACGGAAATAGGCACACC
GTTGAAATCAGACTAATGTAGGATTGAAAGACTCTTTCACCTCCTAACAGCTTTTTTCTATATACTGTCACTG
GTTGAAATCAGACTAATGTAGGATTGAAAGTCTATACCACCTCAAAAAATGTTTAAGGAAGCAGAATCT
GTTGAAATCAGACTAATGTAGGATTGAAAGTTTGGGGTAACTGGCTTGAGCTTGCAGACTTCTTAGAG
GTTGAAATCAGACTAATGTAGGATTGAAAGGCGAGCATGAACGGGAACTGCACGAATCCGCGGGCTCCC
GTTGAAATCAGACTAATGTAGGATTGAAAGATAGCTGCAAAAATCTTTTGCATCCCTGCAAGTCCCCTT
GTTGAAATCAGACTAATGTAGGATTGAAAGGCAAATAATCTATAAACTGTCTCGCTAAGTCGAAAGC
GTTGAAATCAGACTAATGTAGGATTGAAAGCTCGAGTTCCCTGCTCATCCTGCTATATCTCTCAAAAAT
GTTGAAATCAGACTAATGTAGGATTGAAAGAGGATTACAGCCGGAATCCTCAGCCAGCCGACGAAGTAGA
GTTGAAATCAGACTAATGTAGGATTGAAAGAGATTGCCCCCCTCAGCATGGCGGGGGACATTCGCGA
GTTGAAATCAGACTAATGTAGGATTGAAAGTATTGTCAACCTTTTAATTCGCTCTTTCAATGCCCTAAA
GTTGAAATCAGACTAATGTAGGATTGAAAGTAGAGGAATGGGAAATTCTGGGGAAGGATGACGAAAAA
GTTGAAATCAGACTAATGTAGGATTGAAAGCTAAGAAGGCTTATAGTTGAGCACAGACACTTGGATTTA
GTTGAAATCAGACTAATGTAGGATTGAAAGTAGGGGGGAGGTAGATGAGGATTGAAAGAGGGAATCTCA
GTTGAAATCAGACTAATGTAGGATTGAAAGGAGAAGAGCTGACGTTCACTGAAGAGGACGGAAACAAC
GTTGAAATCAGACTAATGTAGGATTGAAAGCTGGGGTAACTTTCCGAAGTTCAAATGCAGGCATGAACT
GTTGAAATCAGACTAATGTAGGATTGAAAGGATTCTAAGTTAAAATAGATGGGTTGAATAAAAAAAG
GTTGAAATCAGACTAATGTAGGATTGAAAGGGACATGGGGCAATAAGCAGCCATTCAGCCCAGAGCCATCT
GTTGAAATCAGACTAATGTAGGATTGAAAGATAGAGTTCTATGTTGTAGTATGTTCCTGCAGAGAGATC
GTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTCTTGCTGAGCCATAGCCAAGCCCATGAAGCGG
GTTGAAATCAGACTAATGTAGGATTGAAAGACGTAGATTTGACATTCTCAGCGGGCGCAACGAGGCAG
GTTGAAATCAGACTAATGTAGGATTGAAAGTTACATATATCTACAAAAGCTGCACAATCTGCAGCATC
GTTGAAATCAGACTAATGTAGGATTGAAAGAATAAGAGCGGGCGAGTATGGCGACATGAGCGAGGCAA
GTTGAAATCAGACTAATGTAGGATTGAAAGCTTCGAATGGGATAAGGGCTTAGAGGATTTGGCCGAATGGGA
GTTGAAATCAGACTAATGTAGGATTGAAAGCATGCTGAAAACTACATTGTCTTCAAGGGCGTTTCTTAT
GTTGAAATCAGACTAATGTAGGATTGAAAGAATCCTGTGACAGTGCAACGTATTACTTTACTATTCAT
GTTGAAATCAGACTAATGTAGGATTGAAAGGACTCTGTGTGTGAGGTGGTGGTAGAGATGGAGCAAAA
GTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGAGAAAATAAGACGTATAAACCTTTGTAACAATG
GTTGAAATCAGACTAATGTAGGATTGAAAGAAATTCCTCGGACGGGATAAGTTCAGCAAAGTTATCGC
GTTGAAATCAGACTAATGTAGGATTGAAAGGTTTGATGTGCGAATGTGGATACTCGAAGAGGTATGTT
GTTGAAATCAGACTAATGTAGGATTGAAAGTTGAACCATATCTCGTTCTGAGGGAAACTCATCTGAAC
GTTGAAATCAGACTAATGTAGGATTGAAAGGTTCCAGCCTGTGATAATAAAAAGCACCACAGCACCGA
GTTGAAATCAGACTAATGTAGGATTGAAAGACATAAATCATTATCCTGTCGAAGTTTCAGATTATTTG
GTTGAAATCAGACTAATGTAGGATTGAAAGGATATACCATCAACTGAATTACCTGAATTGCCATTAACA
GTTGAAATCAGACTAATGTAGGATTGAAAGGTTGCGTATGCGCCCAGATATGGTACGAACTGTTCAAAA
GTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGGGACGATAATCAGACTTGTAAACGCTGATGCTAC
GTTGAAATCAGACTAATGTAGGATTGAAAGCGCTGATACAGCTTCAGCGGAGTCATACGCTTAATCCT
GTTGAAATCAGACTAATGTAGGATTGAAAGTTCTATTTCTCGTCCGATATCCCTACTGTAGTGTCTTA
GTTGAAATCAGACTAATGTAGGATTGAAAGAATTTATTGGGCCGATGGCAACACGCCCAGCTGAGCTT
GTTGAAATCAGACTAATGTAGGATTGAAAGTTTTGTCCCGCTGAGCACGTGCACGCTGCGAATGATTG
GTTGAAATCAGACTAATGTAGGATTGAAAGTTCATTCTCTCTTTCAATCTCAGCAATCTTACTCATTA
GTTGAAATCAGACTAATGTAGGATTGAAAGTACTTGCGGAGCTACTACGACAAATACGCCAAACTGCTC
GTTGAAATCAGACTAATGTAGGATTGAAAGGTAGGGATTTCTTTAATCCAGGTTTCGTCAATTGCTGC
GTTGAAATCAGACTAATGTAGGATTGAAAGACATCTGAATCTTCGAATACCTCAATGCTGTATATGCTCGCTGTGC
GTTGAAATCAGACTAATGTAGGATTGAAAGAGTACCCTTTCAAGCACAATCACGGCGATATCGTCTGCA
GTTGAAATCAGACTAATGTAGGATTGAAAGCTTTGTACCTCAGCATAGCTCCCACAACACCGAGCAGCCCGG
GTTGAAATCAGACTAATGTAAGATTGAGAAACGTACTGTAGTGGAGTTTAGAAGGATCAAAAAATTCTCATCATTATTAACAAAAATATCGCTATTCTTAATAAGTATTTGTTATCAAAAAATCAGCCAATGCCCAAAAGGGTCTCAACAGGAATTCCCGGATTCGATGAACTTTGCGGGGGTGGGCTGCCGCAAGGAGGTACGTATCTTGTTGTAGGAGCTGCAGAATCTGGAAAAACTGTTTTTTCTATGCAGTATCTGGTAAATGGGGCGAGGATGTTTGGGGAAGCAGGAATATTCATCACC

These findings have been confirmed here - http://crispr.u-psud.fr/crispr/crisp...5D=NC_015320_3

There is another area in this Archaea where an almost identical pattern occurs. The word differs by only one letter, and the spacing between words is about 68 bases.
The word occurs 20 times.

GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83682 - difference = 83682
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83749 - difference = 67
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83824 - difference = 75
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83892 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 83960 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84027 - difference = 67
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84094 - difference = 67
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84161 - difference = 67
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84229 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84295 - difference = 66
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84363 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84429 - difference = 66
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84497 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84565 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84631 - difference = 66
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84699 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84768 - difference = 69
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84836 - difference = 68
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84903 - difference = 67
GTTGAAATCAGACTATTGTAGGATTGAAAG - 2 - 84971 - difference = 68
Last edited by Craig.Paardekooper; 07-23-2012 at 03:44 PM.

## Pattern found in Elusimicrobium minutum

Here is a pattern that I found in Elusimicrobium minutum (which is a bacterium rather than an archaeon).

The longest repeating sequence is 36 bases long (6 x 6)
Each of the repeats starts 66 bases after the start of the previous one, so a 36-base repeat is followed by a 30-base spacer
There are 13 repeats altogether

There are 12 spacers between these 13 repeats.

Total length of the 12 spacers = 13 x 30 - 1
Total length of the 13 repeats = 13 x 6 x 6
Total length of whole sequence = 13 x 66 - 1

ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC

Total = 13
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266139 - difference = 266139
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266204 - difference = 65
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266270 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266336 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266402 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266468 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266534 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266600 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266666 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266732 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266798 - difference = 66
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266930 - difference = 132
ATTCTATAAAATCAATTCTCGGAGGGCAACCCTAAC - 0 - 266996 - difference = 66
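The three totals quoted above can be checked directly from the start positions. The sketch below assumes that "whole sequence" means the distance from the first start to the last start, and that the spacer total is what remains after subtracting 13 repeats of 36 bases; both readings are assumptions, chosen because they reproduce the quoted figures.

```python
# Start positions copied from the table above
starts = [266139, 266204, 266270, 266336, 266402, 266468, 266534,
          266600, 266666, 266732, 266798, 266930, 266996]
REPEAT_LEN = 36  # 6 x 6

span = starts[-1] - starts[0]            # first start to last start
repeat_total = len(starts) * REPEAT_LEN  # 13 repeats of 36 bases
remainder = span - repeat_total          # what is left for the spacers

print(span, 13 * 66 - 1)
print(repeat_total, 13 * 6 * 6)
print(remainder, 13 * 30 - 1)
```

The one difference of 132 in the table is 2 x 66, which is what a missing or slightly altered copy in the middle of the run would produce.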

You might compare this to Termite Bacterium. Here the sequence length is 36 or 6 x 6, and the sequence repeats 21 times, with 66 bases between each repetition.

Total length of the 20 spacers = 3 x 7 x 30
Total length of the 21 repeats = 3 x 7 x 6 x 6
Total length of whole sequence = 3 x 7 x 66

GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 21

Total = 21
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343558 - difference = 343558
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343624 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343755 - difference = 131
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343821 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343888 - difference = 67
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 343954 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344020 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344086 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344152 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344218 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344284 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344350 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344416 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344482 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344548 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344614 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344680 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344746 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344812 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344878 - difference = 66
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT - 0 - 344944 - difference = 66

GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATATAATAGCGATAGCAGTGATATTCTTGATG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAATATTTTCAAGCCCAGTGCTGTCTCCAAGTTTAGTTTCCTTCCTCTCTCAGATGTGCTATAATATTACCATTAGCATCTAATCCAAATGATTT
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATACCGTATGCTTCTATAGTGTTTAATCTACA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTGTATTCAGCTTCAATGGCGATTTTTGGTTC
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTTACGAACCGATTAAACAAACATGGGACGC
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCCAGAGATACGACGAACGTCACGGATTGAA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCATTATATCAGCGATTGAGAGCATAAAACC
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTTGATTCGCTGAGAGCCTTTACGGCCTGTG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAAAGTAAATACCATACGGGAACCACGTAG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGTACTAAAGGGTGCTACTACAGTAAAGCCC
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAATTGTACGGAATATATTAACTTTCTTAC
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGTAAGGTGCATAGCGTACTCCGGTAGCTGG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTGGATATCATATTACCTGCATACGCTGATA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATTAACCAAAGCAGGCATATCGTATACATTGG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATACATTGTTAAAAAATATAGAGAGAATTACG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCAGGGATTATTACGTCCTCGCCTGCTTTTA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGCAAAGTAACTTAATCTAAACATTTTTACA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATATGACGAATATAAAACTATGGCTGATATAG
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATGAATTCAAGTACGACGATATCAGAGATGGT
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCATGCAAAGCACTTTTATCACACTTACGTA
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAAT

Termite Bacterium also displays longer sequences -

Here the sequence is 2 x 19 bases long, and repeats 6 times
The total length of the 5 spacers = 19 x 66

GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 343624 - difference = 343624
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344020 - difference = 396
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344152 - difference = 132
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344284 - difference = 132
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344614 - difference = 330
GTTATAGTTTCCTTCCTCTCTCAGATGTGCTATAATCA - 0 - 344878 - difference = 264

Once again the interludes are multiples of 66.

And here we have a sequence of 2 x 20 bases, occurring 2 x 2 times, and each one separated by a multiple of 22

GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3559 - difference = 3559
GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3581 - difference = 22
GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3625 - difference = 22 x 2
GTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3647 - difference = 22

And here we have a sequence of 43 bases

GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3556 - difference = 3556
GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3578 - difference = 22
GTAGTTTGGTTTTTATGTGTTAGTAGTTTGGTTTTTATGTGTT - 0 - 3644 - difference = 66

Repeats can be thought of as cycles when the repeat occurs at regular intervals. A cycle of 66 reminds me of Richard's Biblewheel, with 66 books divided into 3 cycles of 22 books each. In the Termite Bacterium we have 66 bases in each cycle = 22 codons.

It is interesting to find cycles within DNA, since they probably have a nested structure, with super-cycles encompassing smaller ones. Perhaps a grand scheme will emerge. Vernon found that when the values of the words in Genesis 1 v 1 are laid out as blocks of height 37 units, then each word turns out to be a multiple of 37 - a multiple of 6.

In the same way, if repeats define the cyclic nature of DNA, then laying out DNA in accordance with the cycle length may reveal some striking patterns. One way to detect this would be to create a program that automatically highlights all repeats within a sequence. This would have to be done using a rich text box that allows font manipulation. Once the repeats are highlighted, a slider could adjust the width of the DNA display (in bases per line) to find the widths at which the repeats line up in the same column.
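The slider idea can be tried without any display at all. Here is a rough Python sketch, assuming we already have the start positions of a repeat: at a display width of w bases per line, a repeat starting at position s falls in column s mod w, so the repeats line up exactly when all starts share one column. The function name `aligned_widths` is hypothetical, and the positions are the Termite Bacterium ones reported earlier in this post.

```python
from functools import reduce
from math import gcd


def aligned_widths(starts, widths):
    """Widths at which every repeat start lands in the same column."""
    return [w for w in widths if len({s % w for s in starts}) == 1]


termite_starts = [343624, 344020, 344152, 344284, 344614, 344878]
print(aligned_widths(termite_starts, range(2, 101)))
# [2, 3, 6, 11, 22, 33, 66]

# The widest aligning width is the gcd of the intervals between starts.
gaps = [b - a for a, b in zip(termite_starts, termite_starts[1:])]
print(reduce(gcd, gaps))  # 66
```

The widths that work are exactly the divisors of the greatest common divisor of the intervals, so 66 is the widest display at which these six repeats stack up in one column.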

It would be interesting to see what part prime numbers play in these repeats. I understand that there are some ready-made online programs that I can use to help with this task - BLAST is one of them. I will read up on them and post a link to the resources.

What lies between the repeats is also of interest. It is here that I would expect smaller cycles to exist.

Approaching the topic on a statistical level, it would be interesting to find out which numbers stand out as the most common lengths of repeats, and most common lengths between repeats. The different lengths could be displayed on a graph - with the highest peaks indicating the most common.
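A small Python sketch of that tally, using `collections.Counter` and a text bar chart in place of a real graph. The function name `gap_histogram` is my own, and the input is just the six Termite Bacterium positions from earlier in this post, so treat it as an illustration of the method rather than a result.

```python
from collections import Counter


def gap_histogram(starts):
    """Count how often each interval between consecutive starts occurs."""
    return Counter(b - a for a, b in zip(starts, starts[1:]))


termite_starts = [343624, 344020, 344152, 344284, 344614, 344878]
hist = gap_histogram(termite_starts)
for gap, count in sorted(hist.items()):
    print(f"{gap:>5} {'#' * count}")
```

Run over every repeat found in a genome, the tallest bars would show the most common distances between repeats.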
Last edited by Craig.Paardekooper; 08-02-2012 at 03:29 AM.

## Pattern Found in Yeast

I think that this is an interesting repeating pattern - found in common yeast. The word length is 20 bases and the word occurs 8 times - every time separated by a multiple of 21 bases

GACCACTCGATTCGCGCGCA - 0 - 129413 - difference = 129413
GACCACTCGATTCGCGCGCA - 0 - 129455 - difference = 42
GACCACTCGATTCGCGCGCA - 0 - 129476 - difference = 21
GACCACTCGATTCGCGCGCA - 0 - 129497 - difference = 21
GACCACTCGATTCGCGCGCA - 0 - 129518 - difference = 21
GACCACTCGATTCGCGCGCA - 0 - 129539 - difference = 21
GACCACTCGATTCGCGCGCA - 0 - 129623 - difference = 84
GACCACTCGATTCGCGCGCA - 0 - 129644 - difference = 21

This sequence also occurs

TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129402 - difference = 129402
TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129465 - difference = 63
TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129486 - difference = 21
TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129507 - difference = 21
TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129612 - difference = 105
TTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129633 - difference = 21

Each sequence is also separated by a multiple of 21 bases

This sequence also occurs

TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129465 - difference = 129465
TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129486 - difference = 21
TTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCA - 0 - 129612 - difference = 126

Once again each sequence is separated by a multiple of 21
Last edited by Craig.Paardekooper; 08-02-2012 at 03:05 AM.

Originally Posted by Craig.Paardekooper
I think that this is an interesting repeating pattern - found in common yeast. The word length is 20 bases and the word occurs 8 times - every time separated by a multiple of 21 bases

That is very curious. Any idea what it might mean? Is there anything in common amongst the sequences of length 21n that divide the repeated sequences?

Here is the full sequence, so you can see what lies between each sequence -

GACCACTCGATTCGCGCGCAGGACCACTCGGTTCGCGCGCAAGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAGGACCACTCGATTCGCGCGCAAGACCACTCGATTCGCGCGCAAGACCACCTGATTCGCGCGCAGGACCACCTGATTCGCGCGCAGGACCATCCGGTTCGCGCGCAGGACCACTCGATTCGCGCGCAG
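One way to see what lies between the exact matches is to lay the full sequence out in rows of 21 bases and compare each row with the first. The Python sketch below does this for only the first 63 bases of the sequence above (three rows); the helper names `rows` and `mismatches` are my own.

```python
def rows(seq, width):
    """Split seq into consecutive rows of the given width."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def mismatches(a, b):
    """Indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First 63 bases of the full yeast sequence above (three 21-base rows).
excerpt = (
    "GACCACTCGATTCGCGCGCAG"
    "GACCACTCGGTTCGCGCGCAA"
    "GACCACTCGATTCGCGCGCAG"
)
units = rows(excerpt, 21)
for unit in units:
    print(unit, mismatches(units[0], unit))
```

The second row differs from the first at two places, and one of them falls inside the 20-base word. So an exact search skips that copy, which is consistent with the first gap in the table being 42 rather than 21.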

## Pattern in Thermococcus Archaea

Here are the results for Thermococcus Archaea

The sequence has 29 letters and repeats itself 39 times
The distance between the beginning of one sequence and the next is approx 66 letters (it ranges from 63 to 70).

Total = 39
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100515 - difference = 100515
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100582 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100648 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100716 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100782 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100850 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100917 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 100982 - difference = 65
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101049 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101113 - difference = 64
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101180 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101245 - difference = 65
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101312 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101380 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101447 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101513 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101578 - difference = 65
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101646 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101713 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101781 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101849 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101915 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 101980 - difference = 65
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102047 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102114 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102182 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102248 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102314 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102381 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102444 - difference = 63
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102512 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102578 - difference = 66
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102648 - difference = 70
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102715 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102782 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102849 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102917 - difference = 68
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 102984 - difference = 67
TTTCAATTCTCCCAGAGTCTTATTGCAAC - 0 - 103050 - difference = 66

Another sequence is 30 letters long, and repeats 37 times. The space between the beginning of one sequence and the next varies between 64 and 70 letters, and is mostly 66 - 69

Total = 37
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347446 - difference = 347446
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347510 - difference = 64
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347579 - difference = 69
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347648 - difference = 69
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347716 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347783 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347849 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347915 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 347983 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348050 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348117 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348185 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348252 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348319 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348387 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348454 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348523 - difference = 69
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348589 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348657 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348724 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348791 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348857 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348923 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 348989 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349057 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349125 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349192 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349259 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349326 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349393 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349461 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349528 - difference = 67
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349596 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349662 - difference = 66
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349730 - difference = 68
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349800 - difference = 70
GTTGCAATAAGACTCTAGGAGAATTGAAAC - 0 - 349867 - difference = 67

Here is a 16 letter sequence = 4 x 4 that repeats 14 times
The distance between the beginning of one sequence and the next is 16 = 4 x 4

Total = 14
TAAGGAGGTGATATAG - 0 - 909684 - difference = 909684
TAAGGAGGTGATATAG - 0 - 909700 - difference = 16
TAAGGAGGTGATATAG - 0 - 909716 - difference = 16
TAAGGAGGTGATATAG - 0 - 909732 - difference = 16
TAAGGAGGTGATATAG - 0 - 909748 - difference = 16
TAAGGAGGTGATATAG - 0 - 909764 - difference = 16
TAAGGAGGTGATATAG - 0 - 909780 - difference = 16
TAAGGAGGTGATATAG - 0 - 909796 - difference = 16
TAAGGAGGTGATATAG - 0 - 909812 - difference = 16
TAAGGAGGTGATATAG - 0 - 909828 - difference = 16
TAAGGAGGTGATATAG - 0 - 909844 - difference = 16
TAAGGAGGTGATATAG - 0 - 909860 - difference = 16
TAAGGAGGTGATATAG - 0 - 909876 - difference = 16
TAAGGAGGTGATATAG - 0 - 909892 - difference = 16

Four copies in a row make a 64 letter sequence = 4 x 4 x 4
Each such sequence consists of 4 parts TAAGGAGGTGATATAG, each of 4 x 4 letters
And each sequence begins 4 x 4 letters after the beginning of the previous one.

Here is the actual DNA sequence so you can see what is going on -

TAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAGTAAGGAGGTGATATAG
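A quick Python check of the counts for a run like this, assuming it is 14 back-to-back copies of the 16-letter sequence, as the table lists. The function name `count_overlapping` is my own. Matches are allowed to overlap, because a 64-letter window of four copies can start at any copy that still has three more after it.

```python
def count_overlapping(seq, motif):
    """Count matches of motif in seq, allowing matches to overlap."""
    count, i = 0, seq.find(motif)
    while i != -1:
        count += 1
        i = seq.find(motif, i + 1)
    return count


unit = "TAAGGAGGTGATATAG"       # the 16-letter sequence from the table
array = unit * 14               # assumed: 14 back-to-back copies, as listed

print(count_overlapping(array, unit))      # 14
print(count_overlapping(array, unit * 4))  # 11 (a 64-letter window fits 14 - 4 + 1 times)
```

So how many times a sequence "repeats" depends on the word length searched for: the same run gives 14 hits for the 16-letter word and 11 for the 64-letter one.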
Last edited by Craig.Paardekooper; 07-25-2012 at 07:46 AM.

