
Originally Posted by
Craig.Paardekooper
Hi Luke,
The signature I am looking for is a string of 22 different amino acid letters, all occurring consecutively. This MAY correspond to the 22 letters of the alphabet, hence providing an inbuilt KEY for translating non-coding DNA into spoken language. Once a string of 22 different amino acids is found, its value as a key will be judged solely upon its ability to translate DNA into meaningful language. I would also expect a genuine key to occur in multiple places.
An effort to find this key is currently underway. Three days ago I set in motion a software program that is slowly proceeding through the X chromosome in search of potential keys. It will take an estimated 200 hours to complete the analysis.
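As a rough illustration of the kind of search described here, the sketch below scans a string of single-letter amino acid codes for windows in which every letter is different. It is only a sketch under stated assumptions: the function name `find_distinct_windows` and the `size` parameter are hypothetical and are not taken from the program mentioned above. The window size is left as a parameter because translation with the standard genetic code produces 20 amino acid letters plus a stop symbol.

```python
from collections import Counter

def find_distinct_windows(letters, size=22):
    """Return the start positions of every window of `size` letters
    in which all letters are different (hypothetical helper)."""
    hits = []
    counts = Counter()  # letter counts inside the current window
    for i, ch in enumerate(letters):
        counts[ch] += 1
        if i >= size:
            # Drop the letter that just slid out of the window
            old = letters[i - size]
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        # A full window with `size` distinct letters has no repeats
        if i >= size - 1 and len(counts) == size:
            hits.append(i - size + 1)
    return hits

if __name__ == "__main__":
    print(find_distinct_windows("ABCABCD", size=4))
```

Because the counts are updated as the window slides, each letter is handled once, which matters for a sequence as long as a chromosome.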
I have not looked at Boulay's theory yet, though I am persuaded by Shcherbak's and Rakocevic's. Research is advancing all the time, and you might like to search PubMed to see what advances have been made.
Isolating Coding from Noncoding Areas
I have worked out a way of identifying WORDS within non-coding DNA. However, to implement this, I first need to isolate the non-coding DNA areas from the coding areas.
Here is the computer code that I will use to isolate the non-coding DNA:
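1. Create an array of all the non-overlapping codons within a sequence
2. Create the variables that we will use
Dim IndexA As Integer = 0 ' codon index of the current start codon
Dim IndexB As Integer = 0 ' codon index where the last coding area ended
Dim n As Integer = 0 ' codon index of the codon being examined
Dim Start As Integer = 0 ' 1 while we are inside a candidate coding area
Dim NonCodingDNA As String = ""
Dim CodingDNA As String = ""
3. Next we loop through the array detecting ATG (Start codons) and TAA, TAG, TGA (Stop codons)
For Each codon As String In CodonArray
    If codon = "ATG" AndAlso Start = 0 Then
        ' Only the first ATG after the last coding area opens a candidate
        IndexA = n
        Start = 1
    ElseIf (codon = "TAA" OrElse codon = "TAG" OrElse codon = "TGA") AndAlso Start = 1 Then
        If n - IndexA >= 100 Then
            ' Codon indexes are multiplied by 3 to get positions in the DNA string
            NonCodingDNA &= DNA.Substring(IndexB * 3, (IndexA - IndexB) * 3)
            IndexB = n
            CodingDNA &= DNA.Substring(IndexA * 3, (IndexB - IndexA) * 3)
        End If
        ' A stop codon always closes the candidate, long enough or not
        Start = 0
    End If
    n += 1
Next
' Everything after the last coding area is non-coding
NonCodingDNA &= DNA.Substring(IndexB * 3)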
This code will isolate all non-coding areas and coding areas based on the criteria that a coding sequence:
1. starts with ATG
2. ends with TAA, TAG or TGA
3. is at least 100 codons long
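The three criteria can also be sketched in Python, which may be easier to experiment with on short test sequences. This is a minimal sketch, not the program described in the post: the function name `split_coding_noncoding` and the `min_codons` parameter are assumptions, only one reading frame (starting at position 0) is examined, and the stop codon is left in the non-coding part.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_coding_noncoding(dna, min_codons=100):
    """Split `dna` (read in frame 0) into (coding, noncoding) strings.

    A coding area starts with ATG, ends at the first in-frame stop codon,
    and is kept only if it is at least `min_codons` codons long.
    """
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    coding, noncoding = [], []
    orf_start = None  # codon index of the candidate ATG, if any
    last_end = 0      # codon index where the last coding area ended
    for n, codon in enumerate(codons):
        if codon == "ATG" and orf_start is None:
            orf_start = n
        elif codon in STOP_CODONS and orf_start is not None:
            if n - orf_start >= min_codons:
                # Codon indexes are multiplied by 3 to slice the DNA string
                noncoding.append(dna[last_end * 3:orf_start * 3])
                coding.append(dna[orf_start * 3:n * 3])
                last_end = n
            orf_start = None  # the stop closes the candidate either way
    noncoding.append(dna[last_end * 3:])  # tail after the last coding area
    return "".join(coding), "".join(noncoding)

if __name__ == "__main__":
    demo = "CCC" + "ATG" + "GCA" * 3 + "TAA" + "GGG"
    print(split_coding_noncoding(demo, min_codons=4))
```

Collecting the pieces in lists and joining them once at the end avoids repeatedly copying a growing string, which becomes noticeable on chromosome-sized input.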
Once I have separated out the non-coding areas, I can convert the codons in these areas into single-letter amino acids.
This should speed up the search for the 22-letter alphabet, since I can concentrate the search on non-coding areas.
Now, what I could also look for are actual words made out of amino acid letters.
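Converting codons into single-letter amino acids could look something like the following. It is a sketch that assumes the standard genetic code; the names `CODON_TABLE` and `translate` are illustrative, stop codons are written as `*`, and any codon containing a letter other than A, C, G or T is written as `X`.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}

def translate(dna):
    """Translate `dna` in frame 0 into single-letter amino acid codes."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # "X" marks an unknown codon
        for i in range(0, len(dna) - 2, 3)
    )

if __name__ == "__main__":
    print(translate("ATGTGGTAA"))
```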
For example, if we took a string of letters such as THECHURCHWASFULLONSUNDAY, then potential six-letter words would be THECHU, HECHUR, ECHURC, CHURCH, etc.
So I can create a list of all possible six-letter words by looping through the amino acid letters, incrementing the start position by 1 each time, with the length fixed at 6.
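The loop described here is a sliding window, which could be sketched as follows. The function name `sliding_words` is hypothetical.

```python
def sliding_words(letters, length=6):
    """Return every substring of `length` letters, moving the start
    position forward by one letter at a time."""
    return [letters[i:i + length] for i in range(len(letters) - length + 1)]

if __name__ == "__main__":
    words = sliding_words("THECHURCHWASFULLONSUNDAY")
    print(words[:4])
```

A string of N letters yields N - 5 six-letter words, so the 24-letter example gives 19 of them.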
So how will we know whether any of these words represents a real word such as CHURCH, or is just nonsense such as HECHUR? Well, we could see if the frequency of occurrence of a word stands out from the frequency of occurrence of other words.
We would expect real words to have a higher frequency than nonsense.
So I will have to create a separate program that can store all six-letter words, for example, and then record the frequency of each.
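A frequency table of this kind could be sketched with a counter, as below. The names `word_frequencies` and `standout_words` are hypothetical, and the rule used to decide that a word "stands out" (a count at least `min_ratio` times the average count) is purely an assumption made for illustration; the post does not specify a threshold.

```python
from collections import Counter

def word_frequencies(letters, length=6):
    """Count how often each `length`-letter word occurs in `letters`."""
    return Counter(
        letters[i:i + length] for i in range(len(letters) - length + 1)
    )

def standout_words(freqs, min_ratio=3.0):
    """Return (word, count) pairs whose count is at least `min_ratio`
    times the mean count. The threshold is an assumed, adjustable rule."""
    if not freqs:
        return []
    mean = sum(freqs.values()) / len(freqs)
    return [(w, c) for w, c in freqs.most_common() if c >= min_ratio * mean]

if __name__ == "__main__":
    freqs = word_frequencies("CHURCHAACHURCHBBCHURCH")
    print(freqs.most_common(1))
    print(standout_words(freqs, min_ratio=2.0))
```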
Once we have identified such a "word", we will have to match it to a spoken word that occurs with similar frequency. It will be a bit like a crossword puzzle. The more "words" we have, the more their letters will overlap and help in solving the puzzle.
Anyhow, the discovery of an alphabet key would be much faster than this word search, so that is what I am engaged in at present.