We wrote some code last year to build a big trie of the whole transcriptome. You can use it to fuzzy-search whether an mRNA is within some edit distance of any piece of normal human RNA. That matters because a close match could, in theory, cause side effects via RNA interference. I stopped the project because I can't afford to develop a gene therapy right now, but the fuzzy search worked.
To make the trie, use the function here. The variable K is the length of the k-mers (runs of RNA), and larger values take a lot longer. (Warning: it's a big job and uses multiprocessing; PyPy is recommended for speed.)
https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7...
https://github.com/bionicles/coronavirus
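For anyone who doesn't want to dig through the repo, here's a minimal single-process sketch of what a k-mer trie looks like. It assumes a nested-dict trie with a `$` end marker, and `kmers`, `build_kmer_trie`, and `contains` are hypothetical names, not the repo's functions. It also hints at why larger K costs more: each k-mer takes K steps to insert, and the trie can hold up to 4^K distinct k-mers.

```python
from typing import Iterable, Iterator

# Assumption: a nested-dict trie where "$" marks the end of a stored k-mer.
# "$" is not a nucleotide letter, so it can't collide with A/C/G/U.
END = "$"


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping substring of length k (the k-mers) in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_trie(transcripts: Iterable[str], k: int) -> dict:
    """Insert every k-mer from every transcript into one shared trie."""
    root: dict = {}
    for seq in transcripts:
        for kmer in kmers(seq.upper(), k):
            node = root
            for base in kmer:
                # Walk down one level per base, creating nodes as needed.
                node = node.setdefault(base, {})
            node[END] = True  # a complete k-mer ends at this node
    return root


def contains(trie: dict, kmer: str) -> bool:
    """Exact lookup: True only if kmer was inserted as a whole k-mer."""
    node = trie
    for base in kmer:
        if base not in node:
            return False
        node = node[base]
    return END in node
```

The original spreads the work across processes; this sketch keeps it in one process so the data structure itself is easy to see.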
Then you can use this recursive function to generate potential matches within some edit-distance cutoff: https://github.com/bionicles/coronavirus/blob/b6f0db9dd8aaf7...
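The repo's exact function isn't reproduced here, but one standard way to fuzzy-search a trie recursively is to carry one row of the Levenshtein dynamic-programming table down each branch and prune the branch once every cell in that row exceeds the cutoff. Here's a minimal sketch, assuming the same kind of nested-dict trie with a `$` end marker. `build_trie`, `fuzzy_search`, and `_search` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

END = "$"  # assumption: marks the end of a stored k-mer


def build_trie(words: Iterable[str]) -> dict:
    """Tiny helper so this sketch runs on its own."""
    root: dict = {}
    for word in words:
        node = root
        for base in word:
            node = node.setdefault(base, {})
        node[END] = True
    return root


def fuzzy_search(trie: dict, query: str, max_dist: int) -> Iterator[Tuple[str, int]]:
    """Yield (kmer, edit_distance) for every stored k-mer within max_dist of query."""
    # Row 0 of the DP table: distance from the empty prefix to query[:j] is j.
    first_row = list(range(len(query) + 1))
    for base, child in trie.items():
        if base != END:
            yield from _search(child, base, base, query, first_row, max_dist)


def _search(node: dict, base: str, prefix: str, query: str,
            prev_row: List[int], max_dist: int) -> Iterator[Tuple[str, int]]:
    # Build the DP row for `prefix` (the trie path so far) from its parent's row.
    row = [prev_row[0] + 1]
    for j in range(1, len(query) + 1):
        insert_cost = row[j - 1] + 1
        delete_cost = prev_row[j] + 1
        replace_cost = prev_row[j - 1] + (query[j - 1] != base)
        row.append(min(insert_cost, delete_cost, replace_cost))

    # A whole k-mer ends here; report it if it is within the cutoff.
    if END in node and row[-1] <= max_dist:
        yield prefix, row[-1]

    # Prune: if every cell is over the cutoff, no longer path can get back under it.
    if min(row) <= max_dist:
        for next_base, child in node.items():
            if next_base != END:
                yield from _search(child, next_base, prefix + next_base,
                                   query, row, max_dist)
```

Because it's a generator, matches stream out as they're found. The pruning step is what keeps this tractable: with a small cutoff, most branches die after a few bases instead of being walked all the way down.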
The function right below it converts the generator to a list, which you can then save.
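Draining the generator and saving the result could look like the sketch below. It assumes matches come out as (k-mer, distance) pairs and that JSON is an acceptable output format. `collect_matches`, `save_matches`, and `load_matches` are hypothetical helpers, not functions from the repo.

```python
import json
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: each match is a (kmer, edit_distance) pair.
Match = Tuple[str, int]


def collect_matches(matches: Iterable[Match]) -> List[Match]:
    """Drain a match generator into a list, closest matches first, ties by k-mer."""
    return sorted(matches, key=lambda m: (m[1], m[0]))


def save_matches(matches: List[Match], path: str) -> None:
    """Write matches as a JSON list of {"kmer": ..., "distance": ...} objects."""
    records = [{"kmer": kmer, "distance": dist} for kmer, dist in matches]
    Path(path).write_text(json.dumps(records, indent=2))


def load_matches(path: str) -> List[Match]:
    """Read matches written by save_matches back into (kmer, distance) pairs."""
    records = json.loads(Path(path).read_text())
    return [(r["kmer"], r["distance"]) for r in records]
```

Sorting materializes every match in memory, so with a large cutoff over the whole transcriptome it may be better to write matches out as they stream from the generator instead.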
Enjoy.