13

This assignment confused me for a long time, so I think those of you who study computational biology might be interested. The original question is:

Find the two most similar DNA sequences of length 20 that BLAST, using a word length of 5, will fail to align.

theforestecologist
NonalcoholicBeer

2 Answers

13

BLAST works by finding a perfect match between the two sequences whose length equals this "word length" and then extending that match in both directions. Without such a perfectly matched word, there will be no alignment.
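As a rough illustration of that seeding step (not BLAST's actual implementation, which is considerably more elaborate), the sketch below indexes every word of the query and reports each exact hit in the subject. The function name `find_seeds` and its parameters are made up for this example.

```python
from collections import defaultdict


def find_seeds(query, subject, word_length):
    """Return (query_pos, subject_pos) pairs where an exact word match starts."""
    # Index every word of the query by its start position.
    index = defaultdict(list)
    for i in range(len(query) - word_length + 1):
        index[query[i:i + word_length]].append(i)

    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - word_length + 1):
        for i in index.get(subject[j:j + word_length], []):
            seeds.append((i, j))
    return seeds


print(find_seeds("ACGTACGT", "TTACGTAA", 5))  # [(3, 1), (0, 2)]
```

An empty result means there is nothing to extend, which is the situation the assignment asks for.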

So in your case, you must look for two 20 bp sequences that share no 5 bp word (that is, no common contiguous subsequence of length 5); for instance:

AAAAAAAAAAAAAAAAAAAA

and

AAAACAAAACAAAACAAAAC
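A quick way to check this pair is to compare the sets of 5-mers of the two sequences and to count the positions at which they agree. The helper names `words` and `identity` are only for illustration.

```python
def words(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def identity(a, b):
    """Number of positions at which two equal-length sequences agree."""
    return sum(x == y for x, y in zip(a, b))


s1 = "AAAAAAAAAAAAAAAAAAAA"
s2 = "AAAACAAAACAAAACAAAAC"

shared = words(s1, 5) & words(s2, 5)
print(shared)            # set(): no common 5-mer, so no seed
print(identity(s1, s2))  # 16 of the 20 positions match
```

For an ungapped, position-by-position comparison, four mismatches is also the fewest that can work: three mismatches would leave 17 matches split into at most four runs, so at least one run would contain five consecutive matches and provide a seed.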
-1

I am not sure I understand BLAST correctly.

BLAST works differently for DNA and for protein. In the protein "seeding" step there is a threshold T, which means the seed word does not have to match perfectly. For DNA, however, it seems there is no T in the seeding step: we look for perfect matches and extend them into the neighbouring sequence.

So, if there is no exact match of word length 5 between these two sequences, BLAST will fail to align them.

jonsca