全局序列比对

Needleman–Wunsch algorithm

序列比对是生物信息中常见的问题,算法有全局比对 Needleman-Wunsch和Smith-Waterman局部比对算法,两者非常相似,区别只是打分矩阵的差异。这里只以简单的双序列全局比对作为引子,具体描述可以参考维基百科(https://en.wikipedia.org/wiki/Needleman%E2%80%93Wunsch_algorithm)。算法用Python实现如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
import numpy as np


def Needleman_Wunsch(seq1, seq2):
a = len(seq1)
b = len(seq2)
if a == 0 or b == 0:
return
matrix = np.zeros((a+1, b+1))
matrix[0, :] = [-1*x for x in range(b+1)]
matrix[:, 0] = [-1*x for x in range(a+1)]
for i in range(1, a+1):
for j in range(1, b+1):
if seq1[i-1] == seq2[j-1]:
top_left = matrix[i-1, j-1] + 1
else:
top_left = matrix[i-1, j-1] + -1
up = matrix[i-1, j] + -1
right = matrix[i, j-1] + -1
score = max(top_left, up, right)
matrix[i, j] = score
newseq1 = ''
newseq2 = ''

while 1:
if a == 0 or b == 0:
break
up = matrix[a-1, b]
right = matrix[a, b-1]
top_left = matrix[a-1, b-1]
if top_left >= right and top_left >= up:
newseq1 += seq1[a-1]
newseq2 += seq2[b-1]
a -= 1
b -= 1
elif up > top_left and up >= right:
newseq2 += '-'
newseq1 += seq1[a-1]
a -= 1
elif right > top_left and right > up:
newseq1 += '-'
newseq2 += seq2[b-1]
b -= 1
newseq1 = newseq1[::-1]
newseq2 = newseq2[::-1]
print newseq1
print ''.join(['|' if newseq1[i] == newseq2[i]
else ' ' for i in range(len(newseq2))])
print newseq2
-------------本文结束感谢您的阅读-------------