1) Counting k-words directly over the 20 amino acids (using the ZD98 dataset as an example):
k | Dimension |
2 | 400 |
3 | 6490 |
4 | 22265 |
The dimension is too large to be practical for constructing feature vectors.
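As a rough illustration of how such a dimension can be obtained, the sketch below counts the distinct k-words that occur in a set of sequences. It assumes that "dimension" means the number of distinct k-words observed in the dataset (consistent with 6490 being smaller than 20^3 = 8000). The function name `count_distinct_kwords` and the toy sequences are hypothetical and are not taken from ZD98.

```python
from typing import Iterable


def count_distinct_kwords(sequences: Iterable[str], k: int) -> int:
    """Return the number of distinct length-k substrings (k-words) across all sequences."""
    seen = set()
    for seq in sequences:
        # Slide a window of width k over the sequence; sequences shorter than k add nothing.
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return len(seen)


if __name__ == "__main__":
    # Toy sequences for illustration only (not ZD98 data).
    toy = ["MKVLAAGIV", "MKVLSTGCP"]
    for k in (2, 3, 4):
        # Print k, the observed count, and the theoretical upper bound 20**k.
        print(k, count_distinct_kwords(toy, k), 20 ** k)
```

The observed count can never exceed 20^k, and for larger k it is also bounded by the total number of windows in the dataset, which is why the values in the table grow more slowly than 20^k.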
Instead, consider extracting features after reducing the amino acid alphabet.
Reduction scheme:
Classification | Abbreviation | Amino acids |
Strongly hydrophilic or polar | L | R, D, E, N, Q, K, H |
Strongly hydrophobic | B | L, I, V, A, M, F |
Weakly hydrophilic or weakly hydrophobic | W | S, T, Y, W |
Proline | P | P |
Glycine | G | G |
Cysteine | C | C |
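The reduction scheme above can be sketched in code as a lookup from each of the 20 amino acids to its group symbol, followed by k-word counting on the reduced sequence. This is a minimal sketch under stated assumptions, not the original implementation: the names `GROUPS`, `REDUCE`, `reduce_sequence`, and `kword_vector` are hypothetical, residues outside the 20 standard amino acids are simply skipped, and the vector is indexed by all 6^k possible words with frequencies normalized to sum to 1.

```python
from collections import Counter
from itertools import product

# Group symbol -> member amino acids, copied from the reduction table.
GROUPS = {
    "L": "RDENQKH",  # strongly hydrophilic or polar
    "B": "LIVAMF",   # strongly hydrophobic
    "W": "STYW",     # weakly hydrophilic or weakly hydrophobic
    "P": "P",        # proline
    "G": "G",        # glycine
    "C": "C",        # cysteine
}

# Invert the table: amino acid -> group symbol.
REDUCE = {aa: sym for sym, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map a protein sequence onto the 6-letter reduced alphabet.

    Assumption: residues not in the table (e.g. X) are dropped.
    """
    return "".join(REDUCE[aa] for aa in seq.upper() if aa in REDUCE)


def kword_vector(seq: str, k: int) -> list:
    """Relative frequencies of all 6**k reduced k-words, in a fixed order."""
    reduced = reduce_sequence(seq)
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    words = ("".join(p) for p in product("LBWPGC", repeat=k))
    return [counts[w] / total if total else 0.0 for w in words]


if __name__ == "__main__":
    # Toy sequence for illustration only.
    print(reduce_sequence("MKVLAAGIV"))
    print(len(kword_vector("MKVLAAGIV", 2)))
```

Note that the group symbols L and W coincide with the one-letter codes for leucine and tryptophan, so the lookup has to be applied to the original sequence in a single pass rather than by repeated string replacement. Indexing by all 6^k words gives a fixed-length vector; keeping only the words that actually occur in a dataset would give a smaller dimension.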
Feature dimension after reduction:
k | Dimension |
2 | 36 |
3 | 211 |
4 | 1071 |
5 | 3732 |
6 | 8698 |
7 | 14620 |