Contents
1. Steganography
2. Substitution cipher
3. Transposition cipher
4. Bacon's cipher
5. LSB-Steganography
1. Steganography
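0x1: Hiding by binary file concatenation (appending data)
1. Create an archive 1.zip containing whatever you want to hide
2. Pick a cover image, 2.jpg
3. On Windows, run: copy /b 2.jpg+1.zip output.jpg
4. The result, output.jpg, is the stego image. The copy command with /b joins the two files byte for byte into the new file. A JPEG ends with the end-of-image marker FF D9 (a hex editor such as WinHex shows it at the end of any normal JPEG), and image viewers ignore everything after that marker, so the appended zip has no effect on how the image is displayed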
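The concatenation and its detection can be sketched in Python. This is a minimal illustration, not a robust parser: the function names are made up for this example, and it assumes the first FF D9 in the file is the real end-of-image marker, which may not hold for JPEGs that embed a thumbnail.

```python
JPEG_EOI = b"\xff\xd9"        # JPEG end-of-image marker
ZIP_MAGIC = b"PK\x03\x04"     # signature of a ZIP local file header


def append_payload(cover: bytes, payload: bytes) -> bytes:
    """Same effect as `copy /b cover+payload`: plain byte concatenation."""
    return cover + payload


def find_appended_zip(data: bytes) -> bytes:
    """Return the bytes starting at the first ZIP signature that follows an
    end-of-image marker, or b"" if nothing is found (heuristic only)."""
    eoi = data.find(JPEG_EOI)
    if eoi == -1:
        return b""
    start = data.find(ZIP_MAGIC, eoi + len(JPEG_EOI))
    return data[start:] if start != -1 else b""


if __name__ == "__main__":
    fake_jpg = b"\xff\xd8" + b"image-data" + JPEG_EOI   # stand-in, not a real JPEG
    fake_zip = ZIP_MAGIC + b"hidden-archive-bytes"      # stand-in, not a real ZIP
    stego = append_payload(fake_jpg, fake_zip)
    print(find_appended_zip(stego) == fake_zip)
```

On real files, read both inputs with open(path, "rb") and write the recovered bytes to a .zip file before trying to unpack it.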
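0x2: LSB (Least Significant Bit) steganography (modifying data)
LSB stands for least significant bit. A pixel is normally built from three primary colors (red, green, blue), which combine to produce every other color. In a typical 24-bit PNG each color channel takes 8 bits. LSB steganography overwrites only the lowest bit of a channel; the human eye cannot see the difference, and the message is hidden in those bits. For example, to hide the character 'a', take its ASCII code 0x61, write it in binary as 01100001, and set the lowest bit of the red channel of eight consecutive pixels to those eight bits in turn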
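A minimal sketch of that embedding, working on a plain list of (r, g, b) tuples so that no imaging library is needed. The function names and the MSB-first bit order are assumptions made for this example; real tools differ in bit order and pixel traversal.

```python
def embed_byte_in_red_lsb(pixels, value):
    """Return a copy of `pixels` (list of (r, g, b) tuples) in which the red
    LSB of the first eight pixels carries the bits of `value`, MSB first."""
    if len(pixels) < 8:
        raise ValueError("need at least 8 pixels to hide one byte")
    bits = [(value >> shift) & 1 for shift in range(7, -1, -1)]
    out = list(pixels)
    for i, bit in enumerate(bits):
        r, g, b = out[i]
        out[i] = ((r & ~1) | bit, g, b)   # clear the lowest bit, then set it
    return out


def extract_byte_from_red_lsb(pixels):
    """Rebuild one byte from the red LSBs of the first eight pixels."""
    value = 0
    for r, _g, _b in pixels[:8]:
        value = (value << 1) | (r & 1)
    return value


if __name__ == "__main__":
    cover = [(200, 100, 50)] * 8
    stego = embed_byte_in_red_lsb(cover, 0x61)
    print([p[0] for p in stego])                  # red values change by at most 1
    print(chr(extract_byte_from_red_lsb(stego)))
```

Each red value changes by at most 1 out of 255, which is why the change is invisible to the eye.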
To look for traces of LSB hiding, Stegsolve can help with the analysis: http://www.caesum.com/handbook/Stegsolve.jar
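After opening the image in Stegsolve, the < and > buttons at the bottom of the window step through every bit plane of each color channel, and Analyse > Data Extract reads out the bits you select. Analyse > Frame Browser steps through the frames of a multi-frame image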
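Note that the cover image here is a PNG. A JPEG like the one used earlier will not work, because JPEG compresses the pixels lossily and that compression can destroy the modified bits. PNG is also compressed, but losslessly, so the modified bits survive intact. BMP works for the same reason: it is normally stored uncompressed, which is why BMP files are so large, since every pixel is stored as-is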
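0x3: Bacon's cipher (encoding hidden in the outward form of the text)
0x4: Carriers
When analyzing hidden data, we usually first have to work out where the data is hidden, that is, which carrier is being used; only then can we analyze how it was encrypted or encoded. For example
1. The LSB/MSB bits of a bmp or png
2. Extra data appended after an image with copy
3. The EXIF section of a jpg. EXIF is digital-photo metadata inserted into the header of the jpg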
Relevant Link:
http://drops.wooyun.org/tips/4862 https://www.ibm.com/developerworks/cn/web/wa-steganalysis/ http://www.alloyteam.com/2016/03/image-steganography/ http://www.guokr.com/article/3741/ http://bobao.360.cn/learning/detail/441.html
2. Substitution cipher
In cryptography, a substitution cipher is a method of encoding by which units of plaintext are replaced with ciphertext, according to a fixed system; the "units" may be single letters (the most common), pairs of letters, triplets of letters, mixtures of the above, and so forth. The receiver deciphers the text by performing the inverse substitution.
There are a number of different types of substitution cipher:
1. Simple substitution cipher (monoalphabetic): if the cipher operates on single letters, it is termed a simple substitution cipher. Each plaintext letter is replaced by a corresponding ciphertext letter; the key defines a new alphabet that maps one-to-one onto the plaintext alphabet
   1) The Caesar cipher is a simple substitution
   2) ROT13: a Caesar cipher whose shift is fixed at 13
2. Polyalphabetic substitution: a monoalphabetic cipher uses fixed substitution over the entire message, whereas a polyalphabetic cipher uses a number of substitutions at different positions in the message, where a unit from the plaintext is mapped to one of several possibilities in the ciphertext and vice versa. (A cipher that operates on larger groups of letters is instead termed polygraphic.)
   1) Vigenère: a widely used polyalphabetic cipher made up of several Caesar ciphers. It cycles through the letters of a finite key string, and each key letter encrypts one plaintext letter: the first key letter encrypts the first plaintext letter, the second key letter encrypts the second, and so on, starting over when the key runs out
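As a small illustration of the ROT13 case, Python's standard codecs module ships a "rot_13" text transform. The wrapper name rot13 is just a convenience for this sketch.

```python
import codecs


def rot13(text):
    """Shift every ASCII letter by 13 places; other characters are untouched."""
    return codecs.encode(text, "rot_13")


if __name__ == "__main__":
    secret = rot13("Hello")
    print(secret)          # the shifted text
    print(rot13(secret))   # applying ROT13 twice restores the input
```

Because 13 is half of 26, the same function both encrypts and decrypts.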
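0x1: Caesar cipher
The Caesar cipher is a simple encryption method: every letter of the text is shifted by the same number of positions
For example, with a shift of 3:
Plaintext: a b c
Ciphertext: d e f
Letter-frequency analysis makes the Caesar cipher easy to break
// The Caesar cipher shifts every letter by the same amount, so the shape of the letter-frequency distribution is unchanged
1. Take a sample text in the same language and count how often each letter occurs. Label the most frequent letter no. 1, the second most frequent no. 2, the third no. 3, and so on until every letter in the sample is ranked
2. Do the same for the ciphertext to be broken: rank its letters by frequency as no. 1, no. 2, no. 3, ...
3. Compare the offset between the sample's and the ciphertext's no. 1, no. 2, no. 3, ... letters. If the offsets are all equal, that common offset is the key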
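A simplified sketch of this idea: instead of comparing the whole ranking, it only assumes that the most frequent ciphertext letter stands for 'e', the most frequent letter in typical English text. That assumption can fail on short or unusual texts; the function names are chosen for this example.

```python
from collections import Counter


def shift_text(text, shift):
    """Caesar-shift ASCII letters by `shift`; leave other characters alone."""
    out = []
    for c in text:
        if c.islower():
            out.append(chr((ord(c) - ord("a") + shift) % 26 + ord("a")))
        elif c.isupper():
            out.append(chr((ord(c) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(c)
    return "".join(out)


def guess_caesar_shift(ciphertext, most_common_plain="e"):
    """Guess the shift by assuming the top ciphertext letter encrypts
    `most_common_plain` (an assumption about the plaintext language)."""
    letters = [c for c in ciphertext.lower() if "a" <= c <= "z"]
    if not letters:
        raise ValueError("ciphertext contains no letters")
    top_cipher_letter = Counter(letters).most_common(1)[0][0]
    return (ord(top_cipher_letter) - ord(most_common_plain)) % 26


if __name__ == "__main__":
    sample = ("there is a secret meeting at the old house near the river "
              "tonight so please be there on time and tell no one else about it")
    cipher = shift_text(sample, 3)
    key = guess_caesar_shift(cipher)
    print(key, shift_text(cipher, -key))
```

If the first guess produces gibberish, retry with the next most common English letters ('t', 'a', 'o') as most_common_plain.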
0x2: Caesar decryption
"""
# -*- coding: utf-8 -*-
# Author: zhenghan <zhenghan.zh@alibaba-inc.com>
# Date: 2016/6/12 14:35
@version: undo
@license: Apache Licence
@site: http://littlehann.cnblogs.com/
@software: PyCharm Community Edition
@file: Crypto.py
"""


def convert(c, key, start='a', n=26):
    # Shift one letter by `key` positions inside the alphabet that begins at `start`
    a = ord(start)
    offset = (ord(c) - a + key) % n
    return chr(a + offset)


def caesarEncode(s, key):
    o = ""
    for c in s:
        if c.islower():
            o += convert(c, key, 'a')
        elif c.isupper():
            o += convert(c, key, 'A')
        else:
            # Spaces, digits and punctuation are copied unchanged
            o += c
    return o


def caesarDecode(s, key):
    return caesarEncode(s, -key)


def forDecode(source):
    # Brute force: there are only 26 shifts, so print the candidate plaintext for each
    for key in range(26):
        print("key = %d" % key)
        print("decoded: %s" % caesarDecode(source, key))
        print("----------")


if __name__ == '__main__':
    source = 'LW GUN QTBZAUGW AXD WXR VQC MNQD GUZJ BW YMZNCD Z QB ZBSMNJJND ONMW FNTT DXCN WXRM JXTRGZXC ENW ZJ XBZUQMBMJBCJ GUZJ TZGGTN VUQTTNCAN FQJ CXG GXX UQMD FQJ ZG'
    forDecode(source)
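0x3: Vigenère cipher
The Vigenère cipher introduces a key that decides, letter by letter, which row of the cipher table (the tabula recta) is used for the substitution, which defeats simple letter-frequency counting. Take the top row of the table as the plaintext letters and the leftmost column as the key letters, and encrypt the following plaintext
1. Plaintext: TO BE OR NOT TO BE THAT IS THE QUESTION
2. With RELATIONS as the key, encryption proceeds as follows
   1) The first plaintext letter is T and the first key letter is R, so T is replaced by the letter found in row R, which is K
   2) Continuing in the same way gives:
      Key:        RELAT IONSR ELATI ONSRE LATIO NSREL
      Plaintext:  TOBEO RNOTT OBETH ATIST HEQUE STION
      Ciphertext: KSMEH ZBBLK SMEMP OGAJX SEJCS FLZSY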
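The table lookup is equivalent to adding the key letter's position to the plaintext letter's position modulo 26. A minimal sketch under that formulation; the function name and the choice to skip non-letters without consuming key letters are assumptions of this example.

```python
def vigenere(text, key, decrypt=False):
    """Encrypt (or decrypt) `text` with the Vigenère cipher.

    Only A-Z are transformed; other characters are copied through and do
    not advance the position in the key."""
    key_shifts = [ord(k) - ord("A") for k in key.upper() if "A" <= k <= "Z"]
    if not key_shifts:
        raise ValueError("key must contain at least one letter")
    out = []
    used = 0  # number of key letters consumed so far
    for ch in text.upper():
        if "A" <= ch <= "Z":
            shift = key_shifts[used % len(key_shifts)]
            if decrypt:
                shift = -shift
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    plain = "TO BE OR NOT TO BE THAT IS THE QUESTION"
    cipher = vigenere(plain, "RELATIONS")
    print(cipher)
    print(vigenere(cipher, "RELATIONS", decrypt=True))
```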
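0x4: Breaking the Vigenère cipher
Taken apart, a Vigenère cipher is just several Caesar ciphers; once the key length is known, the ciphertext can be split accordingly
1. Suppose the ciphertext is: ABCDEFGHIJKLMN
2. If we know the key length is 3, it splits into three groups:
   Group 1: A D G J M
   Group 2: B E H K N
   Group 3: C F I L
3. Each group is a Caesar cipher, that is, the shift is the same throughout the group, so each group can be solved by frequency analysis. Concatenating the key letters recovered from each group gives the whole key phrase
4. So the crux of breaking the Vigenère cipher is determining the key length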
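The grouping step is a strided slice. A short sketch, with a function name chosen for this example:

```python
def split_into_columns(ciphertext, key_length):
    """Group i holds the letters at positions i, i + key_length, i + 2*key_length, ..."""
    if key_length < 1:
        raise ValueError("key_length must be positive")
    return [ciphertext[i::key_length] for i in range(key_length)]


if __name__ == "__main__":
    for number, group in enumerate(split_into_columns("ABCDEFGHIJKLMN", 3), start=1):
        print("Group %d: %s" % (number, " ".join(group)))
```

Strip spaces and punctuation from the ciphertext first, so that positions line up with the key letters.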
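1. Determining the key length
There are two main ways to determine the key length. The Kasiski test is much simpler, but the Friedman test is noticeably more effective
1. Kasiski test: in English, common words such as "the" will sometimes be encrypted by the same part of the key, so "the" in the plaintext can show up as the same three letters at several places in the ciphertext. When that happens, the distance between the identical fragments is a multiple of the key length. So we look for repeated fragments in the ciphertext and compute the distances between them; in theory the key length is a common divisor of those distances
2. Friedman test: the starting point is that, for a given natural language, the probability of each letter is fairly stable once the text is long enough. The test measures this with the index of coincidence, the probability that two letters picked at random from a text are the same: about 0.067 for English, against about 0.038 (1/26) for uniformly random letters. When the ciphertext is split using the correct key length, each group is a Caesar-shifted sample of the language, so its index of coincidence stays close to the language's value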
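Both tests can be sketched in a few lines. The helper names are hypothetical, and the fragment size of 3 is an arbitrary choice for this example; with short ciphertexts, accidental repeats can mislead the Kasiski estimate.

```python
from collections import Counter
from functools import reduce
from math import gcd


def kasiski_distances(ciphertext, size=3):
    """Distances between consecutive occurrences of every repeated
    `size`-letter fragment of the ciphertext."""
    positions = {}
    for i in range(len(ciphertext) - size + 1):
        positions.setdefault(ciphertext[i:i + size], []).append(i)
    distances = []
    for occurrences in positions.values():
        distances.extend(b - a for a, b in zip(occurrences, occurrences[1:]))
    return sorted(distances)


def index_of_coincidence(text):
    """Probability that two letters drawn without replacement are equal."""
    n = len(text)
    if n < 2:
        return 0.0
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


if __name__ == "__main__":
    cipher = "KSMEHZBBLKSMEMPOGAJXSEJCSFLZSY"   # encrypted with a 9-letter key
    distances = kasiski_distances(cipher)
    print(distances, reduce(gcd, distances))
    print(index_of_coincidence(cipher))
```

For the Friedman test, compute index_of_coincidence on each group of a candidate split and prefer the key length whose groups score closest to the expected value for the language.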
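2. Letter-frequency analysis
Once the key length n is known, the ciphertext can be split into n groups, each of which is a Caesar cipher. Solving each group with letter-frequency analysis and putting the results together decrypts the Vigenère cipher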
0x5: Inferring the key of a simple substitution cipher from hits in high-frequency word dictionaries
"""
# -*- coding: utf-8 -*-
# Author: zhenghan <zhenghan.zh@alibaba-inc.com>
# Date: 2016/6/12 13:52
@version: undo
@license: Apache Licence
@site: http://littlehann.cnblogs.com/
@software: PyCharm Community Edition
@file: cipher.py
"""
import copy
import re
from itertools import combinations

try:
    from string import maketrans
except ImportError:
    maketrans = str.maketrans

# In decrypt.py set MAX_GOODNESS_LEVEL with number 1 - 7, how many word dicts to use(see words/ for wordlists)
# In decrypt.py set MAX_BAD_WORDS_RATE with number 0.0 - 1.0, the max rate of bad words
MAX_GOODNESS_LEVEL = 7  # 1-7
MAX_BAD_WORDS_RATE = 0.06

ABC = "abcdefghijklmnopqrstuvwxyz"


class WordList:
    MAX_WORD_LENGTH_TO_CACHE = 8

    def __init__(self):
        # words struct is
        # {(length,different_chars)}=[words] if len > MAX_WORD_LENGTH_TO_CACHE
        # {(length,different_chars)}=set([words and templates]) else
        self.words = {}

        for goodness in range(MAX_GOODNESS_LEVEL):
            for word in open("words/" + str(goodness) + ".txt"):
                word = word.strip()
                word_len = len(word)
                properties = (word_len, len(set(word)))

                if word_len > WordList.MAX_WORD_LENGTH_TO_CACHE:
                    words = self.words.get(properties, [])
                    words.append(word)
                    self.words[properties] = words
                else:
                    # add all possible combinations of the word and dots
                    words = self.words.get(properties, set([]))
                    for i in range(word_len + 1):
                        for dots_positions in combinations(range(word_len), i):
                            adding_word = list(word)
                            for j in dots_positions:
                                adding_word[j] = '.'

                            words.add(''.join(adding_word))
                    self.words[properties] = words

    def find_word_by_template(self, template, different_chars):
        """ Finds the word in the dict by template.
        Template can contain alpha characters and dots only """
        properties = (len(template), different_chars)
        if properties not in self.words:
            return False

        words = self.words[properties]

        if properties[0] > WordList.MAX_WORD_LENGTH_TO_CACHE:
            template = re.compile(template)

            for word in words:
                if template.match(word):
                    return True
        else:
            if template in words:
                return True
        return False


class KeyFinder:
    def __init__(self, enc_words):
        self.points_threshhold = int(len(enc_words) * MAX_BAD_WORDS_RATE)
        self.dict_wordlist = WordList()
        self.enc_words = enc_words
        self.different_chars = {}
        self.found_keys = {}  # key => bad words
        for enc_word in enc_words:
            self.different_chars[enc_word] = len(set(enc_word))

    def get_key_points(self, key):
        """ The key is 26 byte alpha string with dots on unknown places """
        trans = maketrans(ABC, key)
        points = 0

        for enc_word in self.enc_words:
            different_chars = self.different_chars[enc_word]
            translated_word = enc_word.translate(trans)

            if not self.dict_wordlist.find_word_by_template(translated_word, different_chars):
                points += 1
        return points

    def recursive_calc_key(self, key, possible_letters, level):
        """ Tries to place a possible letters on places with dots """
        print("Level: %3d, key: %s" % (level, key))

        if '.' not in key:
            points = self.get_key_points(key)
            print("Found: %s, bad words: %d" % (key, points))
            self.found_keys[key] = points
            return

        nextpos = -1  # a pos with a minimum length of possible letters
        minlen = len(ABC) + 1

        for pos in range(len(ABC)):
            if key[pos] == ".":
                for letter in list(possible_letters[pos]):
                    new_key = key[:pos] + letter + key[pos + 1:]

                    if self.get_key_points(new_key) > self.points_threshhold:
                        possible_letters[pos].remove(letter)
                        if not possible_letters[pos]:
                            return

                if len(possible_letters[pos]) < minlen:
                    minlen = len(possible_letters[pos])
                    nextpos = pos

        while possible_letters[nextpos]:
            letter = possible_letters[nextpos].pop()
            new_possible_letters = copy.deepcopy(possible_letters)
            for pos in range(len(ABC)):
                new_possible_letters[pos] -= set([letter])
            new_possible_letters[nextpos] = set([letter])
            new_key = key[:nextpos] + letter + key[nextpos + 1:]
            self.recursive_calc_key(new_key, new_possible_letters, level + 1)

    def find(self):
        if not self.found_keys:
            possible_letters = [set(ABC) for i in range(len(ABC))]
            self.recursive_calc_key("." * len(ABC), possible_letters, 1)
        return self.found_keys


def main():
    enc_text = open("encrypted.txt").read().lower()
    enc_words = re.findall(r"[a-z']+", enc_text)

    # skip the words with apostrophes
    enc_words = [word for word in enc_words
                 if "'" not in word and len(word) <= WordList.MAX_WORD_LENGTH_TO_CACHE]
    enc_words = enc_words[:200]

    print("Loaded %d words in encrypted.txt, loading dicts" % len(enc_words))

    keys = KeyFinder(enc_words).find()
    if not keys:
        print("Key not found, try to increase MAX_BAD_WORDS_RATE")
        # Nothing to rank or decrypt with, so stop here
        return

    for key, bad_words in keys.items():
        print("Possible key: %s, bad words:%d" % (key, bad_words))

    best_key = min(keys, key=keys.get)
    print("Best key: %s, bad_words %d" % (best_key, keys[best_key]))

    trans = maketrans(ABC, best_key)
    decrypted = open("encrypted.txt").read().translate(trans)

    try:
        decryptedFile = open("decrypted.txt", "w")
        try:
            decryptedFile.write(decrypted)
        finally:
            decryptedFile.close()
    except IOError:
        print("[*] Decrypted text not saved")

    print(decrypted)


if __name__ == "__main__":
    try:
        # import cProfile
        # cProfile.run('main()')
        main()
    except Exception as E:
        print("Error: %s" % E)
Relevant Link:
http://substitution.webmasters.sk/simple-substitution-cipher.php http://rumkin.com/tools/cipher/substitution.php https://www.douban.com/group/topic/13381765/ http://baike.baidu.com/view/541906.htm https://github.com/alexbers/substitution_cipher_solver https://github.com/larz258/Crypto http://www.cnblogs.com/gaopeng527/p/4518070.html https://en.wikipedia.org/wiki/Substitution_cipher#Simple_substitution http://lazynight.me/2859.html http://cizixs.com/2014/11/30/two-encryption-methods-and-cracks http://crypto.interactive-maths.com/monoalphabetic-substitution-ciphers.html http://baike.baidu.com/view/270838.htm http://blog.csdn.net/limisky/article/details/16885959
3. Transposition cipher
https://en.wikipedia.org/wiki/Transposition_cipher
4. Bacon's cipher
Bacon's cipher or the Baconian cipher is a method of steganography (a method of hiding a secret message as opposed to a true cipher) devised by Francis Bacon in 1605. A message is concealed in the presentation of text, rather than its content.
0x1: Cipher details(I=J & U=V)
To encode a message, each letter of the plaintext is replaced by a group of five of the letters 'A' or 'B'. This replacement is a binary encoding and is done according to the alphabet of the Baconian cipher, shown below.
a   AAAAA   g     AABBA   n   ABBAA   t     BAABA
b   AAAAB   h     AABBB   o   ABBAB   u-v   BAABB
c   AAABA   i-j   ABAAA   p   ABBBA   w     BABAA
d   AAABB   k     ABAAB   q   ABBBB   x     BABAB
e   AABAA   l     ABABA   r   BAAAA   y     BABBA
f   AABAB   m     ABABB   s   BAAAB   z     BABBB
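The table is simply a 5-bit binary count over a 24-letter alphabet in which i/j and u/v share a code, with A for 0 and B for 1. A minimal encoder under that reading; the names BACON_24 and bacon_encode are chosen for this example.

```python
BACON_24 = "abcdefghiklmnopqrstuwxyz"   # 24 letters: j folds into i, v into u


def bacon_encode(message):
    """Encode letters as groups of five A/B symbols (I=J, U=V version).
    Characters that are not letters are skipped."""
    groups = []
    for ch in message.lower():
        ch = {"j": "i", "v": "u"}.get(ch, ch)
        index = BACON_24.find(ch)
        if index != -1:
            # 5-bit binary index, written with A for 0 and B for 1
            groups.append(format(index, "05b").replace("0", "A").replace("1", "B"))
    return "".join(groups)


if __name__ == "__main__":
    print(bacon_encode("bacon"))
```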
0x2: Cipher details(I != J or U != V)
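0x3: Usage
The strength of steganography is that the stego text need not have any relation to the hidden message in its character content; any mapping between the two will do, for example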
BaCoN's cIphEr or THE bacOnIAN CiPHer iS a meThOD oF sTEGaNOGrapHY (a METhoD Of HidIng A sECRet MeSsaGe as OpPOsEd TO a TRUe CiPHeR) dEVIseD BY francis bAcoN. a MessAge Is coNCeALED in THe pRESenTatIoN OF TexT, ratHer thaN iTs coNteNt. tO enCODe A MEsSaGe, eaCh lETter Of THe pLAInText Is rePLAcED By A groUp oF fIvE OF the LetTeRS 'a' OR 'b'. thIS REplaCEmENT is doNE acCORding tO thE alpHABet oF THe BACOnIAN cIpHeR, sHoWn bElOw. NoTe: A SeCoNd vErSiOn oF BaCoN'S CiPhEr uSeS A UnIqUe cOdE FoR EaCh lEtTeR. iN OtHeR WoRdS, i aNd j eAcH HaS ItS OwN PaTtErN. tHe wRiTeR MuSt mAkE UsE Of tWo dIfFeReNt tYpEfAcEs fOr tHiS CiPhEr. AfTeR PrEpArInG A FaLsE MeSsAgE WiTh tHe sAmE NuMbEr oF LeTtErS As aLl oF ThE As aNd bS In tHe rEaL, sEcReT MeSsAgE, tWo tYpEfAcEs aRe cHoSeN, oNe tO RePrEsEnT As aNd tHe oThEr bS. tHeN EaCh lEtTeR Of tHe fAlSe mEsSaGe mUsT Be pReSeNtEd iN ThE ApPrOpRiAtE TyPeFaCe, AcCoRdInG To wHeThEr iT StAnDs fOr aN A Or a b. To dEcOdE ThE MeSsAgE, tHe rEvErSe mEtHoD Is aPpLiEd. EaCh 'TyPeFaCe 1' LeTtEr iN ThE FaLsE MeSsAgE Is rEpLaCeD WiTh aN A AnD EaCh 'TyPeFaCe 2' LeTtEr iS RePlAcEd wItH A B. tHe bAcOnIaN AlPhAbEt iS ThEn uSeD To rEcOvEr tHe oRiGiNaL MeSsAgE. aNy mEtHoD Of wRiTiNg tHe mEsSaGe tHaT AlLoWs tWo dIsTiNcT RePrEsEnTaTiOnS FoR EaCh cHaRaCtEr cAn bE UsEd fOr tHe bAcOn cIpHeR. bAcOn hImSeLf pRePaReD A BiLiTeRaL AlPhAbEt[2] FoR HaNdWrItTeN CaPiTaL AnD SmAlL LeTtErS WiTh eAcH HaViNg tWo aLtErNaTiVe fOrMs, OnE To bE UsEd aS A AnD ThE OtHeR As b. ThIs wAs pUbLiShEd aS An iLlUsTrAtEd pLaTe iN HiS De aUgMeNtIs sCiEnTiArUm (ThE AdVaNcEmEnT Of lEaRnInG). BeCaUsE AnY MeSsAgE Of tHe rIgHt lEnGtH CaN Be uSeD To cArRy tHe eNcOdInG, tHe sEcReT MeSsAgE Is eFfEcTiVeLy hIdDeN In pLaIn sIgHt. ThE FaLsE MeSsAgE CaN Be oN AnY ToPiC AnD ThUs cAn dIsTrAcT A PeRsOn sEeKiNg tO FiNd tHe rEaL MeSsAgE.
/*
1. Define the mapping: an uppercase letter stands for B, a lowercase letter for A
2. Preprocess the stego text
   1) Remove spaces
   2) Remove punctuation, single and double quotes, and other non-letter characters
3. Translate the letters into a string of A and B according to their case
BABABAABAABAAABBBAAABABBBBABBAAABAAABABBABABBBABBBAAABBABBBAABBABAABAABABBBAABABAABAAABABBABABBABBBABABBABABBBAABBBAAAAAAAABAABABAAABAABAAABBABBBBAABBAABBBAABAABABBBBAABAAABAAAAABABAAABAABAABAABBBABBBABABAAABAABBAAABABBAABBBABAAABAAABBBABBBABAAABAABABABBBAAABAABABBABBAAABBBBAAABBABBBAAAABBAABBBAAAAABAABAAABBBAAABBBABBBBABBBABABABABABAABABABABABBABABAABABABAABBABABBBABABAABABBBABABAABABBABBABAABABABABBABABBABABAABAAABABBABBABBABBABABABABAABABABBABAABABBABBAABAABABABABAABABABABAABAABABBABABABABABBABABABABBBABABBABABABBABAABAABABBABABAABBABABABBAABAABBABBAABAABBAABAABABABABABBABABABABAABABABABAABAABABABABAABBABABABABBAABAABAABABAABABABBABAABABABBAABAABABAABABABAABABBAABABABABAABBABBABABABABABBABABABABABABABABBAABABABAABBABABAABAABBBAAABAABABABBABBABABABABAABABABAABABABBAABABABABABABABABABABABABAABBABBABABBABABABBAABABABABBABAABBBABBABABABABABABABABAABBABABABAABABBBABAABABABABBABABABAABBABAABABBAABABABAABAABABABABBABABABABAABABABBAABABABAABAABABABAABABBABABAABAABABABABBABABABABABABABBABBABAABABABABAABAABBABAABAABAABABAABABABABABAABABABAABABABABBBABABABABBABABABABABBABABABABABBABABABBABBABABBABABABBABAABABBABABAABAABABABABABAABABABABBAABBABAABBBABBABBABABBAABABAABAABABABABAABBAABABABABABAABABAABBABBAABABABABAABABABABABABABBABABABABABBAABABABABBABABABBABBABABABBAABAABABAABABABBABBAABABBAABABAABAABABABABABAABABABBABABABBAABABABABABAABABABBAABABAABABABABBABABBABABABBABBAABBABBABABBABBABAABAABABABABBBABABAABABABAABBABAABAABABBABABAB
4. Decode each group of five with the Bacon alphabet (here the version in which every letter has its own code)
veryxwellxdonexfellowxhackerxthexsecretxkeywordxisxclmpdodhashdxxkvfk su jouw kwwurnw vfnfwjksvewvlkxlk jnjvmtmtevlkuvjfknkZeuvuvskksZktnkwvkvsu soevwvjkkZkvkvjwwvsvu vkvjvjosvvjuwkskwvjlfjfjnjflkvlnfkjuskkvfjk k vnkwvwwvuwusvjkZu wwkjktfkstmvjkvnkwkwvwvskk fsskvfnlfkswkkwwvwnvwskxkktjfv
5. Replace x with a space
very well done fellow hacker the secret keyword is clmpdodhashd kvfk su jouw kwwurnw vfnfwjksvewvlk lk jnjvmtmtevlkuvjfknkZeuvuvskksZktnkwvkvsu soevwvjkkZkvkvjwwvsvu vkvjvjosvvjuwkskwvjlfjfjnjflkvlnfkjuskkvfjk k vnkwvwwvuwusvjkZu wwkjktfkstmvjkvnkwkwvwvskk fsskvfnlfkswkkwwvwnvwsk kktjfv
*/
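The decoding steps above can be sketched as follows. This assumes the 26-letter version of the alphabet, in which each group of five is the letter's 5-bit index with uppercase as B (1) and lowercase as A (0); the function name and the "?" placeholder for out-of-range groups are choices made for this example.

```python
import string


def bacon_decode_by_case(stego_text):
    """Map uppercase letters to B and lowercase to A, drop everything else,
    then decode each complete group of five as a 5-bit letter index."""
    bits = "".join("B" if ch.isupper() else "A"
                   for ch in stego_text if ch in string.ascii_letters)
    plain = []
    for i in range(0, len(bits) - len(bits) % 5, 5):
        index = int(bits[i:i + 5].replace("A", "0").replace("B", "1"), 2)
        # Indices 26-31 have no letter in this alphabet
        plain.append(string.ascii_lowercase[index] if index < 26 else "?")
    return "".join(plain)


if __name__ == "__main__":
    start_of_cover = "BaCoN's cIphEr or THE bacOnIAN CiPHer"
    decoded = bacon_decode_by_case(start_of_cover)
    print(decoded)
    print(decoded.replace("x", " "))
```

Feeding it the full cover text reproduces the decoding from step 4 onward, up to how out-of-range groups are rendered.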
Relevant Link:
https://github.com/mathiasbynens/bacon-cipher http://www.geocachingtoolbox.com/index.php?page=baconianCipher http://rumkin.com/tools/cipher/baconian.php https://en.wikipedia.org/wiki/Bacon%27s_cipher
5. LSB-Steganography
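LSB takes a different approach to hiding data than Bacon's cipher
1. Bacon's encoding is independent of the content of the cover text. It uses the "presentation" (outward form) of the cover text to represent the two states 0 and 1, and that form can be anything: a typeface, a graphic, even sound or another physical property
2. LSB in a sense does the opposite: its aim is to hide any trace of the encoding itself. Exploiting the human eye's weak sensitivity to small color differences, it splits the plaintext's binary stream into groups and writes them into the low bits of the image pixels (RGB)
The difficulty in extracting LSB-hidden data is that, even ignoring the alpha channel, picking a single bit of a single RGB channel already gives 3*8=24 possible choices (combinations of several bits or channels are not even counted)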
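Those 24 single-plane choices are easy to enumerate programmatically. A minimal sketch over a list of (r, g, b) tuples; the function names are assumptions of this example, and combined selections across several bits or channels are not covered.

```python
from itertools import product

CHANNELS = ("R", "G", "B")


def extract_bit_plane(pixels, channel, bit):
    """Return bit number `bit` (0 = least significant) of `channel` for every pixel."""
    idx = CHANNELS.index(channel)
    return [(p[idx] >> bit) & 1 for p in pixels]


def all_single_planes(pixels):
    """Map each of the 3*8 (channel, bit) choices to its extracted bit list."""
    return {(c, b): extract_bit_plane(pixels, c, b)
            for c, b in product(CHANNELS, range(8))}


if __name__ == "__main__":
    pixels = [(1, 2, 4), (0, 3, 255)]
    planes = all_single_planes(pixels)
    print(len(planes))
    print(planes[("R", 0)])
```

Packing each plane's bits into bytes and checking the result for printable text or known file signatures is one way to test all 24 candidates automatically.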
Relevant Link:
https://github.com/RobinDavid/LSB-Steganography http://drops.wooyun.org/tips/4862
Copyright (c) 2016 LittleHann All rights reserved