Being very new to shell scripts, I have pieced together the following to search /dev/sdd1, sector by sector, to find a string. How do I get the sector data into the $HAYSTACK variable?
#!/bin/bash
HAYSTACK=""
START_SEARCH=$1
NEEDLE=$2
START_SECTOR=2048
END_SECTOR=226512895+1
SECTOR_NUMBER=$((START_SEARCH + START_SECTOR))
while [ $SECTOR_NUMBER -lt $END_SECTOR ]; do
    $HAYSTACK=`dd if=/dev/sdd1 skip=$SECTOR_NUMBER count=1 bs=512`
    if [[ "$HAYSTACK" =~ "$NEEDLE" ]]; then
        echo "Match found at sector $SECTOR_NUMBER"
        break
    fi
    let SECTOR_NUMBER=SECTOR_NUMBER+1
done
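For reference, two things stop the script above from running as written: an assignment must not have a `$` on its left-hand side, and `END_SECTOR=226512895+1` stores the literal text `226512895+1` rather than a number. The following is only a minimal sketch of a corrected loop, not a complete recovery tool. The function name `search_sectors` and its argument order are hypothetical, and NUL bytes are stripped with `tr` because a bash variable cannot hold them.

```bash
#!/bin/bash
# Sketch: scan a device or image file sector by sector for a text needle.
# Usage: search_sectors SOURCE NEEDLE FIRST_SECTOR END_SECTOR  (END_SECTOR is exclusive)
search_sectors() {
    local src=$1 needle=$2 sector=$3 end=$4 haystack
    while [ "$sector" -lt "$end" ]; do
        # No "$" on the left of "=". tr removes NUL bytes, which bash
        # variables cannot store.
        haystack=$(dd if="$src" bs=512 skip="$sector" count=1 2>/dev/null | tr -d '\0')
        # Quoting the needle makes this a plain substring test.
        if [[ "$haystack" == *"$needle"* ]]; then
            echo "Match found at sector $sector"
            return 0
        fi
        sector=$((sector + 1))
    done
    return 1
}
```

Against the real partition this would be called along the lines of `search_sectors /dev/sdd1 "$NEEDLE" "$START_SEARCH" $((226512895 + 1))`, which needs read permission on the device. Starting one `dd` process per sector is slow on a partition of this size.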
Update
The intention is not to make a perfect script that handles fragmented-file scenarios (I doubt that is possible at all).
In my case, not being able to distinguish strings containing nulls is also a non-issue.
If you could expand the pipe suggestions into an answer, it would be more than enough. Thanks!
Background
I have managed to wipe my www folder and have been trying to recover as many of my source files as possible. I have used Scalpel to recover my php and html files. But the version I could get working on my Ubuntu 16.04 is version 1.60, which does not support regex in header/footer, so I cannot make a good pattern for css, js, and json files.
I remember some fairly rare strings that I can search for to find my files, but I have no idea where in a block a string might be. The solution I came up with is this shell script, which reads blocks from the partition, looks for the substring and, if a match is found, prints the LSB number and exits.
2 Solutions
#1
If the item being searched for is a text string, consider using the -t option of the strings command to print the offset at which each string is found. Since strings doesn't care where the data comes from, it works on files, block devices, and input piped from dd.
Example from the start of a hard disk:
sudo strings -t d /dev/sda | head -5
Output:
165 ZRr=
286 `|f
295 \|f1
392 GRUB
398 Geom
Instead of head, that could be piped to grep -m 1 GRUB, which would output only the first line containing "GRUB":
sudo strings -t d /dev/sda | grep -m 1 GRUB
Output:
392 GRUB
From there, bash can do quite a lot. This code finds the first 5 instances of "GRUB" on my boot partition /dev/sda7:
s=GRUB ; sudo strings -t d /dev/sda7 | grep "$s" |
while read a b ; do
    n=${b%%${s}*}
    printf "String %-10.10s found %3i bytes into sector %i\n" \
        "\"${b#${n}}\"" $(( (a % 512) + ${#n} )) $((a/512 + 1))
done | head -5
Output (the sector numbers here are relative to the start of the partition):
String "GRUB Boot found 7 bytes into sector 17074
String "GRUB." found 548 bytes into sector 25702
String "GRUB." found 317 bytes into sector 25873
String "GRUBLAYO" found 269 bytes into sector 25972
String "GRUB" found 392 bytes into sector 26457
Things to watch out for:
- Don't do dd-based single-block searches with strings, as they would fail if the string spanned two blocks. Use strings to get the offset first, then convert that offset to blocks (or sectors).
- strings -t d can return long strings, and the "needle" might be several bytes into one of them, in which case the offset is that of the start of the long string rather than of the grep string (the "needle"). The bash code above allows for that and uses $n to calculate a corrected offset.
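The block-boundary problem can be reproduced on a small test image. The sketch below is illustrative only: it uses a temporary file in place of a real device, and it substitutes GNU grep's -a, -o and -b options (treat binary as text, print only the match, print its byte offset) for strings so that it depends on nothing beyond grep and coreutils. The variable names are made up for the example.

```bash
#!/bin/bash
# Build a 2-sector test image whose needle straddles the 512-byte boundary.
img=$(mktemp)
head -c 1024 /dev/zero > "$img"
printf 'NEEDLE' | dd of="$img" bs=1 seek=509 conv=notrunc 2>/dev/null

# Per-sector search: each 512-byte read sees only part of the needle.
per_sector_hits=0
for sector in 0 1; do
    if dd if="$img" bs=512 skip="$sector" count=1 2>/dev/null | grep -aqF 'NEEDLE'; then
        per_sector_hits=$((per_sector_hits + 1))
    fi
done

# Whole-stream search: get the byte offset first, then convert it to a sector.
offset=$(grep -aobF 'NEEDLE' "$img" | head -n 1 | cut -d: -f1)

echo "per-sector hits: $per_sector_hits"
# Sectors are counted from 1 here, matching the a/512 + 1 convention above.
echo "byte offset $offset, sector $((offset / 512 + 1)), $((offset % 512)) bytes in"
rm -f "$img"
```

The per-sector loop reports no hits, while the whole-stream search finds the needle at byte offset 509, that is, 509 bytes into the first sector.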
The lazy, all-in-one-utility method: rafind2. For example, to search for the first instance of "GRUB" on /dev/sda7 as before:
sudo rafind2 -Xs GRUB /dev/sda7 | head -7
Output:
0x856207
- offset - 0 1 2 3 4 5 6 7 8 9 A B C D E F 0123456789ABCDEF
0x00856207 4752 5542 2042 6f6f 7420 4d65 6e75 006e GRUB Boot Menu.n
0x00856217 6f20 666f 6e74 206c 6f61 6465 6400 6963 o font loaded.ic
0x00856227 6f6e 732f 0069 636f 6e64 6972 0025 733a ons/.icondir.%s:
0x00856237 2564 3a25 6420 6578 7072 6573 7369 6f6e %d:%d expression
0x00856247 2065 7870 6563 7465 6420 696e 2074 expected in t
With some bash and sed, that output can be reworked into the same format as the strings output:
s=GRUB ; sudo rafind2 -Xs "$s" /dev/sda7 |
sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[m|K]//g" |
sed -r -n 'h;n;n;s/.{52}//;H;n;n;n;n;g;s/\n//p' |
while read a b ; do
    printf "String %-10.10s\" found %3i bytes into sector %i\n" \
        "\"${b}" $((a%512)) $((a/512 + 1))
done | head -5
The first sed instance is borrowed from jfs' answer to "Program that passes STDIN to STDOUT with color codes stripped?", since rafind2 outputs non-text color codes.
Output:
String "GRUB Boot" found 7 bytes into sector 17074
String "GRUB....L" found 36 bytes into sector 25703
String "GRUB...LI" found 317 bytes into sector 25873
String "GRUBLAYO." found 269 bytes into sector 25972
String "GRUB .Geo" found 392 bytes into sector 26457
#2
Have you thought about something like this?
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ F   l/'g > v1
cat /dev/sdd1 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ x   l/'g > v2
cmp -lb v1 v2
For example, applying this to a .pdf file:
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ F   l/'g > v1
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ x   l/'g > v2
cmp -lb v1 v2
gives the output
228 106 F 170 x
23525 106 F 170 x
37737 106 F 170 x
48787 106 F 170 x
52577 106 F 170 x
56833 106 F 170 x
57869 106 F 170 x
118322 106 F 170 x
119342 106 F 170 x
where the numbers in the first column are the positions at which the sought pattern was found. These positions are four times the actual byte offsets, since od -c uses four characters for every input byte, so they have to be divided by four.
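The division can be spelled out as a small helper. This is a sketch under the assumption stated above, namely that the od -c text uses exactly four characters per input byte. Where the dump is not perfectly aligned, the byte offset should be treated as approximate. The function name `cmp_pos_to_sector` is made up for the example.

```bash
#!/bin/bash
# cmp_pos_to_sector POS [SECTOR_SIZE]
# POS is a 1-based position reported by "cmp -l" in the od -c text,
# assumed to use 4 characters per input byte.
cmp_pos_to_sector() {
    local pos=$1 sector_size=${2:-512}
    local byte=$(( (pos - 1) / 4 ))   # 0-based byte offset in the original data
    echo "byte $byte sector $(( byte / sector_size )) offset $(( byte % sector_size ))"
}

cmp_pos_to_sector 228      # byte 56 sector 0 offset 56
cmp_pos_to_sector 119342   # byte 29835 sector 58 offset 139
```

The sector number is 0-based, so it can be passed directly to dd as skip= with bs=512.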
A single-line form (in a bash shell), without writing large temporary files, would be:
od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ x   l/'g | cmp -lb - <(od -cv phase-2-guidance.pdf | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/ F   l/ F   l/'g )
This avoids needing to write the contents of /dev/sdd1 to temporary files somewhere.
Here is an example that looks for "PDF" on a USB drive device and divides by 4 and 512 to get block numbers:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/P   D   F/x   D   F/'g ) | awk '{print int($1/512/4)}' | head -10
Testing this gives:
100000+0 records in
100000+0 records out
51200000 bytes transferred in 18.784280 secs (2725683 bytes/sec)
100000+0 records in
100000+0 records out
51200000 bytes transferred in 40.915697 secs (1251353 bytes/sec)
cmp: EOF on -
28913
32370
32425
33885
35097
35224
37177
38522
39981
41570
where the numbers are 512-byte block numbers. Checking gives:
dd if=/dev/disk5s1 bs=512 skip=35224 count=1 | od -vc | grep P
0000340 \0 \0 \0 001 P D F C A R O \0 \0 \0 \0
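The same check can be wrapped in a small helper for eyeballing candidate blocks. This is only a sketch: the function name `show_sector` is hypothetical, and it prints the raw block with every non-printable byte replaced by a dot instead of using od.

```bash
#!/bin/bash
# show_sector SOURCE SECTOR [COUNT]
# Print COUNT 512-byte sectors starting at SECTOR (0-based, as dd's skip=
# counts them), with every non-printable byte shown as ".".
show_sector() {
    local src=$1 sector=$2 count=${3:-1}
    dd if="$src" bs=512 skip="$sector" count="$count" 2>/dev/null |
        LC_ALL=C tr -c '[:print:]' '.'
    echo
}
```

For the block found above, that would be `show_sector /dev/disk5s1 35224`. Raising the count argument shows the following blocks as well, which helps when judging where a recovered file ends.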
Here is what an actual full example looks like with a disk, looking for the character sequence "live" where the characters are separated by NUL:
dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l  \\0   i  \\0   v  \\0   e/x  \\0   i  \\0   v  \\0   e/'g | cmp -lb - <(dd if=/dev/disk5s1 bs=512 count=100000 | od -cv | sed s'/[0-9]* \(.*\)/\1/' | tr -d '\n' | sed s'/l  \\0   i  \\0   v  \\0   e/l  \\0   i  \\0   v  \\0   e/'g )
Note
- This would not deal with fragmentation into non-consecutive blocks where the split falls inside the pattern. The second sed, which does the pattern match and substitution, could be replaced by a custom program that does a partial pattern match and makes a substitution if the number of matching characters is above some level. That might return false positives, but it is probably the only way to deal with fragmentation.