What I would like to do is scan a disc or a drive (USB, main HDD, etc.) for files and store their info in a DB. Then I could search the DB for a particular file to find where it is stored. Alternatively, I could search for how old copies are for archiving reasons, check whether I already have dupes of something so I don't need to re-archive it, or look for a dupe in the case where I backed something up several times on purpose and one of my discs was scratched or a drive was corrupted.
Here is what I am thinking:
- OS + FS flag (1 byte?)
- st_mode (even if not on Linux), 2 bytes
- win32_attr (even if not on Windows), 4 bytes (this covers hidden, dir vs. file, locked, etc.)
- file size (64 bits)
- a/m/c time, 64 bits
- index/unique key as fileID
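For illustration, the fields listed above can be gathered for one path roughly like this. This is only a sketch: `stat_record` and its dictionary keys are made-up names, the platform string stands in for the "OS + FS flag", and `st_file_attributes` is only present on Windows, so the sketch falls back to 0 elsewhere.

```python
import os
import stat
import sys


def stat_record(path):
    """Collect the per-file fields for one path (hypothetical helper)."""
    st = os.lstat(path)  # lstat: describe a symlink itself, not its target
    return {
        "os_flag": sys.platform,  # stand-in for the OS + FS flag
        "st_mode": st.st_mode,
        # Windows-only attribute bits (hidden, read-only, ...); 0 elsewhere.
        "win32_attr": getattr(st, "st_file_attributes", 0),
        "size": st.st_size,
        # Nanosecond integer timestamps, suitable for 64-bit columns.
        "atime_ns": st.st_atime_ns,
        "mtime_ns": st.st_mtime_ns,
        "ctime_ns": st.st_ctime_ns,
        "is_dir": stat.S_ISDIR(st.st_mode),
    }
```

A scanner could call this for every entry yielded by `os.walk` and insert the result as one row in the file table.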
Should I store the name as a variable-length field in its own table, looked up by its matching fileID? Or should I have a fixed 260-character filename column in the DB, or a variable-length filename column in the DB?
Then I have blobs of however many bits each checksum requires (md5, sha1, sha512, etc., one blob for each) in a checksum/hash table looked up by fileID.
I was thinking my hash table should have fileID (an int the same length as the index?), hashType (int), and hashValue (varchar).
1 Solution
#1
Put the filename as a varchar in the file table, at least varchar(1024). Windows has a limit on total path length in some OS combos, similar to ISO CD/DVDs.
Put the hashes in an association table like:
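CREATE TABLE Hash
(
    fileId    INT NOT NULL,
    hash_type INT NOT NULL,          -- or an enum
    hash      VARCHAR(255) NOT NULL, -- or the length of the largest hash type
    PRIMARY KEY (fileId, hash_type)  -- fileId leads the key, so this also covers lookups by fileId
);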
This way you can add new hash types later, and you are not forced to support every hash type for every file.
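To make the layout concrete, here is a small sketch using Python's built-in `sqlite3` and `hashlib`. Several things in it are assumptions for illustration only: the `file` table columns, the `HASH_TYPES` numbering, the extra `hash_lookup` index, and the helper names `add_file` and `find_duplicates`. It hashes bytes already in memory; a real scanner would read files in chunks. Note that SQLite accepts `VARCHAR(n)` but does not enforce the length.

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE file (
    fileId INTEGER PRIMARY KEY,
    path   VARCHAR(1024) NOT NULL,
    size   INTEGER NOT NULL
);
CREATE TABLE hash (
    fileId    INTEGER NOT NULL REFERENCES file(fileId),
    hash_type INTEGER NOT NULL,
    hash      VARCHAR(255) NOT NULL,
    PRIMARY KEY (fileId, hash_type)
);
-- Assumed extra index so that searching by digest is cheap.
CREATE INDEX hash_lookup ON hash (hash_type, hash);
"""

# Hypothetical numbering for the hash_type column.
HASH_TYPES = {"sha1": 2, "sha512": 3}


def add_file(db, path, data):
    """Insert one file row plus one hash row per supported hash type."""
    cur = db.execute(
        "INSERT INTO file (path, size) VALUES (?, ?)", (path, len(data))
    )
    file_id = cur.lastrowid
    for name, type_id in HASH_TYPES.items():
        digest = hashlib.new(name, data).hexdigest()
        db.execute(
            "INSERT INTO hash (fileId, hash_type, hash) VALUES (?, ?, ?)",
            (file_id, type_id, digest),
        )
    return file_id


def find_duplicates(db, hash_name="sha1"):
    """Return lists of paths whose digests of the given type are equal."""
    rows = db.execute(
        "SELECT h.hash, f.path FROM hash h "
        "JOIN file f ON f.fileId = h.fileId "
        "WHERE h.hash_type = ? ORDER BY f.path",
        (HASH_TYPES[hash_name],),
    )
    groups = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

After `db = sqlite3.connect(":memory:")` and `db.executescript(SCHEMA)`, calling `add_file` for each scanned file and then `find_duplicates(db)` lists the paths that share a digest. Adding a new hash type later only means adding an entry to `HASH_TYPES`; files without that hash simply have no row for it.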