I'm being asked to add queryability to a database (Oracle) filled with mostly binary data, so I need to be able to query binary ranges within blobs of a few kilobytes. I've never done this before, so I'm wondering what good practices and pitfalls to consider when starting a project like this.
Thanks
3 solutions
#1
3
Add an MD5 column that holds the MD5 checksum of the BLOB data. Alternatively, create a new table with the same primary key and the MD5 column.
A cache module outside the database can use that column to avoid retrieving the BLOB column again on a cache hit.
Or you could remove the BLOB data from the database and store it in a file system, using the MD5 value as the filename and an HTTP server as the network file server.
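As a rough illustration of the checksum idea, here is a minimal Java sketch that computes the MD5 of a blob's bytes as a hex string, which could then be stored in the MD5 column or used as a filename. The class and method names (`BlobDigest`, `md5Hex`) are made up for this example, and how the bytes are fetched from Oracle is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any answer above.
class BlobDigest {
    // Returns the MD5 digest of the given bytes as 32 lowercase hex characters.
    public static String md5Hex(byte[] blobBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(blobBytes);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to ship MD5, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        byte[] sample = "hello".getBytes(StandardCharsets.US_ASCII);
        System.out.println(md5Hex(sample)); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

A cache can compare this value with the stored column before deciding whether it needs to fetch the BLOB itself.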
#2
2
Without knowing your exact requirements, I can make only some general comments.
BLOBs in Oracle are not the speediest of types. Make sure you don't build too many performance bottlenecks into your design, and run performance tests on the functionality you build as soon as possible to ensure it will meet requirements.
dbms_lob is your friend. In particular, you may find the read and substr functions (for reading parts of the blob) useful.
Stay away from external C-style procedures; they're likely to be much too slow. PL/SQL functions tend to be much faster. I don't know about Java procedures, but since the Java engine is more tightly integrated into Oracle, they may work very well. It may be worth doing an initial proof of concept to compare PL/SQL against Java.
With Java, you'll be able to read the data in as a byte[] array or stream and manipulate it to your heart's content using the world of Java. Java procedures are easy to set up; you can even just give Oracle the Java source code.
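To make the Java route a bit more concrete, here is a small sketch of the kind of byte-level querying it allows: finding a byte pattern inside a blob's contents and copying out a byte range. It works on a plain byte[] so that it runs anywhere. In a Java procedure the array would presumably come from something like `java.sql.Blob.getBytes(1, (int) blob.length())`, but that wiring is an assumption and not shown. The names `BlobRanges`, `indexOf` and `slice` are made up for this example.

```java
import java.util.Arrays;

// Hypothetical helper for byte-range queries on blob contents.
class BlobRanges {
    // Returns the 0-based index of the first occurrence of pattern in data, or -1 if absent.
    public static int indexOf(byte[] data, byte[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        for (int i = 0; i <= data.length - pattern.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    // Copies the bytes in the half-open range [from, to) out of data.
    public static byte[] slice(byte[] data, int from, int to) {
        return Arrays.copyOfRange(data, from, to);
    }

    public static void main(String[] args) {
        byte[] blob = {0x00, 0x10, (byte) 0xCA, (byte) 0xFE, 0x7F};
        byte[] pattern = {(byte) 0xCA, (byte) 0xFE};
        System.out.println(indexOf(blob, pattern));              // 2
        System.out.println(Arrays.toString(slice(blob, 1, 3)));  // [16, -54]
    }
}
```

The naive scan is fine for blobs of a few kilobytes; whether it beats an equivalent PL/SQL function is exactly what the proof of concept should measure.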
For PL/SQL, one technique we've found very useful is to read the blob into a raw, convert that to a hex string held in a varchar2, and then manipulate the hex string with standard Oracle string functions. For example:
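create or replace function retrieve_data_from_blob (
    b        in blob
  , tag_code in varchar2   -- type omitted in the original; varchar2 is an assumption
) return varchar2          -- return type assumed from retrieve_embedded's use
as
    lw     long raw;
    data   varchar2(30000);
    amount pls_integer := 30000;
begin
    -- Convert the blob to a long raw.
    -- On return, amount holds the total number of bytes read.
    dbms_lob.read(b, amount, 1, lw);
    -- Each byte becomes two hex characters, so blobs over 15000 bytes overflow data.
    data := rawtohex(lw);
    -- retrieve_embedded (not shown) retrieves the data tagged with tag_code
    -- from the internal binary structure by reading the hex data.
    return retrieve_embedded(data, tag_code);
end;
/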
This would work for blobs up to 15 KB in size. The retrieve_embedded function may, for example, read the first field by doing a substr(data, 1, 8) (8 hex characters, i.e. 4 bytes), convert that to a decimal via to_number(hexdata, 'xxxxxxxx'), use it as an offset, and so on.
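The hex-string approach can be sketched outside the database as well. The following Java snippet mirrors the substr / to_number step: it takes a hex string, cuts out a fixed-width field and converts it to a number. The class name `HexFields`, the method `readUnsigned` and the sample data are all made up for illustration. The real layout of the blob is whatever retrieve_embedded expects, and that is not shown in the answer.

```java
// Hypothetical illustration of reading fields out of a hex-encoded blob.
class HexFields {
    // Reads byteCount bytes (1 to 7) starting at 0-based byteOffset from a hex string
    // and returns them as an unsigned big-endian number.
    // Equivalent in spirit to to_number(substr(data, byteOffset*2 + 1, byteCount*2), 'xx...').
    public static long readUnsigned(String hex, int byteOffset, int byteCount) {
        int start = byteOffset * 2; // two hex characters per byte
        return Long.parseLong(hex.substring(start, start + byteCount * 2), 16);
    }

    public static void main(String[] args) {
        // Made-up sample: a 4-byte length field followed by 4 bytes of payload.
        String hex = "0000000CDEADBEEF";
        System.out.println(readUnsigned(hex, 0, 4)); // 12
        System.out.println(readUnsigned(hex, 4, 4)); // 3735928559
    }
}
```

Note that Oracle's substr is 1-based while Java's substring is 0-based, which is why the offsets differ by one between the two versions.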
#3
1
Storage parameters can make quite a significant difference when it comes to both storing and retrieving relatively small BLOBs (< DB_BLOCK_SIZE * 2 or so). In general, you want to minimize row migration and row chaining, as well as minimize wasted free space.
Perhaps the biggest effect on performance comes from enabling or disabling 'IN ROW' storage; it's definitely worth experimenting with.