I'd be grateful for any advice that anyone has regarding:
How do you effectively store a GPS coordinate (or any floating point number) in a varchar field that can be indexed?
Background:
We develop a content management system that can effectively store files of any type together with a collection of metadata. This file/metadata is stored as follows:
file_table          metadata_table
----------          --------------
file_id        ->   file_id (number)
file_name           metadata_id (number)
file_location       metadata_value (varchar)
...etc
I've been asked to provide support for geo-tagging files (i.e. storing GPS coordinates as metadata). We'd also like to support files that have multiple geo-tags.
As far as I can see, I have a few options:
1) Store latitude and longitude within the same metadata_value varchar (e.g. '52.4343242,-1.32324').
How would I query against this string? Is there anything clever I can do with SQL that will allow me to query against "components" of a string? Could I store the coordinate as an XML string, and would this help? How can this be effectively indexed?
2) Store latitude and longitude as separate rows in the metadata_table.
This makes querying easier (at the expense of complexity and unwieldiness, especially when I'll be storing multiple geo-tags per file), but I'm still faced with the problem of indexing.
I can convert the varchars to floating point when querying, but I'm not sure whether this will ignore the index I have on metadata_table.metadata_value and perform table scans instead.
3) Create dedicated floating point fields to store GPS data.
This is the least desirable option, since it goes against the grain of the design to add database fields for one specific kind of metadata. Not all files will store GPS data.
Any help or advice appreciated.
7 Answers
#1
4
You can use Oracle Locator, the free subset of Oracle Spatial, to do all kinds of geographical manipulation and indexing of spatial data: http://www.oracle.com/technology/products/spatial/index.html
With a column of type mdsys.sdo_geometry you can store points, clouds of points, lines, polygons and 3D objects in the database.
#3
3
Although you've tagged this with Oracle, I figured this would be useful for anyone using MySQL: use the spatial extensions to store location data.
#4
2
Dedicated floating point fields, or a column of type mdsys.sdo_geometry, are the best way to store this data. If a file doesn't have GPS data those fields will be empty, but why should that be a problem? If a file can have more than one point associated with it, use a detail table.
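As a rough illustration of the dedicated-columns-plus-detail-table idea, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle purely so that it is self-contained and runnable; the table name file_geotag, its columns and the sample rows are assumptions made for the example, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE file_table (
        file_id   INTEGER PRIMARY KEY,
        file_name TEXT NOT NULL
    );
    -- Hypothetical detail table: one row per geo-tag, so a file can have
    -- zero, one or many points.
    CREATE TABLE file_geotag (
        geotag_id INTEGER PRIMARY KEY,
        file_id   INTEGER NOT NULL REFERENCES file_table (file_id),
        latitude  REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
        longitude REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180)
    );
    -- An ordinary index on numeric columns; no string conversion involved.
    CREATE INDEX file_geotag_lat_lon_idx ON file_geotag (latitude, longitude);
""")

conn.executemany(
    "INSERT INTO file_table VALUES (?, ?)",
    [(1, "a.jpg"), (2, "b.jpg"), (3, "c.txt")],  # file 3 has no geo-tag
)
conn.executemany(
    "INSERT INTO file_geotag (file_id, latitude, longitude) VALUES (?, ?, ?)",
    [(1, 52.4343242, -1.32324), (1, 48.8566, 2.3522), (2, 40.7128, -74.0060)],
)


def files_in_box(lat_min, lat_max, lon_min, lon_max):
    """Return names of files with at least one geo-tag inside the bounding box."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.file_name
        FROM file_table f
        JOIN file_geotag g ON g.file_id = f.file_id
        WHERE g.latitude  BETWEEN ? AND ?
          AND g.longitude BETWEEN ? AND ?
        ORDER BY f.file_name
        """,
        (lat_min, lat_max, lon_min, lon_max),
    ).fetchall()
    return [name for (name,) in rows]
```

With this sample data, files_in_box(50, 55, -5, 5) returns only "a.jpg". Files without GPS data simply have no rows in the detail table, so nothing sits empty in file_table.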
Options 1 and 2 are 'generic' solutions. Generic database solutions are slow because they are harder to index and statistics are harder to collect, which makes life more difficult for the query optimizer.
Reporting over a generic solution, for example collecting management information with business intelligence tools like Cognos, is also harder for your users.
Store dates in a date field, numbers in a number field and geographical information in a geographical field (mdsys.sdo_geometry).
Here it is explained why storing a date like '20031603' in a number field slows things down: http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:77598210939534
#5
1
For option 1, I can say: use the GPS eXchange Format (GPX). It is a standard format for saving GPS points, with options to mark waypoints, tracks and points of interest.
Nonetheless, it's not easy to query.
#6
1
Generally, if I have a one-size-fits-all table (and I'm not arguing they're not useful), I tend to allow a range of datatypes for storage, and enforce the types. E.g.
CREATE TABLE MetaDataType (
    MetaDataID int IDENTITY(1,1) not null,
    MetaDataType varchar(10) not null,
    constraint PK_MetaDataType PRIMARY KEY (MetaDataID),
    constraint UQ_MetaDataType_TypeCheck UNIQUE (MetaDataID,MetaDataType),
    constraint CK_MetaDataType CHECK (MetaDataType in ('INT','CHAR','FLOAT'))
)
And then the metadata table would look like:
CREATE TABLE MetaData (
    FileID int not null,
    MetaDataID int not null,
    MetaDataType varchar(10) not null,
    -- one nullable column per supported type
    IntValue int null,
    CharValue varchar(max) null,
    FloatValue float null,
    constraint PK_MetaData PRIMARY KEY (FileID,MetaDataID),
    constraint FK_MetaData_Files FOREIGN KEY (FileID) references /* File table */,
    -- ties each row to the type declared for its metadata item
    constraint FK_MetaData_Types FOREIGN KEY (MetaDataID,MetaDataType) references MetaDataType (MetaDataID,MetaDataType),
    -- only the column matching the declared type may hold a value
    constraint CK_MetaData_ValidTypes CHECK ((MetaDataType = 'INT' or IntValue is null) and (MetaDataType = 'CHAR' or CharValue is null) and (MetaDataType = 'FLOAT' or FloatValue is null))
)
The whole point being that 1) you store the expected type for each metadata item, and 2) you enforce that type in the MetaData table.
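To see the enforcement in action, here is a small sketch of the same two-table idea. It is written against Python's built-in sqlite3 module rather than SQL Server so that it can be run as-is; the snake_case names, the try_insert helper and the sample values are assumptions made for the illustration, and the file table foreign key is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE metadata_type (
        metadata_id   INTEGER PRIMARY KEY,
        metadata_type TEXT NOT NULL
            CHECK (metadata_type IN ('INT', 'CHAR', 'FLOAT')),
        UNIQUE (metadata_id, metadata_type)
    );
    CREATE TABLE metadata (
        file_id       INTEGER NOT NULL,
        metadata_id   INTEGER NOT NULL,
        metadata_type TEXT NOT NULL,
        int_value     INTEGER,
        char_value    TEXT,
        float_value   REAL,
        PRIMARY KEY (file_id, metadata_id),
        -- The declared type must match the type registered for the item.
        FOREIGN KEY (metadata_id, metadata_type)
            REFERENCES metadata_type (metadata_id, metadata_type),
        -- Only the column matching the declared type may be populated.
        CHECK ((metadata_type = 'INT'   OR int_value   IS NULL)
           AND (metadata_type = 'CHAR'  OR char_value  IS NULL)
           AND (metadata_type = 'FLOAT' OR float_value IS NULL))
    );
    -- Hypothetical metadata item 1: a latitude, registered as FLOAT.
    INSERT INTO metadata_type VALUES (1, 'FLOAT');
""")


def try_insert(metadata_type, int_value, char_value, float_value):
    """Try to store a value for item 1 on file 1; return True if it was accepted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO metadata VALUES (1, 1, ?, ?, ?, ?)",
                (metadata_type, int_value, char_value, float_value),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

A latitude passed as a string in char_value is rejected whether the row claims to be 'FLOAT' (the CHECK fails) or 'CHAR' (the foreign key fails), while try_insert("FLOAT", None, None, 52.4343242) goes through. The float column can then be indexed like any other numeric column.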
#7
1
EDIT: see comments for where this falls short.
To answer your base question, ignoring any of the reasoning behind it, you could use function-based indexes. If you go with your option #2, this should be straightforward.
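For option #2, the idea can be sketched roughly as follows. This is not Oracle: it uses SQLite's indexes on expressions via Python's sqlite3 module as a runnable analogue of a function-based index, with CAST standing in for to_number. The metadata_id values 10 and 11, the index name and the sample rows are all made up for the illustration.

```python
import sqlite3

LAT_ID, LON_ID = 10, 11  # hypothetical metadata_id values for latitude / longitude

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metadata_table (
        file_id        INTEGER NOT NULL,
        metadata_id    INTEGER NOT NULL,
        metadata_value TEXT
    );
    -- Index the numeric interpretation of the string, not the string itself.
    CREATE INDEX metadata_numeric_idx
        ON metadata_table (metadata_id, CAST(metadata_value AS REAL));
""")
conn.executemany(
    "INSERT INTO metadata_table VALUES (?, ?, ?)",
    [
        (1, LAT_ID, "52.4343242"), (1, LON_ID, "-1.32324"),
        (2, LAT_ID, "9.5"), (2, LON_ID, "100.25"),
        (3, 99, "not a number"),  # unrelated, non-numeric metadata
    ],
)

# The WHERE clause repeats the indexed expression exactly so the index can apply.
QUERY = """
    SELECT file_id FROM metadata_table
    WHERE metadata_id = ?
      AND CAST(metadata_value AS REAL) BETWEEN ? AND ?
    ORDER BY file_id
"""


def files_with_value_between(metadata_id, low, high):
    """Files whose value for the given item lies in [low, high], compared numerically."""
    return [fid for (fid,) in conn.execute(QUERY, (metadata_id, low, high))]


def query_plan(metadata_id, low, high):
    """Return SQLite's plan description so index usage can be inspected."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (metadata_id, low, high))
    return " ".join(row[-1] for row in rows)
```

One caveat worth keeping in mind: SQLite's CAST quietly turns a non-numeric string into 0.0, whereas Oracle's to_number raises an error on one, so in Oracle the indexed expression would probably need a guard that only converts the rows known to hold numbers.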
If you stick with #1, you'll just have to add some instr/substr voodoo; for example:
select
    to_number(
        substr(
            '52.4343242,-1.32324'
          , 1
          , instr( '52.4343242,-1.32324', ',' ) - 1
        )
    ) as latitude
  , to_number(
        substr(
            '52.4343242,-1.32324'
          , instr( '52.4343242,-1.32324', ',' ) + 1
        )
    ) as longitude
from dual;
So you'd do something like:
create index lat_long_idx on metadata_table (
    to_number(
        substr(
            metadata_value
          , 1
          , instr( metadata_value, ',' ) - 1
        )
    )
  , to_number(
        substr(
            metadata_value
          , instr( metadata_value, ',' ) + 1
        )
    )
);
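A query only benefits from an index like this if its WHERE clause uses the same expressions. Below is a rough, runnable analogue using SQLite through Python's sqlite3 module, with CAST in place of to_number; SQLite's substr and instr take the same arguments as used above. The metadata_id value 10 and the sample rows are assumptions, and the sketch presumes every row holds a well-formed 'lat,lon' string.

```python
import sqlite3

# Expressions that pull latitude / longitude out of a 'lat,lon' string.
LAT_EXPR = "CAST(substr(metadata_value, 1, instr(metadata_value, ',') - 1) AS REAL)"
LON_EXPR = "CAST(substr(metadata_value, instr(metadata_value, ',') + 1) AS REAL)"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata_table ("
    " file_id INTEGER NOT NULL,"
    " metadata_id INTEGER NOT NULL,"
    " metadata_value TEXT)"
)
# Expression index on both extracted components.
conn.execute(f"CREATE INDEX lat_long_idx ON metadata_table ({LAT_EXPR}, {LON_EXPR})")
conn.executemany(
    "INSERT INTO metadata_table VALUES (?, ?, ?)",
    [
        (1, 10, "52.4343242,-1.32324"),  # file 1 carries two geo-tags
        (1, 10, "48.8566,2.3522"),
        (2, 10, "40.7128,-74.0060"),
    ],
)


def tagged_files_in_box(lat_min, lat_max, lon_min, lon_max):
    """Return ids of files with at least one 'lat,lon' value inside the box."""
    rows = conn.execute(
        f"""
        SELECT DISTINCT file_id FROM metadata_table
        WHERE {LAT_EXPR} BETWEEN ? AND ?
          AND {LON_EXPR} BETWEEN ? AND ?
        ORDER BY file_id
        """,
        (lat_min, lat_max, lon_min, lon_max),
    ).fetchall()
    return [fid for (fid,) in rows]
```

In a real metadata_table most rows will not be coordinates at all, and Oracle's to_number raises an error on a non-numeric string, so the indexed expressions would likely have to be wrapped in a condition on metadata_id before this is usable.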