I have a service which fetches data through the following call:
Func(List<symbols>, List<fields>, StartDate, EndDate)
It returns a 3-dimensional array of values.
Sym1 field1 field2 field3
Sym2 field1 field2 field3
Date1
Date2
Date3
That is, the x axis is fields, the y axis is symbols and the z axis is dates.
I also have a cache of some of the above values (which I fetched earlier), stored as a nested dictionary:
<Date<Symbol<field,value>>>
The service charges for each data point it returns. So if we request 3 symbols, 4 fields and 2 dates, we get charged for 3 x 4 x 2 = 24 points.
I need to break the original bigger request into multiple smaller requests that cover only the data not found in the cache.
E.g. suppose I have an original request for 5 symbols A,B,C,D,E and 4 fields F1,F2,F3,F4 for 3 dates D1,D2,D3.
A,B,C,D,E F1,F2,F3,F4 D1,D2
Assume the cache already holds the following data points:
B,F2,D2
C,F4,D1
Then the requests I make to the service, once optimized and split, would be:
Request1 A,B,C,D F1,F3 D1,D2
Request2 A,D F2,F4 D1,D2
Request3 B F2,F4 D1
Request4 B F4 D2
Request5 C F2,F4 D2
Request6 C F2 D1
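As a sanity check on the split above, here is a minimal C# sketch that expands the six requests into individual points and compares them with the uncached part of the grid. It assumes the example is meant to cover symbols A to D and dates D1, D2 only, since none of the six requests mentions E or D3. The string labels and variable names are just for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumption: the example grid is symbols A-D, fields F1-F4, dates D1-D2.
var symbols = new[] { "A", "B", "C", "D" };
var fields = new[] { "F1", "F2", "F3", "F4" };
var dates = new[] { "D1", "D2" };

// The two points that are already in the cache.
var cached = new HashSet<(string, string, string)>
{
    ("B", "F2", "D2"),
    ("C", "F4", "D1"),
};

// The six requests from the example, each as (symbols, fields, dates).
var requests = new (string[] Symbols, string[] Fields, string[] Dates)[]
{
    (new[] { "A", "B", "C", "D" }, new[] { "F1", "F3" }, new[] { "D1", "D2" }),
    (new[] { "A", "D" }, new[] { "F2", "F4" }, new[] { "D1", "D2" }),
    (new[] { "B" }, new[] { "F2", "F4" }, new[] { "D1" }),
    (new[] { "B" }, new[] { "F4" }, new[] { "D2" }),
    (new[] { "C" }, new[] { "F2", "F4" }, new[] { "D2" }),
    (new[] { "C" }, new[] { "F2" }, new[] { "D1" }),
};

// Expand every request into the individual points it would be billed for.
var requested = requests
    .SelectMany(r => from s in r.Symbols
                     from f in r.Fields
                     from d in r.Dates
                     select (s, f, d))
    .ToList();

// Every point of the grid that is not in the cache.
var missing = (from s in symbols
               from f in fields
               from d in dates
               select (s, f, d))
    .Where(p => !cached.Contains(p))
    .ToHashSet();

int billedPoints = requested.Count;
bool noDuplicates = requested.Distinct().Count() == requested.Count;
bool exactCover = missing.SetEquals(requested);

// Prints "30 points billed, exact cover: True"
Console.WriteLine($"{billedPoints} points billed, exact cover: {noDuplicates && exactCover}");
```

Under that assumption the six requests bill 30 points, which is the full 4 x 4 x 2 = 32 grid minus the 2 cached points, with no point requested twice.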
Is there a standard way to divide the above into smaller requests (3D sub-arrays)? What is the best way to achieve this, and what type of data structure would suit my needs?
3 Answers
#1
3
Your problem can be visualized as follows:
You have a dense 3D grid where some cells (the gray ones) are already occupied by the cache. The task is to find the minimal number of cuboids that fill the free space. However, your case is a bit special in that the axes for symbols and fields are not ordered, so a cuboid may contain any subset of symbols and any subset of fields, not just adjacent ones.
I have a feeling that there is no polynomial-time solution for this problem. So if you really need the optimal solution, chances are good that you have to search the entire solution space (e.g. using backtracking). Here is the idea for an approximate greedy algorithm:
for each iSymbol
  for each iField
    for each iDate
    {
        if (values[iSymbol, iField, iDate] != null) // already filled
            continue;

        set<int> symbols = { iSymbol }; // the symbols in the current cuboid
        set<int> fields = { iField };   // the fields in the current cuboid
        int maxDate = iDate;            // the maximum date index in the current cuboid
        bool dateAxisFinished = false;
        bool symbolAxisFinished = false;
        bool fieldAxisFinished = false;

        for (int i = 0; i < 3; ++i) // extend along all three axes, one axis per pass
        {
            // check which unfinished axis allows the greatest extension;
            // a finished axis gets size -1 so that it is never chosen again
            int extDate = maxDate;
            int dateSize = -1;
            if (!dateAxisFinished)
            {
                extDate = checkExtensionDate(iDate, symbols, fields);
                dateSize = extDate - iDate + 1;
            }
            set<int> extSymbols = symbols;
            int symbolSize = -1;
            if (!symbolAxisFinished)
            {
                extSymbols = checkExtensionSymbol(iDate, maxDate, iSymbol, fields);
                symbolSize = extSymbols.size;
            }
            set<int> extFields = fields;
            int fieldSize = -1;
            if (!fieldAxisFinished)
            {
                extFields = checkExtensionField(iDate, maxDate, symbols, iField);
                fieldSize = extFields.size;
            }

            if (dateSize >= symbolSize && dateSize >= fieldSize)
            {
                // fix this extension
                maxDate = extDate;
                dateAxisFinished = true;
            }
            else if (symbolSize >= fieldSize)
            {
                symbols = extSymbols;
                symbolAxisFinished = true;
            }
            else
            {
                fields = extFields;
                fieldAxisFinished = true;
            }
        }

        perform a query for symbols, fields from iDate to maxDate and put the result into values
    }

// -----------------------
// returns the maximum date index that can be included in the current cuboid
int checkExtensionDate(int dateFrom, set<int> symbols, set<int> fields)
{
    for iDate from dateFrom + 1 to lastDateIndex
        for each iSymbol in symbols
            for each iField in fields
                if (values[iSymbol, iField, iDate] != null)
                    return iDate - 1; // the first occupied cell ends the extension
    return lastDateIndex; // nothing occupied: extend to the last date
}

// returns the maximum set of symbols that can be included in the current cuboid
set<int> checkExtensionSymbol(int dateFrom, int dateTo, int startSymbol, set<int> fields)
{
    set<int> result = { startSymbol };
    for each iSymbol in allSymbols \ { startSymbol }
    {
        bool symbolOk = true;
        for each iDate from dateFrom to dateTo
        {
            if (!symbolOk)
                break;
            for each iField in fields
            {
                if (values[iSymbol, iField, iDate] != null)
                {
                    symbolOk = false;
                    break;
                }
            }
        }
        if (symbolOk)
            result.add(iSymbol);
    }
    return result;
}

// similar method for fields
This is just a basic idea and might need some improvements.
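For reference, here is one way the greedy idea could look as runnable C#. It is only a sketch under assumptions: symbols, fields and dates are represented by array indices, the cache is a `bool[,,]` occupancy grid, and `PlanRequests` and `Free` are hypothetical names. The example grid is the A to D, F1 to F4, D1 to D2 case from the question. It is a heuristic, so it need not produce the minimal number of requests, but every request is a cuboid of uncached cells and no cell is requested twice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example grid: symbols A-D, fields F1-F4, dates D1-D2.
string[] symbols = { "A", "B", "C", "D" };
string[] fields = { "F1", "F2", "F3", "F4" };
string[] dates = { "D1", "D2" };

// cached[s, f, d] is true when the point is already in the cache.
var cached = new bool[symbols.Length, fields.Length, dates.Length];
cached[1, 1, 1] = true; // B, F2, D2
cached[2, 3, 0] = true; // C, F4, D1

var plan = PlanRequests(cached);
foreach (var r in plan)
{
    string symText = string.Join(",", r.Symbols.Select(i => symbols[i]));
    string fldText = string.Join(",", r.Fields.Select(i => fields[i]));
    Console.WriteLine($"{symText} {fldText} {dates[r.DateFrom]}..{dates[r.DateTo]}");
}

// Greedy split of the free cells into cuboids (one cuboid = one request).
static List<(List<int> Symbols, List<int> Fields, int DateFrom, int DateTo)> PlanRequests(bool[,,] occupied)
{
    var taken = (bool[,,])occupied.Clone(); // work on a copy, keep the input intact
    int nS = taken.GetLength(0), nF = taken.GetLength(1), nD = taken.GetLength(2);
    var result = new List<(List<int> Symbols, List<int> Fields, int DateFrom, int DateTo)>();

    // True when every cell of ss x fs x [d0, d1] is still free.
    bool Free(IEnumerable<int> ss, IEnumerable<int> fs, int d0, int d1) =>
        ss.All(a => fs.All(b => Enumerable.Range(d0, d1 - d0 + 1).All(c => !taken[a, b, c])));

    for (int s = 0; s < nS; s++)
    for (int f = 0; f < nF; f++)
    for (int d = 0; d < nD; d++)
    {
        if (taken[s, f, d]) continue; // cached or already covered by a request

        var syms = new List<int> { s };
        var flds = new List<int> { f };
        int dMax = d;
        bool dateDone = false, symDone = false, fldDone = false;

        for (int pass = 0; pass < 3; pass++) // fix one axis per pass
        {
            // Candidate extension along each axis that is not fixed yet.
            int extDate = dMax;
            while (!dateDone && extDate + 1 < nD && Free(syms, flds, extDate + 1, extDate + 1))
                extDate++;
            var extSyms = symDone ? syms
                : Enumerable.Range(0, nS).Where(x => x == s || Free(new[] { x }, flds, d, dMax)).ToList();
            var extFlds = fldDone ? flds
                : Enumerable.Range(0, nF).Where(x => x == f || Free(syms, new[] { x }, d, dMax)).ToList();

            // A fixed axis gets size -1 so that it is never chosen again.
            int dateSize = dateDone ? -1 : extDate - d + 1;
            int symSize = symDone ? -1 : extSyms.Count;
            int fldSize = fldDone ? -1 : extFlds.Count;

            if (dateSize >= symSize && dateSize >= fldSize) { dMax = extDate; dateDone = true; }
            else if (symSize >= fldSize) { syms = extSyms; symDone = true; }
            else { flds = extFlds; fldDone = true; }
        }

        // Mark the cuboid as covered and record it as one request.
        foreach (int a in syms)
            foreach (int b in flds)
                for (int c = d; c <= dMax; c++)
                    taken[a, b, c] = true;
        result.Add((syms, flds, d, dMax));
    }
    return result;
}
```

Choosing the axis by the size of the extended side, as in the pseudocode, is just one possible greedy criterion; comparing the resulting cuboid volumes would be another.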
#2
2
Suggested approach:
- Instantiate your result data structure before you call the data fetch API service.
- Populate the result structure with whatever data is available in the cache.
- Call the external service/API for the values that are still not populated (using the result structure).
Voila, you're done. For the third step, you can use Linq to figure out the empty slots that need to be filled.
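A minimal sketch of that third step with LINQ, assuming the cache is a nested dictionary shaped like the question's `<Date<Symbol<field,value>>>`. The string keys, the `double` value type and the `IsCached` helper are placeholders chosen for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cache: date -> symbol -> field -> value.
var cache = new Dictionary<string, Dictionary<string, Dictionary<string, double>>>
{
    ["D1"] = new() { ["C"] = new() { ["F4"] = 1.0 } },
    ["D2"] = new() { ["B"] = new() { ["F2"] = 2.0 } },
};

var symbols = new[] { "A", "B", "C", "D" };
var fields = new[] { "F1", "F2", "F3", "F4" };
var dates = new[] { "D1", "D2" };

// True when the cache already holds a value for this point.
bool IsCached(string date, string symbol, string field) =>
    cache.TryGetValue(date, out var bySymbol)
    && bySymbol.TryGetValue(symbol, out var byField)
    && byField.ContainsKey(field);

// The empty slots are all grid points that the cache cannot serve.
var emptySlots =
    (from d in dates
     from s in symbols
     from f in fields
     where !IsCached(d, s, f)
     select (Date: d, Symbol: s, Field: f)).ToList();

// 30 of the 32 points still have to be fetched from the service.
Console.WriteLine(emptySlots.Count);
```

This only yields the list of missing points. Grouping them into as few service calls as possible is a separate step.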
#3
0
Preamble
It's quite hard to understand what exactly you want, so sorry if I have misunderstood something.
Answer
For the answer I assume that (this is important):
- Dates are discrete. So, if you query the service for 1 symbol and 1 field for 3 days (e.g. A, F1, 2014 May 01 to May 03), you will be charged for 3 points.
- You are not charged per request. I.e. you will be charged the same for:
A, F1, 2013 May 01 to May 03 (3 points)
and
A, F1, 2013 May 01 (1 point)
A, F1, 2013 May 02 (1 point)
A, F1, 2013 May 03 (1 point)
(the same 3 points charged in total)
Assuming this, the code will be straightforward and will minimize your bills :D
// Replace Field, Symbol and SomeType with actual types you use for fields, symbols and values.
// Note: SomeType? with HasValue/Value below requires SomeType to be a value type (struct).
SomeType? GetCachedData(Tuple<Field, Symbol, DateTime> point)
{
    // your caching code here; return null when the point is not cached
    throw new NotImplementedException();
}

void CacheData(Tuple<Field, Symbol, DateTime> point, SomeType value)
{
    // your caching code here
    throw new NotImplementedException();
}

SomeType GetDataFromService(Tuple<Field, Symbol, DateTime> point)
{
    // your service requesting code here
    throw new NotImplementedException();
}

Tuple<Field, Symbol, DateTime, SomeType>[] GetData(IEnumerable<Field> fields, IEnumerable<Symbol> symbols, IEnumerable<DateTime> dates)
{
    var result = new List<Tuple<Field, Symbol, DateTime, SomeType>>();
    foreach (var field in fields)
        foreach (var symbol in symbols)
            foreach (var date in dates)
            {
                var point = new Tuple<Field, Symbol, DateTime>(field, symbol, date);

                // Serve the point from the cache when possible.
                var cachedValue = GetCachedData(point);
                if (cachedValue.HasValue)
                {
                    result.Add(new Tuple<Field, Symbol, DateTime, SomeType>(field, symbol, date, cachedValue.Value));
                    continue;
                }

                // Otherwise request exactly this one point and cache it.
                var serviceValue = GetDataFromService(point);
                CacheData(point, serviceValue);
                result.Add(new Tuple<Field, Symbol, DateTime, SomeType>(field, symbol, date, serviceValue));
            }
    return result.ToArray();
}