Best way to frequently store, search, and modify a large dataset in Delphi

Time: 2022-06-27 16:28:35

What would be the best way, in Delphi, to create and store data that will often be searched and modified?

Basically, I would like to write a function that searches an existing database for telephone numbers and keeps track of how many times each number has been used, the first date it was used, and the latest date it was used. The database being searched is essentially a log of orders placed, including the telephone number used to place each order. It's not an SQL database or anything else that can easily be queried for this kind of information (it's an old Btrieve database), so I need to build a way of gathering this information myself (to eventually output to a text file).

I am thinking of creating a record containing the phone number, the two dates, and the number of times used, and adding one such record to a dynamic array for each telephone number. Then, for each record in the database, I would search the array entry by entry to see whether the current record's phone number is already there, and update the existing entry or create a new one as necessary.
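
The plan above might look like this as a minimal sketch (a console program; the names TPhoneUsage, FindPhone and RecordOrder are illustrative, not from the question). FindPhone is the entry-by-entry scan, so every order record costs a pass over all the numbers collected so far, which is why this approach slows down as the number of distinct phone numbers grows.

```pascal
program LinearScanTally;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // One entry per distinct phone number (illustrative field names).
  TPhoneUsage = record
    Phone: string;
    FirstUsed: TDateTime;
    LastUsed: TDateTime;
    UseCount: Integer;
  end;

var
  Usage: array of TPhoneUsage;

// Linear search: O(n) per lookup, so O(records x distinct numbers) overall.
function FindPhone(const Phone: string): Integer;
var
  I: Integer;
begin
  for I := 0 to High(Usage) do
    if Usage[I].Phone = Phone then
      Exit(I);
  Result := -1;
end;

// Folds one order record (phone number + order date) into the array.
procedure RecordOrder(const Phone: string; OrderDate: TDateTime);
var
  I: Integer;
begin
  I := FindPhone(Phone);
  if I < 0 then
  begin
    // First time this number is seen: append a new entry.
    I := Length(Usage);
    SetLength(Usage, I + 1);
    Usage[I].Phone := Phone;
    Usage[I].FirstUsed := OrderDate;
    Usage[I].LastUsed := OrderDate;
    Usage[I].UseCount := 1;
  end
  else
  begin
    // Known number: bump the count and widen the date range.
    Inc(Usage[I].UseCount);
    if OrderDate < Usage[I].FirstUsed then Usage[I].FirstUsed := OrderDate;
    if OrderDate > Usage[I].LastUsed then Usage[I].LastUsed := OrderDate;
  end;
end;

begin
  // In the real program these calls would come from the Btrieve read loop.
  RecordOrder('555-0100', EncodeDate(2009, 1, 5));
  RecordOrder('555-0199', EncodeDate(2009, 2, 1));
  RecordOrder('555-0100', EncodeDate(2008, 12, 24));
  Writeln(Length(Usage), ' distinct numbers');
end.
```

With the sample calls it prints `2 distinct numbers`.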

This seems like it would work, but with tens of thousands of entries in the database, searching the array entry by entry for every record is likely to be slow and inefficient. Is there a better way, given the limited operations I can perform on the database?

Someone suggested using a MySQL table instead of an array to keep track of the numbers, and then querying that table for every database record. That seems like even more overhead, though!

Thanks a lot for your time.

5 solutions

#1


I would store the aggregates in a completely disconnected TClientDataSet (CDS), updating the values as you read them in the loop. If the Btrieve file can be sorted by telephone number, so much the better. Then use the data in the CDS to generate the report.

(If you go this way, I suggest getting Midas SpeedFix from Andreas Hausladen's blog, along with the other fine tools you can find there.)
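
Answer #1 might look roughly like this. It is a sketch, not the answerer's code, assuming Delphi XE2 or later (for the unit-scoped names) with MidasLib linked in so no midas.dll is needed; the field names Phone, FirstUsed, LastUsed and UseCount and the helpers CreateTally and RecordOrder are illustrative.

```pascal
program CdsTally;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, Data.DB, Datasnap.DBClient, MidasLib;

// Creates an empty, disconnected in-memory table indexed on Phone.
function CreateTally: TClientDataSet;
begin
  Result := TClientDataSet.Create(nil);
  try
    Result.FieldDefs.Add('Phone', ftString, 20);
    Result.FieldDefs.Add('FirstUsed', ftDateTime);
    Result.FieldDefs.Add('LastUsed', ftDateTime);
    Result.FieldDefs.Add('UseCount', ftInteger);
    Result.CreateDataSet;
    Result.IndexFieldNames := 'Phone';  // FindKey searches on this index
  except
    Result.Free;
    raise;
  end;
end;

// Folds one order (phone number + order date) into the running totals.
procedure RecordOrder(CDS: TClientDataSet; const Phone: string; OrderDate: TDateTime);
begin
  if CDS.FindKey([Phone]) then
  begin
    CDS.Edit;
    CDS.FieldByName('UseCount').AsInteger := CDS.FieldByName('UseCount').AsInteger + 1;
    if OrderDate < CDS.FieldByName('FirstUsed').AsDateTime then
      CDS.FieldByName('FirstUsed').AsDateTime := OrderDate;
    if OrderDate > CDS.FieldByName('LastUsed').AsDateTime then
      CDS.FieldByName('LastUsed').AsDateTime := OrderDate;
    CDS.Post;
  end
  else
    // Values in FieldDefs order: Phone, FirstUsed, LastUsed, UseCount.
    CDS.AppendRecord([Phone, OrderDate, OrderDate, 1]);
end;

var
  CDS: TClientDataSet;
begin
  CDS := CreateTally;
  try
    // In the real program these calls would come from the Btrieve read loop.
    RecordOrder(CDS, '555-0100', EncodeDate(2009, 1, 5));
    RecordOrder(CDS, '555-0199', EncodeDate(2009, 2, 1));
    RecordOrder(CDS, '555-0100', EncodeDate(2008, 12, 24));
    // Report: the active index makes the dataset iterate in Phone order.
    CDS.First;
    while not CDS.Eof do
    begin
      Writeln(Format('%s  first %s  last %s  count %d',
        [CDS.FieldByName('Phone').AsString,
         DateToStr(CDS.FieldByName('FirstUsed').AsDateTime),
         DateToStr(CDS.FieldByName('LastUsed').AsDateTime),
         CDS.FieldByName('UseCount').AsInteger]));
      CDS.Next;
    end;
  finally
    CDS.Free;
  end;
end.
```

Because the index keeps the rows ordered by Phone, the same dataset serves both as the lookup structure during the loop and as the source for the text-file report.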

#2


OK, here is an old-school two-pass method that works well and should scale well (I once used this approach against a multi-million-record database; it took time but gave accurate results).

  1. Download and install TurboPower SysTools; its sort engine works very well for this process.

  2. Create a sort with a fixed-length record holding the phone number; that is the key you will sort on.

  3. Loop through your records and, for each order, add its phone number to the sort.

  4. Once the first pass is done, start popping the phone numbers from the sort. If a number is the same as the last one read, increment the counter; otherwise report the previous number with its count and start a new count for the new number.
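
The SysTools sort engine's API is not shown in the answer, so this sketch swaps in TStringList.CustomSort as a stand-in (an assumption, not the author's code): pass 1 collects one phone number per order, and pass 2 walks the sorted list once, counting each run of identical numbers as in the final step. OrdinalCompare and CountRuns are illustrative names.

```pascal
program SortAndCount;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Ordinal (non-locale) comparison, so "equal" in the sort means exactly
// the same string as the <> test in CountRuns.
function OrdinalCompare(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := CompareStr(List[Index1], List[Index2]);
end;

// Pass 2: walks an already-sorted list once and adds one "phone=count"
// line to Output for each run of identical numbers.
procedure CountRuns(Sorted, Output: TStrings);
var
  I, RunCount: Integer;
begin
  RunCount := 0;
  for I := 0 to Sorted.Count - 1 do
  begin
    Inc(RunCount);
    // A run ends at the last item or where the next number differs.
    if (I = Sorted.Count - 1) or (Sorted[I + 1] <> Sorted[I]) then
    begin
      Output.Add(Format('%s=%d', [Sorted[I], RunCount]));
      RunCount := 0;
    end;
  end;
end;

var
  Phones, Report: TStringList;
begin
  Phones := TStringList.Create;
  Report := TStringList.Create;
  try
    // Pass 1: one phone number per order (sample data in place of Btrieve).
    Phones.Add('555-0199');
    Phones.Add('555-0100');
    Phones.Add('555-0123');
    Phones.Add('555-0100');
    Phones.Add('555-0199');
    Phones.Add('555-0100');
    Phones.CustomSort(OrdinalCompare);  // stand-in for the SysTools sort engine
    CountRuns(Phones, Report);
    Write(Report.Text);
  finally
    Report.Free;
    Phones.Free;
  end;
end.
```

Run as-is, it prints `555-0100=3`, `555-0123=1` and `555-0199=2`, one per line. Pass 2 only ever compares neighbouring entries, so it needs no lookup structure at all.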

This process can also be done with any SQL database, but in my experience the sort method is faster than managing a temporary table and produces the same results.

EDIT: Since this is a Btrieve database, why not just create a key on the phone number, read the table in that key's order, and apply step 4 while walking the table (Next instead of pop)? Either way you will need to touch every record in your database to get the counts; the index/sort just makes the decision logic easier.

For example, let's say you have two tables: a customer table, where the results will be stored, and an orders table. Sort both by phone number, start a cursor at the top of each, and then apply the following logic:

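// Assumes: var Count, Comp: Integer; CustomerTable and OrderTable are open
// TDataSet descendants, both ordered by phone number. The field names
// 'Phone' and 'TotalCount' are illustrative.
Count := 0;
while not CustomerTable.Eof and not OrderTable.Eof do
  begin
    Comp := CompareText(CustomerTable.FieldByName('Phone').AsString,
                        OrderTable.FieldByName('Phone').AsString);
    // consume every order that belongs to the current customer
    while (Comp = 0) and not OrderTable.Eof do
      begin
        Inc(Count);
        OrderTable.Next;
        Comp := CompareText(CustomerTable.FieldByName('Phone').AsString,
                            OrderTable.FieldByName('Phone').AsString);
      end;
    if Comp < 0 then
      begin
        // customer sorts before the current order: store its total, move on
        CustomerTable.Edit;
        CustomerTable.FieldByName('TotalCount').AsInteger := Count;
        CustomerTable.Post;
        Count := 0;
        CustomerTable.Next;
      end
    else if (Comp > 0) and not OrderTable.Eof then
      begin
        OrderTable.Next;  // order with no customer: skip it
      end;
  end;

// handle the case where the end of the orders was reached
if OrderTable.Eof and not CustomerTable.Eof then
  begin
    CustomerTable.Edit;
    CustomerTable.FieldByName('TotalCount').AsInteger := Count;
    CustomerTable.Post;
  end;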

This code walks each list only once. No lookups are necessary: because both lists are sorted the same way, they can be walked top to bottom, taking action only when needed. The only requirements are that both lists share a key (in this example, the phone number) and that both can be sorted on it.

I did not handle the case where there is an order with no customer. My assumption was that orders do not exist without customers, so any such orders are simply skipped when counting.

#3


Sorry, I couldn't edit my post (I wasn't registered at the time). The data will be thrown away once all the records in the database have been iterated through, and the function won't be called often. It's basically going to be used to determine how often people have ordered over a period of time, based on records we already have, so really it just needs to produce a one-off list.

The data only needs to persist while the list is being created. That is, all telephone numbers must remain available for searching until the very last database record has been read.

#4


If you are going to keep it in memory and don't want anything fancy, you'd be better off using a TStringList so you can use its Find method. With Sorted set to True, Find performs a binary search, so each lookup is O(log n) rather than the O(n) of scanning an array entry by entry. For instance, define a type:

type
   TPhoneData = class
      private
         fPhone:string;
         fFirstCalledDate:TDateTime;
         fLastCalledDate:TDateTime;
         fCallCount:integer;
      public
         constructor Create(phone:string; firstDate, lastDate:TDateTime);
         procedure updateCallData(date:TDateTime);
         property phoneNumber:string read fPhone write fPhone;
         property firstCalledDate:TDateTime read fFirstCalledDate write fFirstCalledDate;
         property lastCalledDate:TDateTime read fLastCalledDate write fLastCalledDate;
         property callCount:integer read fCallCount write fCallCount;
      end;

{ TPhoneData }

constructor TPhoneData.Create(phone: string; firstDate, lastDate: TDateTime);
begin
fCallCount:=1;
fFirstCalledDate:=firstDate;
fLastCalledDate:=lastDate;
fPhone:=phone;
end;

procedure TPhoneData.updateCallData(date: TDateTime);
begin
inc(fCallCount);
// keep the earliest and the latest dates seen so far
if date<fFirstCalledDate then fFirstCalledDate:=date;
if date>fLastCalledDate then fLastCalledDate:=date;
end;

Then fill the list and report on it:

procedure TForm1.btnSortExampleClick(Sender: TObject);
const phoneSeed:array[0..9] of string = ('111-111-1111','222-222-2222','333-333-3333','444-444-4444','555-555-5555','666-666-6666','777-777-7777','888-888-8888','999-999-9999','000-000-0000');

var TSL:TStringList;
    TPD:TPhoneData;
    i,index:integer;
    phone:string;
    callDate:TDateTime;
begin
Randomize;  // seed the generator (RandSeed is a variable, not a routine)
TSL:=TStringList.Create;
try
   TSL.Sorted:=true;  // Find only works on a sorted list
   for i := 0 to 100 do
      begin
      phone:=phoneSeed[random(Length(phoneSeed))];  // any of the 10 numbers
      callDate:=now-random(100);
      if TSL.Find(phone, index) then
         TPhoneData(TSL.Objects[index]).updateCallData(callDate)
      else
         TSL.AddObject(phone,TPhoneData.Create(phone,callDate,callDate));
      end;
   for i := 0 to 9 do
      begin
      if TSL.Find(phoneSeed[i], index) then
         begin
         TPD:=TPhoneData(TSL.Objects[index]);
         ShowMessage(Format('Phone # %s, first called %s, last called %s, num calls %d', [TPD.PhoneNumber, FormatDateTime('mm-dd-yyyy',TPD.firstCalledDate), FormatDateTime('mm-dd-yyyy',TPD.lastCalledDate), TPD.callCount]));
         end;
      end;
finally
   // the string list does not own its objects, so free them before the list
   for i := 0 to TSL.Count-1 do
      TSL.Objects[i].Free;
   TSL.Free;
end;
end;

#5


Instead of a TStringList, I would recommend using DMap from DeCAL (on sf.net) to store the items in memory. You could make the phone number the key and store a record/class structure containing the rest of the data.

So your record class would be something like:

  TPhoneData = class
    number: string;
    access_count: integer;
    added: TDateTime;
     ...
  end;

Then in code:

  procedure TSomeClass.RegisterPhone(const number: string; phoneData: TPhoneData);
  begin
    //FStore created in Constructor as FStore := DMap.Create;
    FStore.putPair([number, phoneData]);
  end;
  ...
  procedure TSomeClass.GetPhoneAndIncrement(const number: string);
  var
    Iter: DIterator;
    lPhoneData: TPhoneData;
  begin
    Iter := FStore.locate([number]);
    if atEnd(Iter) then
      raise Exception.CreateFmt('Number %s not found',[number])
    else
    begin
      lPhoneData := GetObject(Iter) as TPhoneData;
      lPhoneData.access_count := lPhoneData.access_count + 1;
      //no need to save back to FStore, as it holds a reference to lPhoneData
    end;
  end;

DMap is implemented as a red/black tree, so it keeps the keys sorted for you at no extra cost. You can also use a DHashMap for the same effect and (arguably) better speed.
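
The DHashMap suggestion can be illustrated with the RTL's own hash map instead of DeCAL. This is a substitution, not the DeCAL API: it assumes Delphi 2009 or later (or Free Pascal 3.2+ in Delphi mode) for Generics.Collections, and RegisterOrder is an illustrative name.

```pascal
program DictionaryTally;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

type
  // Illustrative class in the spirit of the TPhoneData record class above.
  TPhoneData = class
    Number: string;
    AccessCount: Integer;
    FirstUsed, LastUsed: TDateTime;
  end;

var
  // Hash map keyed on phone number: average O(1) lookups, no key ordering.
  Store: TDictionary<string, TPhoneData>;

// Adds the number on first sight, then updates its count and date range.
procedure RegisterOrder(const Number: string; OrderDate: TDateTime);
var
  Data: TPhoneData;
begin
  if not Store.TryGetValue(Number, Data) then
  begin
    Data := TPhoneData.Create;  // AccessCount starts at 0
    Data.Number := Number;
    Data.FirstUsed := OrderDate;
    Data.LastUsed := OrderDate;
    Store.Add(Number, Data);
  end;
  Inc(Data.AccessCount);
  if OrderDate < Data.FirstUsed then Data.FirstUsed := OrderDate;
  if OrderDate > Data.LastUsed then Data.LastUsed := OrderDate;
end;

var
  Data: TPhoneData;
begin
  Store := TDictionary<string, TPhoneData>.Create;
  try
    RegisterOrder('555-0100', EncodeDate(2009, 1, 5));
    RegisterOrder('555-0199', EncodeDate(2009, 2, 1));
    RegisterOrder('555-0100', EncodeDate(2008, 12, 24));
    // Enumeration order is unspecified.
    for Data in Store.Values do
      Writeln(Format('%s  count %d  first %s  last %s',
        [Data.Number, Data.AccessCount,
         DateToStr(Data.FirstUsed), DateToStr(Data.LastUsed)]));
  finally
    // A plain TDictionary does not own its values, so free them here.
    for Data in Store.Values do
      Data.Free;
    Store.Free;
  end;
end.
```

Unlike DMap, TDictionary does not keep its keys sorted, so for a report in phone-number order you would copy the keys out and sort them before printing.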

DeCAL is one of my favourite data structure libraries, and I would recommend that anybody doing in-memory storage operations have a look at it.

Hope that helps
