I have a structure in C which resembles that of a database table record. Now when I query the table using select, I do not know how many records I will get. I want to store all the returned records from the select query in a array of my structure data type.
我在C中有一个类似于数据库表记录的结构。现在,当我使用select查询表时,我不知道会得到多少条记录。我想将select查询中的所有返回记录存储在我的结构数据类型的数组中。
Which method is best?
哪种方法最好?
Method 1: find array size and allocate
- first get the count of records by doing select count(*) from table
- allocate a static array
- run select * from table and then store each records in my structure in a loop.
首先通过从表中选择count(*)来获取记录数
分配一个静态数组
从表中运行select *,然后在循环中将每个记录存储在我的结构中。
Method 2: use single linked list
while ( records returned )
{
create new node
store the record in node
}
Which implementation is best?
哪种实施最好?
My requirement is that when I have all the records, I will probably make copies of them or something. But I do not need random access and I will not be doing any search of a particular record.
我的要求是,当我拥有所有记录时,我可能会复制它们或其他东西。但我不需要随机访问,我不会对特定记录进行任何搜索。
Thanks
7 个解决方案
#1
And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or having a higher level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access, you just need to break your index into two parts.
我忘了#4选项。分配固定大小的数组。当该数组已满时,分配另一个。您可以通过在链接列表中链接它们来跟踪数组,或者使用更高级别的数组来保持指向数据数组的指针。当你需要随机访问时,这个两级方案很棒,你只需要将索引分成两部分。
#2
A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.
'select count(*)'的一个问题是,值可能会在调用之间发生变化,因此您的“真实”选择将包含一些与您期望的计数不同的项目。
I think the best solution is your "2". Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord>
in C++ and List<myrecord>
in C#).
我认为最好的解决方案是你的“2”。我会亲自分配一个数组(根据需要重新分配),而不是链表。这在支持增长数组的语言中更容易(例如,C ++中的std :: vector
#3
You forgot option 3, it's a little more complicated but it might be best for your particular case. This is the way it's typically done in C++ std::vector.
你忘记了选项3,它有点复杂,但它可能最适合你的特殊情况。这是它通常在C ++ std :: vector中完成的方式。
Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.
分配任何舒适大小的阵列。填充该数组时,分配一个1.5x到2x大的新数组,然后将填充的数组复制到此数组。释放原始阵列并将其替换为新阵列。泡沫,冲洗,重复。
#4
There are a good many possible critiques that should be made.
应该提出许多可能的批评。
-
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
你根本不是在讨论静态数组 - 静态数组在编译时是固定大小的,在源文件本地或函数本地。您正在谈论动态分配的数组。
-
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
您不会给出任何记录大小或记录数的指示,也不会给出下面的数据库的动态信息(也就是说,当您的数据运行时,任何其他进程都可以更改任何数据)。尺寸信息不是非常关键,但另一个因素是。如果您正在进行某种报告,那么将数据提取到内存中就可以了。您不打算修改数据库,数据是准确的快照。但是,如果在修改记录时其他人可能正在修改记录,则大纲解决方案是如何丢失其他人更新的主要示例。那是件坏事!
-
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
为什么一次需要内存中的所有数据?忽略大小限制,与按正确顺序处理每个相关记录相比,它的好处究竟是什么?你看,DBMS花了很多精力来选择相关的记录(WHERE子句)和相关的数据(SELECT列表),并允许你指定序列(ORDER BY子句),他们有最好的排序系统,他们可以负担得起(比你或我可能产生的更好)。
-
Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.
如果以块的形式分配数组,请注意二次行为。每次重新分配时,都必须将旧内存复制到新位置。这会破坏您的内存(旧位置可供重用,但根据定义,它太小而无法重用)。 Mark Ransom指出了一个合理的选择 - 不是整个世界上最简单的方案(但它避免了我提到的二次方案)。当然,你可以(并且会)通过一组合适的函数来抽象它。
-
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.
批量提取(Mark Ransom也提到)也很有用。您可能希望预先分配批量提取所在的数组,这样您就不必进行额外的复制。这只是线性行为,所以它不那么严重。
#5
Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
创建一个数据结构来表示您的数组或列表。假装您使用的是OO语言,并为您需要的一切创建访问器和构造器。在该数据结构内部,保留一个数组,并且正如其他人所说,当数组填充到容量时,将新的数组2x分配为大并复制到其中。仅通过定义的例程访问该结构以进行访问。
This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.
这是Java和其他语言执行此操作的方式。在内部,这甚至是用C语言实现Perl的方式。
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)
我打算说你最好的选择是找一个已经做到这一点的库...也许你可以借用这种数据结构的Perl C实现。我确信它比你或我从头开始卷起的任何东西都经过了更好的测试。 :)
#6
while(record = get_record()) {
records++;
records_array = (record_struct *) realloc(records_array, (sizeof record_struct)*records);
*records_array[records - 1] = record;
}
This is strictly an example — please don't use realloc() in production.
这是一个严格的例子 - 请不要在生产中使用realloc()。
#7
The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
链表是一个很好的简单选项。我会选择那个。如果您更喜欢不断增长的阵列,您可以在Dave Hanson的C接口和实现中找到一个实现,作为奖励还提供链接列表。
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.
这看起来像是一个可能随着应用程序的发展而变化的设计决策,因此您绝对应该将表示隐藏在合适的API背后。如果您还不知道如何做到这一点,Hanson的代码将为您提供一些很好的例子。
#1
And I forgot option #4. Allocate an array of fixed size. When that array is full, allocate another. You can keep track of the arrays by linking them in a linked list, or having a higher level array that keeps the pointers to the data arrays. This two-level scheme is great when you need random access, you just need to break your index into two parts.
我忘了#4选项。分配固定大小的数组。当该数组已满时,分配另一个。您可以通过在链接列表中链接它们来跟踪数组,或者使用更高级别的数组来保持指向数据数组的指针。当你需要随机访问时,这个两级方案很棒,你只需要将索引分成两部分。
#2
A problem with 'select count(*)' is that the value might change between calls, so your "real" select will have a number of items different from the count you'd expect.
'select count(*)'的一个问题是,值可能会在调用之间发生变化,因此您的“真实”选择将包含一些与您期望的计数不同的项目。
I think the best solution is your "2". Instead of a linked list, I would personally allocate an array (reallocating as necessary). This is easier in languages that support growing arrays (e.g. std::vector<myrecord>
in C++ and List<myrecord>
in C#).
我认为最好的解决方案是你的“2”。我会亲自分配一个数组(根据需要重新分配),而不是链表。这在支持增长数组的语言中更容易(例如,C ++中的std :: vector
#3
You forgot option 3, it's a little more complicated but it might be best for your particular case. This is the way it's typically done in C++ std::vector.
你忘记了选项3,它有点复杂,但它可能最适合你的特殊情况。这是它通常在C ++ std :: vector中完成的方式。
Allocate an array of any comfortable size. When that array is filled, allocate a new larger array of 1.5x to 2x the size of the filled one, then copy the filled array to this one. Free the original array and replace it with the new one. Lather, rinse, repeat.
分配任何舒适大小的阵列。填充该数组时,分配一个1.5x到2x大的新数组,然后将填充的数组复制到此数组。释放原始阵列并将其替换为新阵列。泡沫,冲洗,重复。
#4
There are a good many possible critiques that should be made.
应该提出许多可能的批评。
-
You are not talking about a static array at all - a static array would be of pre-determined size fixed at compile time, and either local to a source file or local to a function. You are talking about a dynamically allocated array.
你根本不是在讨论静态数组 - 静态数组在编译时是固定大小的,在源文件本地或函数本地。您正在谈论动态分配的数组。
-
You do not give any indication of record size or record count, nor of how dynamic the database underneath is (that is, could any other process change any of the data while yours is running). The sizing information isn't dreadfully critical, but the other factor is. If you're doing a report of some sort, then fetching the data into memory is fine; you aren't going to modify the database and the data is an accurate snapshot. However, if other people could be modifying the records while you are modifying records, your outline solution is a major example of how to lose other people's updates. That is a BAD thing!
您不会给出任何记录大小或记录数的指示,也不会给出下面的数据库的动态信息(也就是说,当您的数据运行时,任何其他进程都可以更改任何数据)。尺寸信息不是非常关键,但另一个因素是。如果您正在进行某种报告,那么将数据提取到内存中就可以了。您不打算修改数据库,数据是准确的快照。但是,如果在修改记录时其他人可能正在修改记录,则大纲解决方案是如何丢失其他人更新的主要示例。那是件坏事!
-
Why do you need all the data in memory at once? Ignoring size constraints, what exactly is the benefit of that compared with processing each relevant record once in the correct sequence? You see, DBMS put a lot of effort into being able to select the relevant records (WHERE clauses) and the relevant data (SELECT lists) and allow you to specify the sequence (ORDER BY clauses) and they have the best sort systems they can afford (better than the ones you or I are likely to produce).
为什么一次需要内存中的所有数据?忽略大小限制,与按正确顺序处理每个相关记录相比,它的好处究竟是什么?你看,DBMS花了很多精力来选择相关的记录(WHERE子句)和相关的数据(SELECT列表),并允许你指定序列(ORDER BY子句),他们有最好的排序系统,他们可以负担得起(比你或我可能产生的更好)。
-
Beware of quadratic behaviour if you allocate your array in chunks. Each time you reallocate, there's a decent chance the old memory will have to be copied to the new location. This will fragment your memory (the old location will be available for reuse, but by definition will be too small to reuse). Mark Ransom points out a reasonable alternative - not the world's simplest scheme overall (but it avoids the quadratic behaviour I referred to). Of course, you can (and would) abstract that away by a set of suitable functions.
如果以块的形式分配数组,请注意二次行为。每次重新分配时,都必须将旧内存复制到新位置。这会破坏您的内存(旧位置可供重用,但根据定义,它太小而无法重用)。 Mark Ransom指出了一个合理的选择 - 不是整个世界上最简单的方案(但它避免了我提到的二次方案)。当然,你可以(并且会)通过一组合适的函数来抽象它。
-
Bulk fetching (also mentioned by Mark Ransom) is also useful. You would want to preallocate the array into which a bulk fetch fetches so that you don't have to do extra copying. This is just linear behaviour though, so it is less serious.
批量提取(Mark Ransom也提到)也很有用。您可能希望预先分配批量提取所在的数组,这样您就不必进行额外的复制。这只是线性行为,所以它不那么严重。
#5
Create a data structure to represent your array or list. Pretend you're in an OO language and create accessors and constructors for everything you need. Inside that data structure, keep an array, and, as others have said, when the array is filled to capacity, allocate a new array 2x as large and copy into it. Access the structure only through your defined routines for accessing it.
创建一个数据结构来表示您的数组或列表。假装您使用的是OO语言,并为您需要的一切创建访问器和构造器。在该数据结构内部,保留一个数组,并且正如其他人所说,当数组填充到容量时,将新的数组2x分配为大并复制到其中。仅通过定义的例程访问该结构以进行访问。
This is the way Java and other languages do this. Internally, this is even how Perl is implemented in C.
这是Java和其他语言执行此操作的方式。在内部,这甚至是用C语言实现Perl的方式。
I was going to say your best option is to look for a library that already does this ... maybe you can borrow Perl's C implementation of this kind of data structure. I'm sure it's more well tested than anything you or I could roll up from scratch. :)
我打算说你最好的选择是找一个已经做到这一点的库...也许你可以借用这种数据结构的Perl C实现。我确信它比你或我从头开始卷起的任何东西都经过了更好的测试。 :)
#6
while(record = get_record()) {
records++;
records_array = (record_struct *) realloc(records_array, (sizeof record_struct)*records);
*records_array[records - 1] = record;
}
This is strictly an example — please don't use realloc() in production.
这是一个严格的例子 - 请不要在生产中使用realloc()。
#7
The linked list is a nice, simple option. I'd go with that. If you prefer the growing array, you can find an implementation as part of Dave Hanson's C Interfaces and Implementations, which as a bonus also provides linked lists.
链表是一个很好的简单选项。我会选择那个。如果您更喜欢不断增长的阵列,您可以在Dave Hanson的C接口和实现中找到一个实现,作为奖励还提供链接列表。
This looks to me like a design decision that is likely to change as your application evolves, so you should definitely hide the representation behind a suitable API. If you don't already know how to do this, Hanson's code will give you a number of nice examples.
这看起来像是一个可能随着应用程序的发展而变化的设计决策,因此您绝对应该将表示隐藏在合适的API背后。如果您还不知道如何做到这一点,Hanson的代码将为您提供一些很好的例子。