Getting large amounts of data into a SQL Server (Express) database at deployment time

Time: 2022-11-08 01:13:29

For most database-backed projects I've worked on, there is a need to get "startup" or test data into the database before deploying the project. Examples of startup data: a table that lists all the countries in the world or a table that lists a bunch of colors that will be used to populate a color palette.

I've been using a system where I store all my startup data in an Excel spreadsheet (with one table per worksheet), then I have a utility script in SQL that (1) creates the database, (2) creates the schemas, (3) creates the tables (including primary and foreign keys), (4) connects to the spreadsheet as a linked server, and (5) inserts all the data into the tables.

I mostly like this system. I find it very easy to lay out columns in Excel, verify foreign key relationships using simple lookup functions, perform concatenation operations, copy in data from web tables or other spreadsheets, etc. One major disadvantage of this system is the need to sync up the columns in my worksheets any time I change a table definition.

I've been going through some tutorials to learn new .NET technologies or design patterns, and I've noticed that these typically involve using Visual Studio to create the database and add tables (rather than scripts), and the data is typically entered using the built-in designer. This has me wondering if maybe the way I'm doing it is not the most efficient or maintainable.

Questions

  1. In general, do you find it preferable to build your whole database via scripts or a GUI designer, such as SSMSE or Visual Studio?

  2. What method do you recommend for populating your database with startup or test data and why?


Clarification

Judging by the answers so far, I think I should clarify something. Assume that I have a significant amount of data (hundreds or thousands of rows) that needs to find its way into the database. This data could be sourced from various places, such as text files, spreadsheets, web tables, etc. I've received several suggestions to script this process using INSERT statements, but is this really viable when you're talking about a lot of data?

Which leads me to...

New questions

  1. How would you write a SQL script to take the country data on this page and insert it into the database?

    With Excel, I could just copy/paste the table into a worksheet and run my utility script, and I'd basically be done.

  2. What if you later realized you needed a new column, CapitalCity?

    With Excel, I could take that information from this page, paste it into Excel, and with a quick text-to-column manipulation, I'd have the data in the format I need.

I honestly didn't write this question to defend Excel as the best way or even a good way to get data into a database, but the answers so far don't seem to be addressing my main concern--how to get all this data into your database. Writing a script with hundreds of INSERT statements by hand would be extremely time consuming and error prone. Somehow, this script needs to be machine generated, but how?

5 answers

#1


1  

I think your current process is fine for seeding the database with initial data. It's simple, easy to maintain, and works for you. If you've got a good database design with adequate constraints then it doesn't really matter how you seed the initial data. You could use an intermediate tool to generate scripts but why bother?

SSIS has a steep learning curve, doesn't work well with source control (impossible to tell what changed between versions), and is very finicky about type conversions from Excel. There's also an issue with how many rows it reads ahead to determine the data type -- you're in deep trouble if your first x rows contain numbers stored as text.

#2


1  

1) I prefer to use scripts for several reasons.

• Scripts are easy to modify, and when I get ready to deploy my application to a production environment, the scripts are already written, so I'm all set.

• If I need to deploy my database to a different platform (like Oracle or MySQL) then it's easy to make minor modifications to the scripts to work on the target database.

• With scripts, I'm not dependent on a tool like Visual Studio to build and maintain the database.

2) I like good old-fashioned INSERT statements in a script. Again, at deployment time scripts are your best friend. At our shop, when we deploy our applications we have to have scripts ready for the DBAs to run, as that's what they expect.

I just find that scripts are simple, easy to maintain, and the "least common denominator" when it comes to creating a database and loading data into it. By least common denominator, I mean that the majority of people (e.g. DBAs, or other people in your shop who might not have Visual Studio) will be able to use them without any trouble.

The other important thing about scripts is that they force you to learn SQL and, more specifically, DDL (data definition language). While the hand-holding GUI tools are nice, there's no substitute for taking the time to learn SQL and DDL inside out. I've found that those skills are invaluable to have in almost any shop.
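
On the worry that hundreds of INSERT statements would have to be typed by hand: they can be generated. Below is a minimal Python sketch, not something from the original posts, that turns CSV text (for example, a web table pasted into a spreadsheet and saved as CSV) into a T-SQL script. The table name `dbo.Country`, the column names, and the sample rows are made up for illustration. Every value is emitted as a Unicode string literal, so the sketch assumes SQL Server's implicit conversions are acceptable for any non-text columns.

```python
import csv
import io


def sql_literal(value):
    """Render one CSV field as a T-SQL literal: NULL if empty, else N'...'."""
    if value == "":
        return "NULL"
    # Single quotes inside a T-SQL string literal are escaped by doubling them.
    return "N'" + value.replace("'", "''") + "'"


def quote_name(name):
    """Bracket-quote a column name, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def generate_inserts(csv_text, table):
    """Yield one INSERT per data row; the header row supplies the column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(quote_name(col.strip()) for col in header)
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = ", ".join(sql_literal(field.strip()) for field in row)
        yield f"INSERT INTO {table} ({columns}) VALUES ({values});"


# Hypothetical sample data; in practice this would be read from a file.
sample = """Code,Name,CapitalCity
FR,France,Paris
CI,Cote d'Ivoire,Yamoussoukro
AQ,Antarctica,
"""

statements = list(generate_inserts(sample, "dbo.Country"))
for statement in statements:
    print(statement)
```

Adding a column such as CapitalCity later then means adding a column to the CSV and regenerating the script, and the generated file can be checked into source control alongside the DDL.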

#3


1  

Frankly, I find the concept of using Excel here a bit scary. It obviously works, but it's creating a dependency on an ad-hoc data source that won't be resolved until much later. Last thing you want is to be in a mad rush to deploy a database and find out that the Excel file is mangled, or worse, missing entirely. I suppose the severity of this would vary from company to company as a function of risk tolerance, but I would be actively seeking to remove Excel from the equation, or at least remove it as a permanent fixture.

I always use scripts to create databases, because scripts are portable and repeatable - you can use (almost) the same script to create a development database, a QA database, a UAT database, and a production database. For this reason it's equally important to use scripts to modify existing databases.

I also always use a script to create bootstrap data (AKA startup data), and there's a very important reason for this: there's usually more scripting to be done afterward. Or at least there should be. Bootstrap data is almost invariably read-only, and as such, you should be placing it on a read-only filegroup to improve performance and prevent accidental changes. So you'll generally need to script the data first, then make the filegroup read-only.

On a more philosophical level, though, if this startup data is required for the database to work properly - and most of the time, it is - then you really ought to consider it part of the data definition itself, the metadata. For that reason, I don't think it's appropriate to have the data defined anywhere but in the same script or set of scripts that you use to create the database itself.

Test data is a little different, but in my experience you're usually trying to auto-generate that data in some fashion, which makes it even more important to use a script. You don't want to have to manually maintain an ad-hoc database of millions of rows for testing purposes.

If your problem is that the test or startup data comes from an external source - a web page, a CSV file, etc. - then I would handle this with an actual "configuration database." This way you don't have to validate references with VLOOKUPs as in Excel; the database's own constraints enforce them.

  • Use SQL Server Integration Services (formerly DTS) to pull your external data from CSV, Excel, or wherever, into your configuration database - if you need to periodically refresh the data, you can save the SSIS package so it ends up being just a couple of clicks.

  • If you need to use Excel as an intermediary, i.e. to format or restructure some data from a web page, that's fine, but the important thing IMO is to get it out of Excel as soon as possible, and SSIS with a config database is an excellent repeatable method of doing that.

  • When you are ready to migrate the data from your configuration database into your application database, you can use SQL Server Management Studio to generate a script for the data (in case you don't already know - when you right click on the database, go to Tasks, Generate Scripts, and turn on "Script Data" in the Script Options). If you're really hardcore, you can actually script the scripting process, but I find that this usually takes less than a minute anyway.

It may sound like a lot of overhead, but in practice the effort is minimal. You set up your configuration database once, create an SSIS package once, and refresh the config data maybe once every few months or maybe never (this is the part you're already doing, and this part will become less work). Once that "setup" is out of the way, it's really just a few minutes to generate the script, which you can then use on all copies of the main database.
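
As a rough illustration of enforcing references instead of checking them with lookups, here is a small Python sketch. It uses an in-memory SQLite database purely as a stand-in for the configuration database (in SQL Server the same job is done by a FOREIGN KEY constraint), and the `Country` and `City` tables, their columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys on connections where this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Country (
        Code TEXT PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE City (
        Name        TEXT NOT NULL,
        CountryCode TEXT NOT NULL REFERENCES Country (Code)
    );
""")

conn.executemany(
    "INSERT INTO Country (Code, Name) VALUES (?, ?)",
    [("FR", "France"), ("JP", "Japan")],
)
conn.commit()


def load_cities(rows):
    """Insert each row; return the rows the database rejected."""
    rejected = []
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO City (Name, CountryCode) VALUES (?, ?)", row
            )
        except sqlite3.IntegrityError:
            # Raised for a bad reference (unknown CountryCode) or a NULL value.
            rejected.append(row)
    conn.commit()
    return rejected


rejected = load_cities([("Paris", "FR"), ("Tokyo", "JP"), ("Atlantis", "XX")])
loaded = conn.execute("SELECT COUNT(*) FROM City").fetchone()[0]
print("loaded:", loaded, "rejected:", rejected)
```

The bad row never reaches the table, so anything scripted out of the configuration database afterwards is already consistent.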

#4


0  

Since I use an object-relational mapper (Hibernate; there is also a .NET version, NHibernate), I prefer to generate such data in my programming language. The ORM then takes care of writing things into the database. I don't have to worry about changing column names in the data because I need to fix the mapping anyway. If refactoring is involved, it usually takes care of the startup/test data also.

#5


0  

Excel is an unnecessary component of this process.

Script the current version of the database components that you want to reuse, and add the script to your source control system. When you need to make changes in the future, either modify the entities in the database and regenerate the script, or modify the script and regenerate the database.

Avoid mixing Visual Studio's database designer and Excel, as they only add complexity. Scripts and SQL Server Management Studio are your friends.
