I am asking because we plan to read a lot of data (several GB) from a SQL Server database into a .NET application for processing. I would like to know how much space overhead to allow per record so that I can estimate the impact on our network traffic.
For example, suppose a record consists of 5 integers (4 * 5 = 20 bytes of data). How many bytes are physically transferred per record? Is there a precise formula or a rule of thumb?
5 solutions
#1
10
SQL Server uses the TDS (Tabular Data Stream) protocol, which is documented on MSDN.
Frankly, I wouldn't worry about it. Unfortunately, GBs of data will take time no matter how it's done.
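As a rough, back-of-the-envelope sketch of the per-record question (not a measurement), the arithmetic can be written down under some stated assumptions: rows travel as TDS ROW tokens (one token byte per row, 4 bytes per non-nullable int, plus a 1-byte length prefix per nullable int), every TDS packet carries an 8-byte header, the negotiated packet size is 4096 bytes, and each 1460-byte TCP segment adds 40 bytes of TCP/IPv4 headers. The function names and defaults are illustrative only; column metadata, DONE tokens, ACK traffic, link-layer framing, and encryption are all ignored.

```python
import math

TDS_HEADER = 8   # assumed: header bytes on every TDS packet
ROW_TOKEN = 1    # assumed: one token byte introduces each row


def row_bytes(n_int_columns, nullable=False):
    """Payload bytes for one row made of 4-byte integer columns."""
    # Assumption: a nullable int carries a 1-byte length prefix.
    per_value = 4 + (1 if nullable else 0)
    return ROW_TOKEN + n_int_columns * per_value


def estimate_wire_bytes(n_rows, n_int_columns, nullable=False,
                        packet_size=4096, mss=1460, tcp_ip_headers=40):
    """Rough server-to-client byte count for one result set."""
    payload = n_rows * row_bytes(n_int_columns, nullable)
    # Each TDS packet holds packet_size minus its own header.
    tds_packets = math.ceil(payload / (packet_size - TDS_HEADER))
    tds_bytes = payload + tds_packets * TDS_HEADER
    # Each TCP segment of up to `mss` bytes adds TCP/IP header bytes.
    segments = math.ceil(tds_bytes / mss)
    return tds_bytes + segments * tcp_ip_headers


if __name__ == "__main__":
    raw = 1_000_000 * 20
    wire = estimate_wire_bytes(1_000_000, 5)
    print(f"raw={raw} wire={wire} overhead={wire / raw - 1:.1%}")
```

Under these assumptions a row of 5 non-nullable integers costs 21 bytes before packet headers (26 if the columns are nullable), and a million such rows come to about 21.6 MB on the wire for 20 MB of raw integers. Treat that as an estimate to check against a real capture, not as a guarantee.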
#2
4
I don't have a clue about the actual format, but I would suggest an empirical approach: hook up Wireshark and measure the data yourself.
#3
4
As Peter M said, test it.
There isn't a calculation you can perform that is realistic enough to give you numbers to work from.
The reality is that there are too many variables to consider. For example:
At what rate do the NICs involved actually transfer? This rate depends on which network cards are in place and on which drivers those cards are using. You could quite easily have a 1 Gb card that can only transfer at, say, 300 Mb because of driver issues. I've even seen two cards from the exact same manufacturer, with the same drivers, transfer at different speeds because of a slight configuration difference in one of them.
What other pieces of equipment sit between the two machines in question? Again, depending on the hardware, OSes, etc., you may see wildly different numbers. A $100 8-port 1 Gb unmanaged switch from TRENDNet is going to have completely different throughput than a $5000 1 Gb Cisco managed switch.
You will also have to consider the existing network "weather" at the time of the transfer: how much throughput is OTHER traffic consuming on the same lines that this transfer will share? This is a transient factor, because the existing network load changes as different demands are placed on it.
Additionally, some NICs support TCP offloading and others don't. If your NICs don't, then the effective transfer rate is going to be hampered by whatever else the CPUs on those boxes are doing.
Next, hard drives have to be taken into consideration. Because this is a large amount of data, the read and write speeds of the various hard drives will have an impact. Sure, the network might actually be able to run at 90% efficiency, but with large amounts of data the hard drives themselves might not be able to keep up, causing that to drop to 25% efficiency or less.
The point is, you have to test it, and at the end of the day the protocol that SQL Server uses will be immaterial to your findings. And don't run just one test; run a lot of real-world tests. Only then will you be able to come up with an average, which might still be off depending on whatever else is going on at the time, but you should be able to get within, say, 10%.
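The "run a lot of tests and average them" advice can be sketched as a small helper. This is only an illustration: `fetch` is a hypothetical callable that performs one full transfer and returns the number of bytes it received, and the relative spread (standard deviation divided by the mean) is just one possible way to judge whether the runs agree to within about 10%.

```python
import statistics
import time


def time_run(fetch):
    """Time one transfer; `fetch` is a hypothetical callable returning bytes read."""
    start = time.perf_counter()
    n_bytes = fetch()
    elapsed = time.perf_counter() - start
    return n_bytes, elapsed


def summarize_runs(bytes_transferred, seconds_per_run):
    """Return (mean throughput in bytes/s, relative spread) over several runs."""
    rates = [bytes_transferred / s for s in seconds_per_run]
    mean = statistics.mean(rates)
    # A single run has no spread to report.
    spread = statistics.stdev(rates) / mean if len(rates) > 1 else 0.0
    return mean, spread


if __name__ == "__main__":
    # Made-up timings (seconds) for the same 1 GB transfer, for illustration only.
    mean, spread = summarize_runs(1_000_000_000, [10.2, 9.8, 11.5, 10.1])
    print(f"mean={mean / 1e6:.1f} MB/s spread={spread:.1%}")
```

Spreading the runs over different times of day should capture some of the network "weather" mentioned above. A large spread suggests the average is not yet something to plan around.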
#4
0
From my observations, standard SQL commands cause a lot of round trips. So for transferring lots of data, it helps if you can restate the job as uploading one table. Then you can use a bulk copy operation, which is much more efficient. See: Bulk Copy Operations in SQL Server (ADO.NET) and bcp Utility.
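To see why round trips matter, a deliberately simple cost model helps: total time is roughly the number of round trips times the round-trip latency, plus the bytes divided by the bandwidth. The figures below (0.5 ms latency, 100 MB/s of usable bandwidth, one million 20-byte rows) are assumptions chosen for illustration, not measurements, and the function name is hypothetical.

```python
def transfer_seconds(total_bytes, round_trips, rtt_seconds, bytes_per_second):
    """Naive model: latency cost of the round trips plus raw transfer time."""
    return round_trips * rtt_seconds + total_bytes / bytes_per_second


if __name__ == "__main__":
    total = 1_000_000 * 20   # one million 20-byte rows
    rtt = 0.0005             # assumed 0.5 ms round trip
    bandwidth = 100_000_000  # assumed 100 MB/s usable

    per_row = transfer_seconds(total, 1_000_000, rtt, bandwidth)
    bulk = transfer_seconds(total, 1, rtt, bandwidth)
    print(f"one command per row: {per_row:.1f} s, single bulk operation: {bulk:.2f} s")
```

In this model the row-by-row approach is dominated entirely by latency, which is the effect a bulk copy avoids. Real transfers pipeline and batch in ways the model ignores, so it shows the trend only.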
#5
0
Actually, TDS is an extremely slow protocol. SQL Server is optimized for processing data, not for marshaling tonnes of data back and forth. While the representation overhead is not large, the fact that it is a request-response protocol and its lack of boxcarring make it quite slow compared to dedicated high-throughput protocols, even ones inside SQL Server (like the Database Mirroring or Service Broker protocols). But even so, with TDS being as slow as it is, a SQL Server shooting at full speed through a TDS pipe will overwhelm your .NET client, guaranteed.
Overall, if you ever come to ask a question like the one you asked, it means you're doing it wrong.