How to manage a database connection in an R package

Time: 2021-11-24 11:27:55

I'm building an R package, the main purpose of which is to abstract away the pain of dealing with a proprietary database that requires some fairly complex SQL queries in order to get data out.

As such, the connection to the Microsoft SQL Server (obtained by odbcDriverConnect) is a constant and important part of this package, but I can't work out how best to manage this and I'm hoping for advice as to how this should be implemented in R.

My current thoughts are:

  1. Make the user ensure they have a valid connection before they call any function. Each function then has connection as a parameter which must be passed. This puts a burden on the user.

  2. In every function, make a call to get.connection(), which will get a new connection each time. Old connections are then allowed to time out naturally, which seems a sloppy approach.

  3. As above, but return the same connection each time. This doesn't appear to be viable, as I can't prevent connections from timing out from within R: autoReconnect=TRUE and other tricks I've used in other languages seem to have no effect.

In Java, I would probably have a DatabaseConnectionPool populated with a number of connections and simply grab connections from, and return them to, that pool as needed. I also don't seem to have the timeout issue in Java when I specify autoReconnect=TRUE.

Any suggestions much appreciated.

2 solutions

#1


2  

pool is an R package for pooling connections, such as database connections. If you're happy to use a GitHub package, take a look at https://github.com/rstudio/pool. It will reuse or recreate the connection as required.
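
As a rough sketch of what using pool might look like: this assumes the DBI, pool and RSQLite packages are installed, and uses an in-memory SQLite database purely as a stand-in so it runs without a server. For SQL Server you would presumably need a DBI-compatible driver (for example the odbc package) rather than RODBC's odbcDriverConnect. The names db_pool and run_query are made up for illustration.

```r
library(DBI)
library(pool)

# Create the pool once, e.g. when the package is loaded.
# RSQLite is only a stand-in so the sketch runs without a server.
db_pool <- dbPool(RSQLite::SQLite(), dbname = ":memory:")

# Hypothetical package function: it queries through the pool, which
# checks a connection out and returns it behind the scenes.
run_query <- function(sql) {
  dbGetQuery(db_pool, sql)
}

res <- run_query("SELECT 1 AS x")

# Close the pool when finished, e.g. when the package is unloaded.
poolClose(db_pool)
```

With this layout the package's functions never take a connection argument, and the user never has to open or close one.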

#2


1  

It seems that a mix of the second and third approaches is a reasonable solution, i.e. return the same connection each time, but before returning it, check whether it is still open and create a new connection if it is not.

It is basically as if you were manually implementing autoReconnect=TRUE.
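
A minimal sketch of that idea, kept independent of any database so it can be run as-is. The names .conn_cache, get_connection, connect and is_alive are hypothetical. In the real package, connect would presumably wrap odbcDriverConnect, and is_alive might attempt a trivial query and report whether it succeeded; here both are replaced by stand-ins.

```r
# Package-private environment holding the cached connection (hypothetical name).
.conn_cache <- new.env(parent = emptyenv())

# Return the cached connection if it still passes `is_alive`; otherwise
# open a fresh one with `connect` and cache it.
get_connection <- function(connect, is_alive) {
  conn <- .conn_cache$conn
  alive <- !is.null(conn) &&
    isTRUE(tryCatch(is_alive(conn), error = function(e) FALSE))
  if (!alive) {
    conn <- connect()
    .conn_cache$conn <- conn
  }
  conn
}

# Stand-in connection objects so the sketch runs without a database.
n_opened <- 0
fake_connect <- function() {
  n_opened <<- n_opened + 1
  conn <- new.env()
  conn$open <- TRUE
  conn$id <- n_opened
  conn
}
fake_is_alive <- function(conn) conn$open

c1 <- get_connection(fake_connect, fake_is_alive)
c2 <- get_connection(fake_connect, fake_is_alive)  # reuses c1
c1$open <- FALSE                                   # simulate a timeout
c3 <- get_connection(fake_connect, fake_is_alive)  # reconnects
```

Any error raised by the liveness check is treated as "not alive", so a dropped connection leads to a reconnect rather than a failure inside the calling function.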
