Fastest way to get the minimum value from a database

I have a Postgres database. I want to find the minimum value of a column called calendarid in a certain table. The column is of type integer and holds dates in the format yyyymmdd. I am able to do so with the following code.

get_history_startdate <- function(src) {
  get_required_table(src) %>% # This gives me the table tbl(src, "table_name")
    select(calendarid) %>%
    as_data_frame %>%
    collect() %>%
    min() # Result : 20150131
}

But this method is really slow, as it loads all the data from the database into memory. Any ideas on how I can improve it?

2 Solutions

#1

get_required_table(src) %>%
  summarise(min(calendarid, na.rm = TRUE)) %>%
  pull()

This runs the aggregation as a SQL query inside the database, so only the result is pulled into R.
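To check what SQL a dplyr pipeline like this would generate, one option is to inspect it without touching the real database. The sketch below is only an illustration: calendar_tbl is a hypothetical stand-in built with dbplyr's lazy_frame() and a simulated Postgres backend, not the table returned by get_required_table(src).

```r
library(dplyr)
library(dbplyr)

# Hypothetical stand-in for the real table: a lazy frame that only simulates
# a Postgres backend, so no database connection is needed.
calendar_tbl <- lazy_frame(calendarid = 20150131L, con = simulate_postgres())

# Build the query lazily; nothing is executed at this point.
min_query <- calendar_tbl %>%
  summarise(min_calendarid = min(calendarid, na.rm = TRUE))

# Print the SQL that dbplyr would send to the database.
show_query(min_query)
```

On a real remote table, calling show_query() on the pipeline before pull() should show the same kind of aggregate query.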

#2

If you just want the minimum value of the calendarid column across the entire table, then use this:

SELECT MIN(calendarid) AS min_calendarid
FROM your_table;

I don't know exactly what your R code is doing under the hood, but if it's bringing the entire table from Postgres into R, then it is very wasteful. If so, running the above query directly on Postgres should give you a boost in performance.
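If you want to run that query from R rather than from a SQL client, a minimal sketch using the DBI package could look like the following. The helper name get_min_calendarid and the table name your_table are placeholders, and the demo uses an in-memory SQLite database (via the RSQLite package) purely as a stand-in so the example runs without a Postgres server.

```r
library(DBI)

# Hypothetical helper: `con` is any DBI connection, `table_name` the table to query.
get_min_calendarid <- function(con, table_name) {
  # Quote the identifier so the table name is embedded safely in the SQL.
  sql <- paste0(
    "SELECT MIN(calendarid) AS min_calendarid FROM ",
    dbQuoteIdentifier(con, table_name)
  )
  # The database computes the minimum; only one row comes back to R.
  dbGetQuery(con, sql)$min_calendarid
}

# Stand-in demo with an in-memory SQLite database; with Postgres you would
# pass your existing connection instead.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "your_table",
             data.frame(calendarid = c(20150228L, 20150131L, 20160101L)))
result <- get_min_calendarid(con, "your_table")
dbDisconnect(con)
```

Because the aggregation happens in the database, the amount of data transferred does not grow with the size of the table.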
