Does anyone know of any tools to provide simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB since the input data is typically thrown out almost immediately after the query is run.
有谁知道有什么工具可以使用类似sql的声明式查询语言来提供简单、快速的平面文件查询吗?我不愿支付将文件加载到DB中的开销,因为输入数据通常在查询运行后立即抛出。
Consider the data file, "animals.txt":
考虑数据文件“animals.txt”:
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
Suppose I want to extract the highest value for each unique animal. I would like to write something like:
假设我想提取每个独特动物的最高值。我想这样写:
cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"
I can get nearly the same result using sort
:
使用sort可以得到几乎相同的结果:
cat animals.txt | sort -t " " -k1,1 -k2,2nr
And I can always drop into awk
from there, but this all feels a bit awk
ward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.
我总是可以从那里进入awk,但当一种类似sql的语言看起来能如此干净地解决问题时,这一切都让我觉得有点尴尬(无法抗拒)。
I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?
我曾考虑过为SQLite编写一个包装器,该包装器将根据输入数据自动创建一个表,我还研究过在单处理器模式下使用Hive,但我忍不住觉得这个问题以前已经解决了。我遗漏了什么东西?这个功能已经被另一个标准工具实现了吗?
Halp!
想买!
8 个解决方案
#1
3
I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):
我写TxtSushi主要是为了在平面文件上执行SQL选择。下面是示例的命令链(所有这些命令都来自TxtSushi):
tabtocsv animals.txt | namecolumns - | tssql -table animals - \ 'select col1, max(as_int(col2)) from animals group by col1'
namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.
只有动物才需要命名。txt没有标题行。通过查看示例脚本,您可以快速了解什么是可能的。在主页的底部也有类似工具的链接。
#3
1
you can use sqlite. Here's an example using Python.
您可以使用sqlite。这里有一个使用Python的例子。
import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()
output
输出
$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]
#4
1
I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.
我刚刚偶然发现了这个Python脚本,它所做的事情与您想要的差不多,尽管它只支持非常基本的查询。
#5
0
We'll I have a lightweight ORM for sqlite that would simplify this task without requiring any configuration files, etc.
我们将为sqlite提供一个轻量级ORM,它可以简化这个任务,而不需要任何配置文件等等。
If you can using PowerShell has a lot of powerful capabilities for parsing and querying text files (example here). Otherwise using .NET/Mono you can cut that up in and use LINQ in no time.
如果可以的话,PowerShell有很多强大的解析和查询文本文件的功能(这里的示例)。否则,使用. net / mono,您可以将其分割开来,并立即使用LINQ。
#6
0
I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq
s "-f" option, which I had been unaware of:
我从来没有找到一个满意的答案,但我至少用uniqs“-f”选项找到了我的玩具问题的解决方案,这一点我一直都不知道:
cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1
The awk
portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.
显然,如果以相反的顺序创建输入文件,则可以完全跳过上面的awk部分。
I'm still holding out hope for a SQL-like tool, though.
不过,我仍然希望有一个类似sql的工具。
#7
0
I made a tool that might help. http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms your sql could be "Select Max(value) from animals" or it could by "Select * from animals order by value desc"
我做了一个有用的工具。http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms您的sql可以是“从动物中选择Max(value)”,也可以是“从动物中选择* (Select * from animals order by value desc)”
#8
0
You can look for HXTT JDBC Drivers. They provide JDBC drivers for most type of flat files, excel etc .
您可以查找HXTT JDBC驱动程序。它们为大多数类型的平面文件、excel等提供JDBC驱动程序。
You can execute simple SQL queries on it.
您可以对它执行简单的SQL查询。
They have trial versions available as well
他们也有试用版
#1
3
I wrote TxtSushi mostly to do SQL selects on flat files. Here is the command chain for your example (all of these commands are from TxtSushi):
我写TxtSushi主要是为了在平面文件上执行SQL选择。下面是示例的命令链(所有这些命令都来自TxtSushi):
tabtocsv animals.txt | namecolumns - | tssql -table animals - \ 'select col1, max(as_int(col2)) from animals group by col1'
namecolumns is only required because animals.txt doesn't have a header row. You can get a quick sense of what is possible by looking through the example scripts. There are also links to similar tools on the bottom of the main page.
只有动物才需要命名。txt没有标题行。通过查看示例脚本,您可以快速了解什么是可能的。在主页的底部也有类似工具的链接。
#2
#3
1
you can use sqlite. Here's an example using Python.
您可以使用sqlite。这里有一个使用Python的例子。
import sqlite3
conn = sqlite3.connect('/tmp/test.db')
cursor = conn.cursor()
try:
cursor.execute("""create table table1 (word varchar not null, number varchar not null)""")
except: pass
cursor.execute("insert into table1 values ('dog', '15')")
cursor.execute("insert into table1 values ('cat', '20')")
cursor.execute("insert into table1 values ('dog', '10')")
cursor.execute("select max(number) , word from table1 group by word")
print cursor.fetchall()
output
输出
$ ./python.py
[(u'20', u'cat'), (u'15', u'dog')]
#4
1
I just stumbled across this Python script which does something like what you want, although it only supports very basic queries.
我刚刚偶然发现了这个Python脚本,它所做的事情与您想要的差不多,尽管它只支持非常基本的查询。
#5
0
We'll I have a lightweight ORM for sqlite that would simplify this task without requiring any configuration files, etc.
我们将为sqlite提供一个轻量级ORM,它可以简化这个任务,而不需要任何配置文件等等。
If you can using PowerShell has a lot of powerful capabilities for parsing and querying text files (example here). Otherwise using .NET/Mono you can cut that up in and use LINQ in no time.
如果可以的话,PowerShell有很多强大的解析和查询文本文件的功能(这里的示例)。否则,使用. net / mono,您可以将其分割开来,并立即使用LINQ。
#6
0
I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq
s "-f" option, which I had been unaware of:
我从来没有找到一个满意的答案,但我至少用uniqs“-f”选项找到了我的玩具问题的解决方案,这一点我一直都不知道:
cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1
The awk
portion above could, obviously, be skipped entirely if the input file were created with columns in the opposite order.
显然,如果以相反的顺序创建输入文件,则可以完全跳过上面的awk部分。
I'm still holding out hope for a SQL-like tool, though.
不过,我仍然希望有一个类似sql的工具。
#7
0
I made a tool that might help. http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms your sql could be "Select Max(value) from animals" or it could by "Select * from animals order by value desc"
我做了一个有用的工具。http://www.mccoyonlinestore.com/index.php?txtSearch=mccoy_rdbms您的sql可以是“从动物中选择Max(value)”,也可以是“从动物中选择* (Select * from animals order by value desc)”
#8
0
You can look for HXTT JDBC Drivers. They provide JDBC drivers for most type of flat files, excel etc .
您可以查找HXTT JDBC驱动程序。它们为大多数类型的平面文件、excel等提供JDBC驱动程序。
You can execute simple SQL queries on it.
您可以对它执行简单的SQL查询。
They have trial versions available as well
他们也有试用版