import numpy as np

import pandas as pd

Data loading

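First, the collected data has to be loaded into memory before anything else can be done with it. pandas provides many reader functions, each suited to a different kind of data source. The ones used most often are: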

read_csv
read_table
read_sql
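
As a minimal sketch of how the first two readers differ, the snippet below parses the same small table once as comma-separated text with read_csv and once as tab-separated text with read_table. The sample text is hypothetical and held in memory with StringIO so that no file is needed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data, used only for illustration.
comma_text = "a,b\n1,2\n3,4\n"
tab_text = "a\tb\n1\t2\n3\t4\n"

# read_csv splits fields on commas by default.
df_comma = pd.read_csv(StringIO(comma_text))

# read_table splits fields on tabs by default.
df_tab = pd.read_table(StringIO(tab_text))

# Both calls should produce the same two-column DataFrame.
print(df_comma.equals(df_tab))
```

Either function can read the other format if sep is passed explicitly.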


1.1 Loading CSV data

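header: the row that holds the column names. Pass an integer or a list of integers to say which row(s) to use, or None if the data has no header row. The default, 'infer', uses the first row.

sep: the separator between fields. For read_csv the default is a comma (,).
names: the column labels to use (equivalent to setting df.columns).
index_col: a column with unique values to use as the row labels.
usecols: the columns of interest; only these are loaded.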

# Load the dataset; returns a DataFrame

df = pd.read_csv('/home/geoffrey/文档/33.csv', header=0, sep=',', usecols=['v:0', 'Points:0', 'Points:1', 'Points:2'])

df.head(10)


	v:0	Points:0	Points:1	Points:2
0	2.57150	1.23150	-0.86263	-0.40724
1	2.08420	1.15670	-0.90047	-0.34635
2	1.27970	0.76719	-0.93330	-0.26176
3	0.71951	0.63454	-0.91585	-0.22918
4	1.63080	0.81560	-0.93992	-0.20332
5	3.36400	1.50590	-0.98745	-0.19570
6	2.27160	0.82635	-0.89883	-0.19312
7	2.64630	0.96451	-0.85991	-0.18457
8	0.91226	0.68853	-0.83424	-0.18203
9	4.55390	1.46730	-0.82822	-0.17043
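
The call above exercises header, sep and usecols only. The remaining two parameters, names and index_col, can be sketched as follows. The semicolon-separated text, the column names id, v and x, and the variable df_named are all hypothetical, chosen only to make the example self-contained.

```python
from io import StringIO

import pandas as pd

# Hypothetical headerless data: an id field followed by two measurements.
raw = "p1;2.5;1.2\np2;2.0;1.1\np3;1.3;0.8\n"

df_named = pd.read_csv(
    StringIO(raw),
    sep=";",                  # fields are separated by semicolons
    header=None,              # the text has no header row
    names=["id", "v", "x"],   # supply the column labels ourselves
    index_col="id",           # use the unique id column as the row labels
    usecols=["id", "v"],      # load only the columns of interest
)
print(df_named)
```

Because index_col refers to a column by name, that column has to be included in usecols as well.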

1.2 Loading data from a database

pd.read_sql(SQL statement, connection object)

import sqlite3

# Open a connection; the database file is created if it does not exist

con = sqlite3.connect('test.db')

# SQL statement that creates the table

sql = 'create table person(id int primary key, name varchar(100))'

con.execute(sql)

# Insert a row (SQL string literals take single quotes)

sql = "insert into person(name) values('Geoffrey')"

con.execute(sql)

con.commit()

# Query the data

sql = 'select * from person'

pd.read_sql(sql, con)


	id	name
0	None	Geoffrey
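
The id column comes back as None because SQLite only assigns row ids automatically to a column declared exactly as INTEGER PRIMARY KEY. A column declared int primary key is an ordinary column, so an insert that omits it stores NULL. The sketch below shows the variant with automatic ids. It uses an in-memory database rather than test.db, and the second name and the variable people are hypothetical additions for illustration.

```python
import sqlite3

import pandas as pd

# An in-memory database keeps the sketch self-contained (no file on disk).
con = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY makes id an alias for SQLite's rowid, so it is filled in automatically.
con.execute("create table person(id integer primary key, name varchar(100))")

# The ? placeholders let the driver handle quoting of the inserted values.
con.executemany("insert into person(name) values (?)", [("Geoffrey",), ("Alice",)])
con.commit()

# read_sql runs the query and returns the result set as a DataFrame.
people = pd.read_sql("select * from person", con)
print(people)

con.close()
```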

1.3 Working with data streams

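stream.getvalue() returns the entire contents of the stream. Note that after a write the position is at the end of the stream, so it has to be moved back with seek(0) before read() will return anything.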

from io import StringIO # file-like object (in-memory buffer)

# Create the buffer

sio = StringIO()

# Write data into the buffer (df here is the 3x3 DataFrame built in section 2)

df.to_csv(sio)

# Read the data back

sio.getvalue()

',0,1,2\n0,1,2,3.0\n1,4,5,6.0\n2,7,8,\n'

# Move the position back to the start of the buffer

sio.seek(0)

sio.read()

',0,1,2\n0,1,2,3.0\n1,4,5,6.0\n2,7,8,\n'
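
To make the position behaviour concrete, here is a small sketch that rebuilds the same 3x3 DataFrame so it can run on its own. The names df_demo, buf, at_end, whole and df_back are hypothetical. It shows that read() returns nothing right after a write, that getvalue() is unaffected by the position, and that a rewound buffer can be passed to read_csv like a file.

```python
from io import StringIO

import numpy as np
import pandas as pd

df_demo = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, np.nan]])

buf = StringIO()
df_demo.to_csv(buf)

# The position is now at the end of the buffer, so read() has nothing left to return.
at_end = buf.read()

# getvalue() ignores the position and returns the whole buffer.
whole = buf.getvalue()

# Rewind, then let pandas parse the buffer as if it were a file.
buf.seek(0)
df_back = pd.read_csv(buf, index_col=0)

print(repr(at_end))
print(df_back)
```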

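2. Writing data

The to_csv method of DataFrame and Series objects:

This method can write data to:

a file
a data stream

Common parameters

sep: the field separator
header: whether to write the header row
na_rep: the string used to represent missing values
index: whether to write the index
index_label: the name of the index column
columns: the columns to write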

df = pd.DataFrame([

    [1, 2, 3],

    [4, 5, 6],

    [7, 8, np.nan] # contains a missing value

])

df


	0	1	2
0	1	2	3.0
1	4	5	6.0
2	7	8	NaN

df.to_csv('test.csv', sep=',', header=True, index=True, na_rep='空', columns=[0, 2])

pd.read_csv('test.csv')


	Unnamed: 0	0	2
0	0	1	3.0
1	1	4	6.0
2	2	7	空
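
The Unnamed: 0 column in the result above is the written index being read back as ordinary data. One way to round-trip the index cleanly, sketched here under the assumption that the data goes to an in-memory stream rather than test.csv, is to name it with index_label when writing and point index_col at that name when reading. The label "row", the placeholder "missing" and the variable names are hypothetical.

```python
from io import StringIO

import numpy as np
import pandas as pd

df_out = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, np.nan]])

# Write columns 0 and 2 to a stream, naming the index column "row"
# and writing missing values as the placeholder text "missing".
buf = StringIO()
df_out.to_csv(buf, index_label="row", na_rep="missing", columns=[0, 2])

# Rewind, then read back: restore the index and map the placeholder to NaN.
buf.seek(0)
df_in = pd.read_csv(buf, index_col="row", na_values=["missing"])
print(df_in)
```

The column labels come back as the strings '0' and '2', since CSV text carries no type information for the header.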


Pandas Study Notes 1 --- Data Loading
