From tick data to candlestick data

I have tick-by-tick data for Forex pairs.

Here is a sample of EURUSD/EURUSD-2012-06.csv:

EUR/USD,20120601 00:00:00.207,1.23618,1.2363
EUR/USD,20120601 00:00:00.209,1.23618,1.23631
EUR/USD,20120601 00:00:00.210,1.23618,1.23631
EUR/USD,20120601 00:00:00.211,1.23623,1.23631
EUR/USD,20120601 00:00:00.240,1.23623,1.23627
EUR/USD,20120601 00:00:00.423,1.23622,1.23627
EUR/USD,20120601 00:00:00.457,1.2362,1.23626
EUR/USD,20120601 00:00:01.537,1.2362,1.23625
EUR/USD,20120601 00:00:03.010,1.2362,1.23624
EUR/USD,20120601 00:00:03.012,1.2362,1.23625

The full tick data can be downloaded here: http://dl.free.fr/k4vVF7aOD

The columns are:

Symbol,Datetime,Bid,Ask

I would like to convert this tick-by-tick data to candlestick data (also called OHLC: Open, High, Low, Close). As an example, say I want an M15 timeframe (15 minutes).

I would like to use Python and the Pandas library to achieve this task.

I've done a small part of the job: reading the tick-by-tick data file.

Here is the code:

#!/usr/bin/env python

from datetime import datetime

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np


def conv_str_to_datetime(x):
    # Parse timestamps such as '20120601 00:00:00.207'
    return datetime.strptime(x, '%Y%m%d %H:%M:%S.%f')


# The file has no header row, so the column names are supplied explicitly
df = pd.read_csv('test_EURUSD/EURUSD-2012-07.csv', names=['Symbol', 'Date_Time', 'Bid', 'Ask'], converters={'Date_Time': conv_str_to_datetime})

# Spread expressed in pips (the 4th decimal place for EUR/USD)
PipPosition = 4
df['Spread'] = (df['Ask'] - df['Bid']) * 10**PipPosition

print(df)

print("="*10)

# First row of the frame
print(df.iloc[0])

But now I don't know how to approach the rest of the job.

I want to get data like this:

Symbol,Datetime_open_candle,open_price,high_price,low_price,close_price

Candle prices will be based on the Bid column.

As I see it, the first part of the problem is to find the first Datetime_open_candle that is compatible with the desired timeframe (call this variable dt1) and the last Datetime_open_candle (call this variable dt2).

After that, I will probably need to keep only the data from dt1 to dt2 (dropping anything before dt1 or after dt2).

Knowing dt1, dt2, and the desired timeframe, I can work out the number of candles I will have.
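The dt1/dt2 bookkeeping described above could be sketched roughly as follows. This is only an illustration: the helper name candle_bounds is hypothetical, the second timestamp in the usage lines is made up, and it assumes each candle is labelled by the 15-minute boundary at or before its first tick (so the first and last candles may be partial). If partial candles should be dropped instead, dt1 would be rounded up with ceil rather than down with floor.

```python
import pandas as pd


def candle_bounds(first_tick, last_tick, freq="15min"):
    """Return (dt1, dt2, n_candles) for ticks spanning first_tick..last_tick.

    Hypothetical helper: dt1 and dt2 are the open times of the first and
    last candles, obtained by flooring to the timeframe boundary.
    """
    step = pd.Timedelta(freq)
    dt1 = first_tick.floor(freq)   # open time of the candle holding the first tick
    dt2 = last_tick.floor(freq)    # open time of the candle holding the last tick
    # Timedelta // Timedelta gives an integer count of whole steps.
    n_candles = (dt2 - dt1) // step + 1
    return dt1, dt2, n_candles


# The first timestamp is from the sample; the second is an invented example.
dt1, dt2, n = candle_bounds(
    pd.Timestamp("2012-06-01 00:00:00.207"),
    pd.Timestamp("2012-06-01 00:47:10.000"),
)
print(dt1, dt2, n)
```

In practice this step may not be needed at all, since resampling on a DatetimeIndex aligns the bins to the timeframe automatically.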

Then I "just" need to know, for each candle, the open/high/low/close prices.

I'm looking for a fairly fast algorithm, ideally a vectorized one, as tick data can be very large.

2 solutions

#1

In [59]: df
Out[59]:
                             Symbol      Bid      Ask
Datetime
2012-06-01 00:00:00.207000  EUR/USD  1.23618  1.23630
2012-06-01 00:00:00.209000  EUR/USD  1.23618  1.23631
2012-06-01 00:00:00.210000  EUR/USD  1.23618  1.23631
2012-06-01 00:00:00.211000  EUR/USD  1.23623  1.23631
2012-06-01 00:00:00.240000  EUR/USD  1.23623  1.23627
2012-06-01 00:00:00.423000  EUR/USD  1.23622  1.23627
2012-06-01 00:00:00.457000  EUR/USD  1.23620  1.23626
2012-06-01 00:00:01.537000  EUR/USD  1.23620  1.23625
2012-06-01 00:00:03.010000  EUR/USD  1.23620  1.23624
2012-06-01 00:00:03.012000  EUR/USD  1.23620  1.23625

In [60]: grouped = df.groupby('Symbol')

In [61]: ask =  grouped['Ask'].resample('15Min', how='ohlc')

In [62]: bid = grouped['Bid'].resample('15Min', how='ohlc')

In [63]: pandas.concat([ask, bid], axis=1, keys=['Ask', 'Bid'])
Out[63]:
                                Ask                                 Bid
                               open     high      low    close     open     high      low   close
Symbol  Datetime
EUR/USD 2012-06-01 00:15:00  1.2363  1.23631  1.23624  1.23625  1.23618  1.23623  1.23618  1.2362

#2

The syntax in Overmeire's answer has since been deprecated.

Instead of this:

ask =  grouped['Ask'].resample('15Min', how='ohlc')
bid = grouped['Bid'].resample('15Min', how='ohlc')

Use this:

ask =  grouped['Ask'].resample('15Min').ohlc()
bid = grouped['Bid'].resample('15Min').ohlc()
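For reference, here is a minimal end-to-end sketch using the newer syntax on the ten sample rows from the question. It is one possible way to get the requested layout, not a definitive implementation: the in-memory SAMPLE string stands in for the CSV file, the dropna() call (to discard 15-minute slots with no ticks) is an assumption about what is wanted, and the column renaming simply follows the names given in the question.

```python
import io

import pandas as pd

# The ten sample ticks from the question, held in memory instead of a file.
SAMPLE = """\
EUR/USD,20120601 00:00:00.207,1.23618,1.2363
EUR/USD,20120601 00:00:00.209,1.23618,1.23631
EUR/USD,20120601 00:00:00.210,1.23618,1.23631
EUR/USD,20120601 00:00:00.211,1.23623,1.23631
EUR/USD,20120601 00:00:00.240,1.23623,1.23627
EUR/USD,20120601 00:00:00.423,1.23622,1.23627
EUR/USD,20120601 00:00:00.457,1.2362,1.23626
EUR/USD,20120601 00:00:01.537,1.2362,1.23625
EUR/USD,20120601 00:00:03.010,1.2362,1.23624
EUR/USD,20120601 00:00:03.012,1.2362,1.23625
"""

df = pd.read_csv(io.StringIO(SAMPLE), names=["Symbol", "Datetime", "Bid", "Ask"])

# Resampling needs a DatetimeIndex, so parse the timestamps and index by them.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y%m%d %H:%M:%S.%f")
df = df.set_index("Datetime")

# One OHLC row per symbol and 15-minute bin, built from the Bid column only.
candles = (
    df.groupby("Symbol")["Bid"]
    .resample("15min")
    .ohlc()
    .dropna()        # assumption: bins without any tick are not wanted
    .reset_index()
    .rename(columns={
        "Datetime": "Datetime_open_candle",
        "open": "open_price",
        "high": "high_price",
        "low": "low_price",
        "close": "close_price",
    })
)

print(candles)
```

With recent pandas versions the 15-minute bins are labelled by their left edge by default, so the timestamp column holds each candle's open time; the label and closed arguments of resample can change this if a different convention is needed.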
