计算Python / Pandas中两行之间的差异

In python, how can I reference previous row and calculate something against it? Specifically, I am working with dataframes in pandas - I have a data frame full of stock price information that looks like this:

在python中，如何引用前一行并针对它计算一些东西？具体来说，我正在处理大熊猫中的数据帧 - 我有一个充满股票价格信息的数据框，如下所示：

           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

Here is how I created this dataframe:

以下是我创建此数据框的方法：

import pandas

url = 'http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
data = data = pandas.read_csv(url)

## now I sorted the data frame ascending by date 
data = data.sort(columns='Date')

Starting with row number 2, or in this case, I guess it's 250 (PS - is that the index?), I want to calculate the difference between 2011-01-03 and 2011-01-04, for every entry in this dataframe. I believe the appropriate way is to write a function that takes the current row, then figures out the previous row, and calculates the difference between them, the use the pandas apply function to update the dataframe with the value.

从第2行开始，或者在这种情况下，我猜它是250（PS - 是指数？），我想计算2011-01-03和2011-01-04之间的差异，对于这个数据框中的每个条目。我认为合适的方法是编写一个获取当前行的函数，然后计算出前一行，并计算它们之间的差异，使用pandas apply函数来更新数据帧的值。

Is that the right approach? If so, should I be using the index to determine the difference? (note - I'm still in python beginner mode, so index may not be the right term, nor even the correct way to implement this)

这是正确的方法吗？如果是这样，我应该使用索引来确定差异吗？（注意 - 我仍处于python初学者模式，因此索引可能不是正确的术语，甚至也不是正确的实现方式）

2 个解决方案

#1

I think you want to do something like this:

我想你想做这样的事情：

In [26]: data
Out[26]: 
           Date   Close  Adj Close
251  2011-01-03  147.48     143.25
250  2011-01-04  147.64     143.41
249  2011-01-05  147.05     142.83
248  2011-01-06  148.66     144.40
247  2011-01-07  147.93     143.69

In [27]: data.set_index('Date').diff()
Out[27]: 
            Close  Adj Close
Date                        
2011-01-03    NaN        NaN
2011-01-04   0.16       0.16
2011-01-05  -0.59      -0.58
2011-01-06   1.61       1.57
2011-01-07  -0.73      -0.71

#2

-1

I don't know pandas, and I'm pretty sure it has something specific for this; however, I'll give you the pure-Python solution, that might be of some help even if you need to use pandas:

我不知道大熊猫，我很确定它有特定的东西;但是，我会给你纯Python解决方案，即使你需要使用pandas也可能会有所帮助：

import csv
import urllib

# This basically retrieves the CSV files and loads it in a list, converting
# All numeric values to floats
url='http://ichart.finance.yahoo.com/table.csv?s=IBM&a=00&b=1&c=2011&d=11&e=31&f=2011&g=d&ignore=.csv'
reader = csv.reader(urllib.urlopen(url), delimiter=',')
# We sort the output list so the records are ordered by date
cleaned = sorted([[r[0]] + map(float, r[1:]) for r in list(reader)[1:]])

for i, row in enumerate(cleaned):  # enumerate() yields two-tuples: (<id>, <item>)
    # The try..except here is to skip the IndexError for line 0
    try:
        # This will calculate difference of each numeric field with the same field
        # in the row before this one
        print row[0], [(row[j] - cleaned[i-1][j]) for j in range(1, 7)]
    except IndexError:
        pass

#1