A function to combine several pandas Series into a DataFrame

Time: 2021-04-21 04:25:05

I have a Dataframe of financial values (df_price). I compute moving averages on one of the dataframe's columns/series (using a simple function, MA), and then I create a new dataframe (df_indicators), which contains each of the moving averages as a column/series.

I have to do this same basic operation repeatedly for many different indicators and dataframes so I'd like to create a function (call it bundle_indicators) to do it.

Basically, I'd like to call bundle_indicators with three arguments:

  1. a list of the indicator names,
  2. a list (or series) of values for each of the indicators,
  3. df_price, so that bundle_indicators can use its index when creating the dataframe.

I'd like bundle_indicators to return a dataframe with each of the columns/series named after one of the indicators and each row/index representing that indicator's value.

Below is how I currently do it. It runs without errors, but I'd like to replace the last block of code (the one marked TODO) with a function. I've tried everything I can think of but keep getting errors, usually related to passing the arguments. I'd really appreciate anyone's help, as I've been at this for quite a while now.

import numpy as np
import pandas as pd


# Create a new dataframe
df_price = pd.DataFrame({
    'Date': ['1993.01.29', '1993.02.01', '1993.02.02', '1993.02.03', '1993.02.04', '1993.02.05', '1993.02.08', '1993.02.09', '1993.02.10', '1993.02.11'], 
    'Open': [43.80, 43.80, 44.05, 44.17, 44.67, 43.80, 44.05, 44.17, 44.67, 44.92], 
    'High': [43.80, 44.05, 44.17, 44.67, 44.92, 43.80, 43.80, 44.05, 44.17, 44.67], 
    'Low': [43.55, 43.80, 43.92, 44.17, 44.55, 43.80, 44.05, 44.17, 44.55, 44.89], 
    'Close': [43.80, 44.05, 44.17, 44.55, 44.89, 43.55, 43.80, 43.92, 44.17, 44.55],
    'Volume': [1007786, 482696, 202220, 531820, 533930, 1007786, 482696, 202220, 531820, 533930]
})


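# Moving average function
# Note: each window spans lb + 1 rows (i - lb through i); the first lb entries are 0
def MA(lb, frame):
    prices = frame['Close']
    mavg = []

    for i in range(len(prices)):
        if i < lb:
            mavg.append(0)
        else:
            # .iloc makes the positional slicing explicit
            sum_array = prices.iloc[(i - lb): (i + 1)]
            mavg.append(np.mean(sum_array))
    return mavg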


# Calculate the moving average for three different lookback periods: 1, 2, 4
mavg_fast = MA(1, df_price)
mavg_med = MA(2, df_price)
mavg_slow = MA(4, df_price)

# Create a new df_indicators dataframe, using df_price's index
# TODO REPLACE THIS WITH A FUNCTION THAT RETURNS A DATAFRAME 
df_indicators = pd.DataFrame({'mavg_fast': mavg_fast}, index = df_price.index)
df_indicators = df_indicators.assign(mavg_med= mavg_med)
df_indicators = df_indicators.assign(mavg_slow = mavg_slow)

print(df_indicators)

2 answers

#1



Consider building the dataframe with a dictionary comprehension passed to the DataFrame() constructor. Iterate through the indicator names and the lists of values pairwise with zip, so that each name becomes a column label and each list of values becomes the rows of that column:

def bundle_indicators(indicators_name, values_list, df):        
    output = pd.DataFrame({k:v for k,v in zip(indicators_name, values_list)},
                          index=df.index)        
    return output

df_indicators_new = bundle_indicators(['mavg_fast', 'mavg_med', 'mavg_slow'],
                                      [mavg_fast, mavg_med, mavg_slow],
                                      df_price)

print(df_indicators_new)
#    mavg_fast   mavg_med  mavg_slow
# 0      0.000   0.000000      0.000
# 1     43.925   0.000000      0.000
# 2     44.110  44.006667      0.000
# 3     44.360  44.256667      0.000
# 4     44.720  44.536667     44.292
# 5     44.220  44.330000     44.242
# 6     43.675  44.080000     44.192
# 7     43.860  43.756667     44.142
# 8     44.045  43.963333     44.066
# 9     44.360  44.213333     43.998

# COMPARISON WITH ORIGINAL OUTPUT
print(df_indicators.eq(df_indicators_new))
#    mavg_fast  mavg_med  mavg_slow
# 0       True      True       True
# 1       True      True       True
# 2       True      True       True
# 3       True      True       True
# 4       True      True       True
# 5       True      True       True
# 6       True      True       True
# 7       True      True       True
# 8       True      True       True
# 9       True      True       True
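
One caveat with the approach above: zip stops at the shorter of its inputs, so a missing name or a missing list of values would be dropped silently rather than raising an error. A minimal sketch of a stricter variant follows; the name bundle_indicators_checked and the small demo frame are made up for illustration and are not part of the original code.

```python
import pandas as pd


def bundle_indicators_checked(indicator_names, values_list, df):
    """Hypothetical variant of bundle_indicators that validates its inputs."""
    # zip() silently stops at the shorter input, so check the lengths first
    if len(indicator_names) != len(values_list):
        raise ValueError("need exactly one list of values per indicator name")
    # dict(zip(...)) builds the same name -> values mapping as the comprehension
    return pd.DataFrame(dict(zip(indicator_names, values_list)), index=df.index)


# Small made-up frame, used only to demonstrate the call
df_demo = pd.DataFrame({'Close': [1.0, 2.0, 3.0]})
bundled = bundle_indicators_checked(['a', 'b'],
                                    [[0, 1.5, 2.5], [0, 0, 2.0]],
                                    df_demo)
print(bundled)
```

Whether the extra check is worth having depends on how the lists are produced; if they are always built together, the shorter version above is probably enough.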

#2



It looks as though you end up with a list of values the same length as the dataframe. You can assign these values as a new column in the dataframe simply with df_price['new_column'] = list_of_values. This might be all you need.
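
If the indicators should live in their own dataframe rather than in df_price, the same column assignment can be done in a loop on an empty frame that shares the price index. This is only a sketch: df_demo stands in for df_price, and the indicator values are a few made-up numbers, each list as long as the frame.

```python
import pandas as pd

# Small made-up frame standing in for df_price
df_demo = pd.DataFrame({'Close': [43.80, 44.05, 44.17]})

# Hypothetical indicator values, one list per indicator
indicators = {
    'mavg_fast': [0, 43.925, 44.11],
    'mavg_med': [0, 0, 44.006667],
}

# Start from an empty frame with the price index, then add one column per indicator
df_out = pd.DataFrame(index=df_demo.index)
for name, values in indicators.items():
    df_out[name] = values

print(df_out)
```

Each list has to match the length of the index, otherwise pandas raises a ValueError on assignment.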
