File name: Learning pandas - Second Edition - 2017 pdf (2 points)
File size: 30.17MB
File format: ZIP
Updated: 2021-05-14 06:57:00
pandas
Table of Contents

Preface
What this book covers; What you need for this book; Who this book is for; Conventions; Reader feedback; Customer support; Downloading the example code; Errata; Piracy; Questions

1. pandas and Data Analysis
Introducing pandas; Data manipulation, analysis, science, and pandas; Data manipulation; Data analysis; Data science; Where does pandas fit?; The process of data analysis; The process; Ideation; Retrieval; Preparation; Exploration; Modeling; Presentation; Reproduction; A note on being iterative and agile; Relating the book to the process; Concepts of data and analysis in our tour of pandas; Types of data; Structured; Unstructured; Semi-structured; Variables; Categorical; Continuous; Discrete; Time series data; General concepts of analysis and statistics; Quantitative versus qualitative data/analysis; Single and multivariate analysis; Descriptive statistics; Inferential statistics; Stochastic models; Probability and Bayesian statistics; Correlation; Regression; Other Python libraries of value with pandas; Numeric and scientific computing - NumPy and SciPy; Statistical analysis - StatsModels; Machine learning - scikit-learn; PyMC - stochastic Bayesian modeling; Data visualization - matplotlib and seaborn; Matplotlib; Seaborn; Summary

2. Up and Running with pandas
Installation of Anaconda; IPython and Jupyter Notebook; IPython; Jupyter Notebook; Introducing the pandas Series and DataFrame; Importing pandas; The pandas Series; The pandas DataFrame; Loading data from files into a DataFrame; Visualization; Summary

3. Representing Univariate Data with the Series
Configuring pandas; Creating a Series; Creating a Series using Python lists and dictionaries; Creation using NumPy functions; Creation using a scalar value; The .index and .values properties; The size and shape of a Series; Specifying an index at creation; Heads, tails, and takes; Retrieving values in a Series by label or position; Lookup by label using the [] operator and the .ix[] property; Explicit lookup by position with .iloc[]; Explicit lookup by labels with .loc[]; Slicing a Series into subsets; Alignment via index labels; Performing Boolean selection; Re-indexing a Series; Modifying a Series in-place; Summary

4. Representing Tabular and Multivariate Data with the DataFrame
Configuring pandas; Creating DataFrame objects; Creating a DataFrame using NumPy function results; Creating a DataFrame using a Python dictionary and pandas Series objects; Creating a DataFrame from a CSV file; Accessing data within a DataFrame; Selecting the columns of a DataFrame; Selecting rows of a DataFrame; Scalar lookup by label or location using .at[] and .iat[]; Slicing using the [ ] operator; Selecting rows using Boolean selection; Selecting across both rows and columns; Summary

5. Manipulating DataFrame Structure
Configuring pandas; Renaming columns; Adding new columns with [] and .insert(); Adding columns through enlargement; Adding columns using concatenation; Reordering columns; Replacing the contents of a column; Deleting columns; Appending new rows; Concatenating rows; Adding and replacing rows via enlargement; Removing rows using .drop(); Removing rows using Boolean selection; Removing rows using a slice; Summary

6. Indexing Data
Configuring pandas; The importance of indexes; The pandas index types; The fundamental type - Index; Integer index labels using Int64Index and RangeIndex; Floating-point labels using Float64Index; Representing discrete intervals using IntervalIndex; Categorical values as an index - CategoricalIndex; Indexing by date and time using DatetimeIndex; Indexing periods of time using PeriodIndex; Working with Indexes; Creating and using an index with a Series or DataFrame; Selecting values using an index; Moving data to and from the index; Reindexing a pandas object; Hierarchical indexing; Summary

7. Categorical Data
Configuring pandas; Creating Categoricals; Renaming categories; Appending new categories; Removing categories; Removing unused categories; Setting categories; Descriptive information of a Categorical; Munging school grades; Summary

8. Numerical and Statistical Methods
Configuring pandas; Performing numerical methods on pandas objects; Performing arithmetic on a DataFrame or Series; Getting the counts of values; Determining unique values (and their counts); Finding minimum and maximum values; Locating the n-smallest and n-largest values; Calculating accumulated values; Performing statistical processes on pandas objects; Retrieving summary descriptive statistics; Measuring central tendency: mean, median, and mode; Calculating the mean; Finding the median; Determining the mode; Calculating variance and standard deviation; Measuring variance; Finding the standard deviation; Determining covariance and correlation; Calculating covariance; Determining correlation; Performing discretization and quantiling of data; Calculating the rank of values; Calculating the percent change at each sample of a series; Performing moving-window operations; Executing random sampling of data; Summary

9. Accessing Data
Configuring pandas; Working with CSV and text/tabular format data; Examining the sample CSV data set; Reading a CSV file into a DataFrame; Specifying the index column when reading a CSV file; Data type inference and specification; Specifying column names; Specifying specific columns to load; Saving DataFrame to a CSV file; Working with general field-delimited data; Handling variants of formats in field-delimited data; Reading and writing data in Excel format; Reading and writing JSON files; Reading HTML data from the web; Reading and writing HDF5 format files; Accessing CSV data on the web; Reading and writing from/to SQL databases; Reading data from remote data services; Reading stock data from Yahoo! and Google Finance; Retrieving options data from Google Finance; Reading economic data from the Federal Reserve Bank of St. Louis; Accessing Kenneth French's data; Reading from the World Bank; Summary

10. Tidying Up Your Data
Configuring pandas; What is tidying your data?; How to work with missing data; Determining NaN values in pandas objects; Selecting out or dropping missing data; Handling of NaN values in mathematical operations; Filling in missing data; Forward and backward filling of missing values; Filling using index labels; Performing interpolation of missing values; Handling duplicate data; Transforming data; Mapping data into different values; Replacing values; Applying functions to transform data; Summary

11. Combining, Relating, and Reshaping Data
Configuring pandas; Concatenating data in multiple objects; Understanding the default semantics of concatenation; Switching axes of alignment; Specifying join type; Appending versus concatenation; Ignoring the index labels; Merging and joining data; Merging data from multiple pandas objects; Specifying the join semantics of a merge operation; Pivoting data to and from value and indexes; Stacking and unstacking; Stacking using non-hierarchical indexes; Unstacking using hierarchical indexes; Melting data to and from long and wide format; Performance benefits of stacked data; Summary

12. Data Aggregation
Configuring pandas; The split, apply, and combine (SAC) pattern; Data for the examples; Splitting data; Grouping by a single column's values; Accessing the results of a grouping; Grouping using multiple columns; Grouping using index levels; Applying aggregate functions, transforms, and filters; Applying aggregation functions to groups; Transforming groups of data; The general process of transformation; Filling missing values with the mean of the group; Calculating normalized z-scores with a transformation; Filtering groups from aggregation; Summary

13. Time-Series Modelling
Setting up the IPython notebook; Representation of dates, time, and intervals; The datetime, day, and time objects; Representing a point in time with a Timestamp; Using a Timedelta to represent a time interval; Introducing time-series data; Indexing using DatetimeIndex; Creating time-series with specific frequencies; Calculating new dates using offsets; Representing data intervals with date offsets; Anchored offsets; Representing durations of time using Period; Modelling an interval of time with a Period; Indexing using the PeriodIndex; Handling holidays using calendars; Normalizing timestamps using time zones; Manipulating time-series data; Shifting and lagging; Performing frequency conversion on a time-series; Up and down resampling of a time-series; Time-series moving-window operations; Summary

14. Visualization
Configuring pandas; Plotting basics with pandas; Creating time-series charts; Adorning and styling your time-series plot; Adding a title and changing axes labels; Specifying the legend content and position; Specifying line colors, styles, thickness, and markers; Specifying tick mark locations and tick labels; Formatting axes' tick date labels using formatters; Common plots used in statistical analyses; Showing relative differences with bar plots; Picturing distributions of data with histograms; Depicting distributions of categorical data with box and whisker charts; Demonstrating cumulative totals with area plots; Relationships between two variables with scatter plots; Estimates of distribution with the kernel density plot; Correlations between multiple variables with the scatter plot matrix; Strengths of relationships in multiple variables with heatmaps; Manually rendering multiple plots in a single chart; Summary

15. Historical Stock Price Analysis
Setting up the IPython notebook; Obtaining and organizing stock data from Google; Plotting time-series prices; Plotting volume-series data; Calculating the simple daily percentage change in closing price; Calculating simple daily cumulative returns of a stock; Resampling data from daily to monthly returns; Analyzing distribution of returns; Performing a moving-average calculation; Comparison of average daily returns across stocks; Correlation of stocks based on the daily percentage change of the closing price; Calculating the volatility of stocks; Determining risk relative to expected returns; Summary
File preview:
Learning pandas - Second Edition - 2017.pdf