一、线性代数
线性代数模块包含了大量矩阵相关的函数,包括线性方程求解,特征值求解,矩阵函数,分解函数(SVD, LU, cholesky)等等。
1. 线性方程组
A是矩阵,x、b是向量:
from scipy.linalg import *
from numpy.random import *
A = array([[1,2,3], [4,5,6], [7,8,9]])
b = array([1,2,3])
x = solve(A, b)
x
=> array([-0.33333333, 0.66666667, 0. ])
dot(A, x) - b # 检验
=> array([ -1.11022302e-16, 0.00000000e+00, 0.00000000e+00])
A、B、X都是矩阵:
A = rand(3,3)
B = rand(3,3)
X = solve(A, B)
X
=> array([[ 2.28587973, 5.88845235, 1.6750663 ],
[-4.88205838, -5.26531274, -1.37990347],
[ 1.75135926, -2.05969998, -0.09859636]])
norm(dot(A, X) - B) # check
=> 6.2803698347351007e-16
2. 特征值 与 特征向量
使用 eigvals 计算矩阵的特征值,使用 eig 同时计算矩阵的特征值与特征向量:
evals = eigvals(A)
evals
=> array([ 1.06633891+0.j , -0.12420467+0.10106325j,
-0.12420467-0.10106325j])
evals, evecs = eig(A)
evals
=> array([ 1.06633891+0.j , -0.12420467+0.10106325j,
-0.12420467-0.10106325j])
evecs
=> array([[ 0.89677688+0.j , -0.30219843-0.30724366j, -0.30219843+0.30724366j],
[ 0.35446145+0.j , 0.79483507+0.j , 0.79483507+0.j ],
[ 0.26485526+0.j , -0.20767208+0.37334563j, -0.20767208-0.37334563j]])
第 n个特征值(存储在 evals[n])所对应的特征向量是evecs 的第n列, 比如evecs[:,n]:
n = 1
norm(dot(A, evecs[:,n]) - evals[n] * evecs[:,n])
=> 1.3964254612015911e-16
3. 矩阵运算
inv(A) # the matrix inverse
det(A) # determinant
norm(A, ord=2), norm(A, ord=Inf)# norms of various orders
=> (1.1657807164173386, 1.7872032588446576)
4. 稀疏矩阵
稀疏矩阵对于数值模拟一个大的方程组是很有帮助的。稀疏矩阵有很多种存储的方式, 一般常用的有坐标形式(COO),列表嵌套列表的形式(LIL),压缩列(CSC),压缩行(CSR)等。CSR与CSC在大部分算法下都有着不错的性能,但是它们不够直观,也不容易初始化。所以一般情况下都会先在COO或者LIL下进行初始化,再转换成CSC或CSR形式使用。创建一个稀疏矩阵的时候需要选择它的存储形式:
from scipy.sparse import *
M = array([[1,0,0,0], [0,3,0,0], [0,1,1,0], [1,0,0,1]]); # dense matrix
M
=> array([[1, 0, 0, 0],
[0, 3, 0, 0],
[0, 1, 1, 0],
[1, 0, 0, 1]])
A = csr_matrix(M); # 稠密转稀疏
A
=> <4x4 sparse matrix of type '<type 'numpy.int64'>'
with 6 stored elements in Compressed Sparse Row format>
A.todense() # 稀疏转稠密
A
=> matrix([[1, 0, 0, 0],
[0, 3, 0, 0],
[0, 1, 1, 0],
[1, 0, 0, 1]])
更有效率的方法是:先创建一个空矩阵,再按索引进行填充:
A = lil_matrix((4,4)) # empty 4x4 sparse matrix
A[0,0] = 1
A[1,1] = 3
A[2,2] = A[2,1] = 1
A[3,3] = A[3,0] = 1
A
=> <4x4 sparse matrix of type '<type 'numpy.float64'>'
with 6 stored elements in LInked List format>
A.todense()
matrix([[ 1., 0., 0., 0.],
[ 0., 3., 0., 0.],
[ 0., 1., 1., 0.],
[ 1., 0., 0., 1.]])
二、最优化
from scipy import optimize
1. 找到一个最小值
- fmin_bfgs 找到函数的最小值:
x_min = optimize.fmin_bfgs(f, -2) #-2附近最小值
x_min
=> Optimization terminated successfully.
Current function value: -3.506641
Iterations: 6
Function evaluations: 30
Gradient evaluations: 10
array([-2.67298167])
optimize.fmin_bfgs(f, 0.5)
=> Optimization terminated successfully.
Current function value: 2.804988
Iterations: 3
Function evaluations: 15
Gradient evaluations: 5
array([ 0.46961745])
- brent 或者 fminbound函数
optimize.brent(f)
=> 0.46961743402759754
optimize.fminbound(f, -4, 2)
=> -2.6729822917513886
2. 找到方程的解
fsolve:需要一个初始的猜测值:
optimize.fsolve(f, 0.1)
三、插值
interp1d 函数以一组X,Y数据为输入数据,返回一个类似于函数的对象,输入任意x值给该对象,返回对应的内插值y:
from scipy.interpolate import *
def f(x):
return sin(x)
n = arange(0, 10)
x = linspace(0, 9, 100)
y_meas = f(n) + 0.1 * randn(len(n)) # simulate measurement with noise
y_real = f(x)
linear_interpolation = interp1d(n, y_meas) #生成线性插值函数
y_interp1 = linear_interpolation(x)
cubic_interpolation = interp1d(n, y_meas, kind='cubic') #立方插值函数
y_interp2 = cubic_interpolation(x)
四、 统计学
scipy.stats 模块包含了大量的统计分布,统计函数与测试。
from scipy import stats
X = stats.poisson(3.5) # photon distribution for a coherent state with n=3.5 photons
Y = stats.norm() # create a (continous) random variable with normal distribution
统计值:
Y.mean(), Y.std(), Y.var() # normal distribution
=> (0.0, 1.0, 1.0)
统计检验
检验两组独立的随机数据是否来组同一个分布:
t_statistic, p_value = stats.ttest_ind(X.rvs(size=1000), X.rvs(size=1000))
print "t-statistic =", t_statistic
print "p-value =", p_value
=> t-statistic = -0.244622880865
p-value = 0.806773564698
P值很大就不能拒绝两组数据拥有不同的平均值的假设
检验一组随机数据的平均值是否为 0.1(实际均值为0.1):
stats.ttest_1samp(Y.rvs(size=1000), 0.1)
=> (-4.4661322772225356, 8.8726783620609218e-06)
低p值意味着我们可以拒绝y的均值为 0.1 这个假设。