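I am learning R through the free courses on www.datacamp.com. I have only just started, so I am writing down notes on some functions; later I want to move toward data analysis / data mining.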
vector:
sum( ):
It calculates the sum of all elements of a vector
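Computes the sum of all the element values in the vector (not the number of elements; length( ) gives that)
total_poker <- sum(poker_vector)
> and <: answer <- total_poker > total_roulette
Returns a logical value (TRUE or FALSE) giving the result of the comparison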
Selecting elements of a vector by position: poker_midweek <- poker_vector[c(2,3,4)]
This selects the 2nd, 3rd and 4th elements of poker_vector. In R the first position is 1, not 0
roulette_vector[2:5] :
selects elements 2 through 5 of roulette_vector; the key is the [start:end] form
mean( ): computes the average, e.g. mean(poker_vector[1:3])
returns the mean of elements 1 to 3
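A small sketch of the vector functions above. The notes never list the contents of poker_vector, so the winnings below are made-up values used only for illustration.

```r
# Hypothetical daily poker winnings, Monday to Friday (illustrative values)
poker_vector <- c(140, -50, 20, -120, 240)

# sum() adds up the values; length() counts the elements
total_poker <- sum(poker_vector)
n_days <- length(poker_vector)

# Positions start at 1; c(2, 3, 4) picks the 2nd, 3rd and 4th elements
poker_midweek <- poker_vector[c(2, 3, 4)]

# start:end picks a run of consecutive positions
poker_first_four <- poker_vector[1:4]

# mean() of a selection
mean_first_three <- mean(poker_vector[1:3])
```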
Some uses of comparison:
------------------------->
# What days of the week did you make money on poker?
selection_vector <- poker_vector > 0
# Select from poker_vector these days
poker_winning_days <- poker_vector[selection_vector]
poker_winning_days
Here selection_vector is the logical vector produced by the test poker_vector > 0
poker_vector[selection_vector] then returns only the elements that satisfy the condition
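The selection-by-condition idea can be tried out as below. The winnings and the day names are assumptions made for this sketch, not data from the course.

```r
# Hypothetical winnings with day names attached (illustrative values)
poker_vector <- c(140, -50, 20, -120, 240)
names(poker_vector) <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

# Comparing a vector with a number gives one TRUE/FALSE per element
selection_vector <- poker_vector > 0

# Indexing with a logical vector keeps only the TRUE positions
poker_winning_days <- poker_vector[selection_vector]

# The same selection written in one step
poker_winning_days_short <- poker_vector[poker_vector > 0]

# TRUE counts as 1, so sum() gives the number of winning days
n_winning_days <- sum(selection_vector)
```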
matrix:
# Construction of a matrix with 3 rows that contain the numbers 1 up to 9
matrix(1:9 ,byrow = TRUE , nrow =3)
------------------------------------->
> # Construction of a matrix with 3 rows that contain the numbers 1 up to 9
> matrix(1:9, byrow = TRUE, nrow = 3)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
------------------->
# The first element: US, the second element: Non-US
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)
# Add your code below to Construct matrix
star_wars_matrix <- matrix(c(new_hope,empire_strikes,return_jedi), byrow = TRUE ,nrow=3)
<-- another way to use it: build the matrix from several vectors combined with c()
star_wars_matrix
colnames( ) (columns) and rownames( ) (rows): name the columns and rows; usage:
col_names_vector <- c("US","non-US")
row_names_vector <- c("A New Hope","The Empire Strikes Back", "Return of the Jedi")
rownames(star_wars_matrix) <- row_names_vector
colnames(star_wars_matrix) <- col_names_vector
rowSums( ) and colSums( ):
sum_of_rows_vector <- rowSums(my_matrix)
Computes the sum of the values in each row (rowSums) or each column (colSums)
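Putting the matrix steps together, using the Star Wars figures from these notes. The variable names worldwide_vector and region_vector are my own choice for this sketch.

```r
# Box office figures from the notes: first element US, second element non-US
new_hope <- c(460.998, 314.4)
empire_strikes <- c(290.475, 247.900)
return_jedi <- c(309.306, 165.8)

# byrow = TRUE fills the matrix one row (one film) at a time
star_wars_matrix <- matrix(c(new_hope, empire_strikes, return_jedi),
                           byrow = TRUE, nrow = 3)
rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back",
                                "Return of the Jedi")
colnames(star_wars_matrix) <- c("US", "non-US")

# One total per film (row) and one total per region (column)
worldwide_vector <- rowSums(star_wars_matrix)
region_vector <- colSums(star_wars_matrix)
```

The results keep the row and column names, so worldwide_vector["A New Hope"] looks up a film by name.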
cbind( ) and rbind( ): combine matrices (and vectors), cbind by column and rbind by row
You can add a column or multiple columns to a matrix with the cbind()
function, which merges matrices and/or vectors together by column. For example:
big_matrix <- cbind(matrix1, matrix2, vector1 ...)
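A sketch of both functions on the Star Wars matrix from these notes. The names "worldwide" and "Total" are labels I picked for the added column and row.

```r
# Rebuild the 3 x 2 matrix (rows: films, columns: US and non-US)
star_wars_matrix <- matrix(c(460.998, 314.4, 290.475, 247.900, 309.306, 165.8),
                           byrow = TRUE, nrow = 3)
rownames(star_wars_matrix) <- c("A New Hope", "The Empire Strikes Back",
                                "Return of the Jedi")
colnames(star_wars_matrix) <- c("US", "non-US")

# cbind() adds a column; the argument name becomes the column name
with_total_col <- cbind(star_wars_matrix, worldwide = rowSums(star_wars_matrix))

# rbind() adds a row; the argument name becomes the row name
with_total_row <- rbind(star_wars_matrix, Total = colSums(star_wars_matrix))
```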
Matrix indexing: [a,b], where a is the row and b is the column
For example:
- my_matrix[1,2] selects from the first row the second element.
- my_matrix[1:3,2:4] selects rows 1,2,3 and columns 2,3,4.
If you want to select all elements of a row or a column, no number is needed before or after the comma, respectively:
- my_matrix[,1] selects all elements of the first column.
- my_matrix[1,] selects all elements of the first row.
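The four selections above can be checked on a small matrix. The 3 x 4 matrix here is an arbitrary example of mine, filled with 1 to 12 so the positions are easy to follow.

```r
# 3 rows, 4 columns, filled row by row:
# row 1 is 1 2 3 4, row 2 is 5 6 7 8, row 3 is 9 10 11 12
my_matrix <- matrix(1:12, nrow = 3, byrow = TRUE)

single_element <- my_matrix[1, 2]   # row 1, column 2
block <- my_matrix[1:3, 2:4]        # rows 1-3, columns 2-4 (still a matrix)
first_column <- my_matrix[, 1]      # every row of column 1
first_row <- my_matrix[1, ]         # every column of row 1
```

Selecting a single row or column returns a plain vector rather than a one-row or one-column matrix.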
factor:
gender_vector <- c("Male", "Female", "Female", "Male", "Male")
factor_gender_vector <- factor(gender_vector)
factor_gender_vector
----------------->
Levels: Female Male
<-----------------
factor_temperature_vector <- factor(temperature_vector, ordered = TRUE, levels = c("Low", "Medium", "High"))
factor_temperature_vector
----------------->
Levels: Low < Medium < High
levels( ): (sets the level names of a factor)
levels(factor_survey_vector) <- c("Female" , "Male")
factor_survey_vector
------------------>
[1] Male Female Female Male Male
Levels: Female Male
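One thing worth checking when renaming levels: factor( ) sorts the levels alphabetically, so the new names must be given in that same order ("F" before "M"). A sketch of this, plus an alternative using the labels argument of factor( ); the name factor_alt is mine.

```r
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)

# Levels are stored in alphabetical order, not in order of appearance
original_levels <- levels(factor_survey_vector)

# New names are matched by position: "F" -> "Female", "M" -> "Male"
levels(factor_survey_vector) <- c("Female", "Male")

# Alternative: state the mapping explicitly when the factor is created
factor_alt <- factor(survey_vector, levels = c("F", "M"),
                     labels = c("Female", "Male"))
```

Writing c("Male", "Female") instead would silently swap the two groups.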
summary( ):
This will give you a quick overview of some_variable: for a plain character vector it reports the length and type; for a factor it reports how many times each level occurs
survey_vector <- c("M", "F", "F", "M", "M")
factor_survey_vector <- factor(survey_vector)
levels(factor_survey_vector) <- c("Female", "Male")
factor_survey_vector
# Type your code here for 'survey_vector'
summary(survey_vector)
# Type your code here for 'factor_survey_vector'
summary(factor_survey_vector)
---------------------->
> summary(survey_vector)
   Length     Class      Mode
        5 character character
> # Type your code here for 'factor_survey_vector'
> summary(factor_survey_vector)
Female   Male
     2      3
Using factors for comparison:
speed_vector <- c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE, levels = c("Slow", "Fast", "Ultra-fast"))
# Your code below
# Compare the ordered factor (not the raw character vector) so the level order is used
compare_them <- factor_speed_vector[2] > factor_speed_vector[5]
# Is data analyst 2 faster than data analyst 5?
compare_them
------------------------->
[1] FALSE
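A sketch contrasting a comparison on the ordered factor with the same comparison on the plain character vector. Only the ordered factor knows that Slow < Fast < Ultra-fast; strings are compared alphabetically. The names fast_vs_slow and char_compare are mine.

```r
speed_vector <- c("Fast", "Slow", "Slow", "Fast", "Ultra-fast")
factor_speed_vector <- factor(speed_vector, ordered = TRUE,
                              levels = c("Slow", "Fast", "Ultra-fast"))

# Analyst 2 (Slow) versus analyst 5 (Ultra-fast): uses the level order
compare_them <- factor_speed_vector[2] > factor_speed_vector[5]

# Analyst 1 (Fast) versus analyst 2 (Slow): Fast ranks above Slow
fast_vs_slow <- factor_speed_vector[1] > factor_speed_vector[2]

# The same test on the strings is alphabetical: "Fast" sorts before "Slow"
char_compare <- speed_vector[1] > speed_vector[2]
```

Here fast_vs_slow is TRUE while char_compare is FALSE, which is why the comparison should be made on the factor.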
frame:
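head( frame_name ), tail( frame_name ): show part of a data set; head shows the first 6 rows, tail the last 6
str( frame_name ): shows the structure of a data set: the number of observations and variables, the data type of each variable, and its first few values
data.frame( variable1, variable2, ... ): combines several previously defined vectors into a data frame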
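A sketch that builds a data frame by hand and inspects it. In the course planets_df is pre-loaded; here I reconstruct it from the values printed later in these notes, so treat the construction itself as my assumption.

```r
# One vector per variable, all of the same length
planets <- c("Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune")
type <- c(rep("Terrestrial planet", 4), rep("Gas giant", 4))
diameter <- c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883)
rotation <- c(58.64, -243.02, 1.00, 1.03, 0.41, 0.43, -0.72, 0.67)
rings <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Each vector becomes one column; the column names come from the vector names
planets_df <- data.frame(planets, type, diameter, rotation, rings)

str(planets_df)                       # observations, variables, types
first_rows <- head(planets_df)        # first 6 rows by default
last_rows <- tail(planets_df, n = 3)  # n changes how many rows are shown
rings_vector <- planets_df$rings      # one column as a plain vector
```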
$:
data_frame_name$variable_name ---->
rings_vector <- planets_df$rings
returns only the variable (column) named after the $ sign
subset( frame_name, subset = condition ):
subset(my_data_frame, subset = some_condition)
returns the rows (observations) of the frame that satisfy the condition
-------------->
> # 'planets_df' is pre-loaded in your workspace
> # Planets that are smaller than planet Earth:
> small_planets_df <- subset(planets_df, subset = diameter < 1)
> small_planets_df
  planets               type diameter rotation rings
1 Mercury Terrestrial planet    0.382    58.64 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
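The same filter can be written with plain bracket indexing, which helps to see what subset( ) does. This sketch uses a cut-down planets_df (two columns, values taken from the output in these notes); the select argument shown at the end is part of subset( ) but is not covered in the notes.

```r
# Cut-down version of planets_df, rebuilt here so the example runs on its own
planets_df <- data.frame(
  planets = c("Mercury", "Venus", "Earth", "Mars",
              "Jupiter", "Saturn", "Uranus", "Neptune"),
  diameter = c(0.382, 0.949, 1.000, 0.532, 11.209, 9.449, 4.007, 3.883)
)

# Rows whose diameter is below 1 (Earth = 1)
small_planets_df <- subset(planets_df, subset = diameter < 1)

# Equivalent bracket form: logical condition before the comma, all columns after
small_with_brackets <- planets_df[planets_df$diameter < 1, ]

# select = keeps only the listed columns
small_names_df <- subset(planets_df, subset = diameter < 1, select = planets)
```

The original row numbers (1, 2, 4) are kept, which is why the printed result skips 3.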
order( ): sorting helper; returns the positions that put the vector in ascending order
a <- c(56,41,31)
order(a)
a[order(a)]
--------------------》
> a <- c(56, 41, 31)
> order(a)
[1] 3 2 1
> a[order(a)]
[1] 31 41 56
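Because a happens to be in exactly reverse order, a second vector makes it clearer that order( ) returns positions, not values. The vector b is an extra example of mine.

```r
a <- c(56, 41, 31)
positions_a <- order(a)     # smallest value sits at position 3, then 2, then 1
sorted_a <- a[positions_a]  # indexing with those positions sorts the vector

b <- c(10, 30, 20)
positions_b <- order(b)     # 1 3 2: positions of 10, 20, 30
sorted_b <- b[positions_b]  # 10 20 30

# decreasing = TRUE reverses the direction
largest_first <- b[order(b, decreasing = TRUE)]  # 30 20 10
```

For a single vector sort(b) gives the same result as b[order(b)]; order( ) is the one to use when the positions are needed to reorder something else, such as the rows of a data frame.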
--------------------》
# 'planets_df' is pre-loaded in your workspace
# What is the correct ordering based on the planets_df$diameter variable?
positions <- order(planets_df$diameter , decreasing = TRUE)
# Create new "ordered" data frame:
largest_first_df <- planets_df[positions,]
largest_first_df
------------------------>
> # Create new "ordered" data frame:
> largest_first_df <- planets_df[positions, ]
> largest_first_df
  planets               type diameter rotation rings
5 Jupiter          Gas giant   11.209     0.41  TRUE
6  Saturn          Gas giant    9.449     0.43  TRUE
7  Uranus          Gas giant    4.007    -0.72  TRUE
8 Neptune          Gas giant    3.883     0.67  TRUE
3   Earth Terrestrial planet    1.000     1.00 FALSE
2   Venus Terrestrial planet    0.949  -243.02 FALSE
4    Mars Terrestrial planet    0.532     1.03 FALSE
1 Mercury Terrestrial planet    0.382    58.64 FALSE
list:
list( ):
my_list <- list(component1, component2 ...)
creates a list, which can hold vectors, data frames and so on
# Vector with numerics from 1 up to 10
my_vector <- 1:10
# Matrix with numerics from 1 up to 9
my_matrix <- matrix(1:9, ncol = 3)
# First 10 elements of the built-in data frame 'mtcars'
my_df <- mtcars[1:10,]
# Construct list with these different elements:
my_list <- list(my_vector, my_matrix , my_df)
names( ):
my_list <- list(your_comp1, your_comp2 ...)
names(my_list) <- c("name1", "name2" ...)
my_list <- list(name1 = your_comp1, name2 = your_comp2 ...)
names the components of a list, either afterwards with names( ) or directly when the list is created
---------------------->
# Print 'my_list' to the console
names(my_list) <- c("vec", "mat", "df")
my_list
------------------------------->
> names(my_list) <- c("vec", "mat", "df")
> my_list
$vec    <-- the name assigned above shows up here
 [1]  1  2  3  4  5  6  7  8  9 10
$mat    <-- the name assigned above shows up here
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
$df    <-- the name assigned above shows up here
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
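Once the components have names they can be pulled out by name. A sketch using the same three components; the variable names third_of_vec and sub_list are mine.

```r
# Names given directly at creation time
my_list <- list(vec = 1:10,
                mat = matrix(1:9, ncol = 3),
                df = mtcars[1:10, ])

component_names <- names(my_list)

# $name and [["name"]] both return the component itself
vec_by_dollar <- my_list$vec
mat_by_brackets <- my_list[["mat"]]

# A component can be indexed further in the same expression
third_of_vec <- my_list$vec[3]

# Single brackets return a shorter list, not the component
sub_list <- my_list["vec"]
```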
Addendum: last_actor <- shining_list$actors[5] assigns the 5th element of the actors vector in the list to last_actor
c( ):
c(list1 , some_variable)
adds a new element to a list; the new element is appended at the end
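A sketch of appending to a list. The notes do not show how shining_list is built, so the contents below are an illustrative stand-in. Note that the function is lowercase c( ); R is case-sensitive.

```r
# Illustrative stand-in for the course's shining_list
shining_list <- list(
  moviename = "The Shining",
  actors = c("Jack Nicholson", "Shelley Duvall", "Danny Lloyd",
             "Scatman Crothers", "Barry Nelson")
)

# 5th element of the actors component
last_actor <- shining_list$actors[5]

# A single named value is appended as one new component at the end
shining_list_full <- c(shining_list, year = 1980)

# A longer vector is spliced in element by element ...
spliced <- c(shining_list, 1:3)

# ... so wrap it in list() to add it as a single component
wrapped <- c(shining_list, list(ratings = 1:3))
```

Here spliced has 5 components while wrapped has 3, so the list( ) wrapper is the safer habit when the new item is a vector.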