I'm trying to understand what OLAP, OLTP, data mining, analytics etc. are about, and I feel like my understanding about some of these concepts is still a bit vague. Information about these subjects tend to be explained in a very complex manner on the internet.
我试图了解OLAP,OLTP,数据挖掘,分析等是什么,我觉得我对其中一些概念的理解仍然有点模糊。关于这些主题的信息往往在互联网上以非常复杂的方式解释。
I feel like a question like this is likely to be closed since it's a very broad one, so I'll try to narrow it down into two questions:
我觉得像这样的问题可能会被关闭,因为它是一个非常宽泛的问题,所以我会尝试将其缩小为两个问题:
Question 1:
After doing research I understand the following about these concepts, is it correct?
在做了研究后,我理解了以下关于这些概念,是否正确?
- Analysis is decomposing something complex, to understand the inner workings better.
- Analytics is predictive analysis on information that requires alot of math and statistics.
- There's many type of databases, but they are either OLTP (transactional) or OLAP (analytical).
- OLTP databases use ER diagrams, and are therefore easier to update because they are in normalized form.
- In contrast, OLAP uses the denormalized star schema's and is therefore easier to query
- OLAP is used for predictive analysis and OLTP is usually used in more practical situations since theres no redundancy.
- Data warehouses is a type of OLAP database, and usually consists out of multiple other databases.
- Data mining is a tool used in analytics, where u use computer software to find out relationships between data so you can predict things (e.g. customer behavior).
分析正在分解复杂的东西,更好地理解内部运作。
分析是对需要大量数学和统计信息的信息进行预测分析。
有许多类型的数据库,但它们是OLTP(事务性)或OLAP(分析性)。
OLTP数据库使用ER图,因此更容易更新,因为它们是标准化形式。
相比之下,OLAP使用非规范化星型模式,因此更容易查询
OLAP用于预测分析,OLTP通常用于更实际的情况,因为没有冗余。
数据仓库是一种OLAP数据库,通常由多个其他数据库组成。
数据挖掘是一种用于分析的工具,您可以使用计算机软件找出数据之间的关系,以便预测事物(例如客户行为)。
Question 2:
I'm especially confused about the difference between analytics and analysis. They say analytics is multidimensional analysis, but what is that supposed to mean?
我对分析和分析之间的区别感到特别困惑。他们说分析是多维分析,但那应该是什么意思?
2 个解决方案
#1
10
I will try to explain you from the top of the pyramid:
我将尝试从金字塔的顶部解释你:
Business Intelligence (what you didn't mentioned) is term in IT which stands for a complex system and gives useful informations about company from data.
商业智能(你没有提到的)是IT中的术语,它代表一个复杂的系统,并从数据中提供有关公司的有用信息。
So, BI systems has target: Clean, accurate and meaningful informations. Clean means there is no tech problems (missing keys, incomplete data ect). Accurate means accurate - BI systems are also used as fault checker of production database (logical faults - i.e invoice bill is too high, or inactive partner is used ect). It has been accomplished with rules. Meaningful is hard to explain, but in simple english, it's all your data (even excel table from the last meeting), in way you want.
因此,BI系统的目标是:清晰,准确和有意义的信息。清洁意味着没有技术问题(缺少密钥,不完整的数据等)。准确意味着准确 - BI系统也被用作生产数据库的故障检查器(逻辑故障 - 即发票账单太高,或者使用非活动伙伴等)。它已经完成了规则。有意义很难解释,但简单的英语,它是你想要的方式的所有数据(甚至是上次会议的excel表)。
So, BI system has back-end: It's data warehouse. DWH is nothing else than a database (instance, not software). It can be stored in RDBMS, analytical db (columnar or document store types), or NoSQL databases.
因此,BI系统有后端:它是数据仓库。 DWH只不过是数据库(实例,而不是软件)。它可以存储在RDBMS,分析数据库(列式或文档存储类型)或NoSQL数据库中。
Data warehouse is term used usually for whole database that I explained above. There could be number of data-marts (if Kimball model is used) - more often, or relational system in 3rd normalized form (Inmon model) called enterprise data warehouse.
数据仓库通常用于我上面解释的整个数据库。可能有许多数据集市(如果使用Kimball模型) - 更常见,或称为企业数据仓库的第三规范化形式(Inmon模型)中的关系系统。
Data marts are tables inside DWH that are related (star schema, snowflake schema). Fact table (business process in denormalized form ) and dimension tables.
数据集市是DWH中相关的表(星型模式,雪花模式)。事实表(非规范化形式的业务流程)和维度表。
Each data mart represents one business process. Example: DWH has 3 data marts. One is retail sales, second is export, and third is import. In retail you can see total sales, qty sold, import price, profit (measures) by SKU, date, store, city ect (dimensions).
每个数据集市代表一个业务流程。示例:DWH有3个数据集市。一个是零售,第二个是出口,第三个是进口。在零售中,您可以看到SKU的总销售额,销售数量,进口价格,利润(度量),日期,商店,城市等(维度)。
Loading data in DWH is called ETL(extract, transform, load).
在DWH中加载数据称为ETL(提取,转换,加载)。
-
Extract data from multiple sources (ERP db, CRM db, excel files, web service...)
从多个来源提取数据(ERP db,CRM db,excel文件,Web服务......)
-
Transform data (clean data, connect data from diff sources, match keys, mine data)
转换数据(干净的数据,连接来自差异源的数据,匹配密钥,挖掘数据)
-
Load data (Load transformed data in specific data marts)
加载数据(在特定数据集市中加载转换后的数据)
edit because of comment: ETL process is usually created with ETL tool, or manually with some programming language (python, c# ect) and APIs.
因注释而编辑:ETL过程通常使用ETL工具创建,或者使用某种编程语言(python,c#ect)和API手动创建。
ETL process is group of SQLs, procedures, scripts and rules related and separated in 3 parts (look above), controlled by meta data. It's either scheduled (every night, every few hours) or live (change data capture, triggers, transactions).
ETL过程是由一组SQL,过程,脚本和规则相关并分成3个部分(见上文),由元数据控制。它可以安排(每晚,每隔几个小时)或实时(更改数据捕获,触发器,事务)。
OLTP and OLAP are types of data processing. OLTP is used in transaction purpose, between database and software (usually only one way of input/output data). OLAP is for analitical purpose, and this means there is multiple sources, historical data, high select query performance, mined data.
OLTP和OLAP是数据处理的类型。 OLTP用于事务目的,数据库和软件之间(通常只有一种输入/输出数据)。 OLAP用于管理目的,这意味着有多个源,历史数据,高选择查询性能,挖掘数据。
edit because of comment: Data Processing is way how data is stored and accessed from database. So, based on of your needs, database is set in different way.
因注释而编辑:数据处理是数据从数据库存储和访问的方式。因此,根据您的需求,数据库以不同的方式设置。
Image from http://datawarehouse4u.info/:
来自http://datawarehouse4u.info/的图片:
Data mining is the computational process of discovering patterns in large data sets. Mined data can give you more insight view of business process or even forecast.
数据挖掘是在大型数据集中发现模式的计算过程。挖掘的数据可以为您提供更深入的业务流程视图甚至预测。
Analysis is a verb, which in BI world means simplicity of getting asked information from data. Multidimensional analysis actually says how system is slicing your data (with dimensions inside cube). Wikipedia said that analysis of data is a process of inspecting data with the goal of discovering useful information.
分析是动词,在BI世界中意味着从数据中获取信息的简单性。多维分析实际上说明了系统如何切割数据(维度在多维数据集内)。*说,数据分析是一个检查数据的过程,目的是发现有用的信息。
Analytics is a noun and it represent a result of analysis process.
Analytics是一个名词,它代表了分析过程的结果。
Don't get so much fuss about those two words.
不要对这两个词大惊小怪。
#2
-1
I can tell you about Data mining as i had project on Data mining. Data mining is not a tool ,Its a method of mining data and different tools used for data mining is WEKA ,RAPID MINER etc. Data mining follows many algorithms which are inbuilt in tools like Weka ,Rapid miner. Algorithms like Clustering algorithm , assosiation algorithm etc. A simple example i can give you of data mining . Teacher is teaching science subject in a class by using different methods of teaching like using chalkboard,presentation,Practical. So now our aim is to find which method is suitable for students. Then we do survey and take students opinion 40 students like chalk board ,30 likes presentation and 20 likes practical method. So with help of this data we can make the rules for example Science subject should be taught by chalk board method. To knw different algorithms you can use google :D.
我可以告诉你关于数据挖掘的事情,因为我有关于数据挖掘的项目。数据挖掘不是一种工具,它是一种挖掘数据的方法和用于数据挖掘的不同工具是WEKA,RAPID MINER等。数据挖掘遵循许多算法,这些算法内置在Weka,Rapid miner等工具中。像聚类算法,协同算法等算法。一个简单的例子,我可以给你数据挖掘。教师通过使用不同的教学方法,如使用黑板,演示,实践,在课堂上教授科学科目。所以现在我们的目标是找到适合学生的方法。然后我们做了调查,并采取学生的意见40名学生喜欢粉笔板,30喜欢演示,20喜欢实用的方法。因此,在这些数据的帮助下,我们可以制作例如科学主题的规则,应该用粉笔板方法来教授。要使用不同的算法,您可以使用谷歌:D。
#1
10
I will try to explain you from the top of the pyramid:
我将尝试从金字塔的顶部解释你:
Business Intelligence (what you didn't mentioned) is term in IT which stands for a complex system and gives useful informations about company from data.
商业智能(你没有提到的)是IT中的术语,它代表一个复杂的系统,并从数据中提供有关公司的有用信息。
So, BI systems has target: Clean, accurate and meaningful informations. Clean means there is no tech problems (missing keys, incomplete data ect). Accurate means accurate - BI systems are also used as fault checker of production database (logical faults - i.e invoice bill is too high, or inactive partner is used ect). It has been accomplished with rules. Meaningful is hard to explain, but in simple english, it's all your data (even excel table from the last meeting), in way you want.
因此,BI系统的目标是:清晰,准确和有意义的信息。清洁意味着没有技术问题(缺少密钥,不完整的数据等)。准确意味着准确 - BI系统也被用作生产数据库的故障检查器(逻辑故障 - 即发票账单太高,或者使用非活动伙伴等)。它已经完成了规则。有意义很难解释,但简单的英语,它是你想要的方式的所有数据(甚至是上次会议的excel表)。
So, BI system has back-end: It's data warehouse. DWH is nothing else than a database (instance, not software). It can be stored in RDBMS, analytical db (columnar or document store types), or NoSQL databases.
因此,BI系统有后端:它是数据仓库。 DWH只不过是数据库(实例,而不是软件)。它可以存储在RDBMS,分析数据库(列式或文档存储类型)或NoSQL数据库中。
Data warehouse is term used usually for whole database that I explained above. There could be number of data-marts (if Kimball model is used) - more often, or relational system in 3rd normalized form (Inmon model) called enterprise data warehouse.
数据仓库通常用于我上面解释的整个数据库。可能有许多数据集市(如果使用Kimball模型) - 更常见,或称为企业数据仓库的第三规范化形式(Inmon模型)中的关系系统。
Data marts are tables inside DWH that are related (star schema, snowflake schema). Fact table (business process in denormalized form ) and dimension tables.
数据集市是DWH中相关的表(星型模式,雪花模式)。事实表(非规范化形式的业务流程)和维度表。
Each data mart represents one business process. Example: DWH has 3 data marts. One is retail sales, second is export, and third is import. In retail you can see total sales, qty sold, import price, profit (measures) by SKU, date, store, city ect (dimensions).
每个数据集市代表一个业务流程。示例:DWH有3个数据集市。一个是零售,第二个是出口,第三个是进口。在零售中,您可以看到SKU的总销售额,销售数量,进口价格,利润(度量),日期,商店,城市等(维度)。
Loading data in DWH is called ETL(extract, transform, load).
在DWH中加载数据称为ETL(提取,转换,加载)。
-
Extract data from multiple sources (ERP db, CRM db, excel files, web service...)
从多个来源提取数据(ERP db,CRM db,excel文件,Web服务......)
-
Transform data (clean data, connect data from diff sources, match keys, mine data)
转换数据(干净的数据,连接来自差异源的数据,匹配密钥,挖掘数据)
-
Load data (Load transformed data in specific data marts)
加载数据(在特定数据集市中加载转换后的数据)
edit because of comment: ETL process is usually created with ETL tool, or manually with some programming language (python, c# ect) and APIs.
因注释而编辑:ETL过程通常使用ETL工具创建,或者使用某种编程语言(python,c#ect)和API手动创建。
ETL process is group of SQLs, procedures, scripts and rules related and separated in 3 parts (look above), controlled by meta data. It's either scheduled (every night, every few hours) or live (change data capture, triggers, transactions).
ETL过程是由一组SQL,过程,脚本和规则相关并分成3个部分(见上文),由元数据控制。它可以安排(每晚,每隔几个小时)或实时(更改数据捕获,触发器,事务)。
OLTP and OLAP are types of data processing. OLTP is used in transaction purpose, between database and software (usually only one way of input/output data). OLAP is for analitical purpose, and this means there is multiple sources, historical data, high select query performance, mined data.
OLTP和OLAP是数据处理的类型。 OLTP用于事务目的,数据库和软件之间(通常只有一种输入/输出数据)。 OLAP用于管理目的,这意味着有多个源,历史数据,高选择查询性能,挖掘数据。
edit because of comment: Data Processing is way how data is stored and accessed from database. So, based on of your needs, database is set in different way.
因注释而编辑:数据处理是数据从数据库存储和访问的方式。因此,根据您的需求,数据库以不同的方式设置。
Image from http://datawarehouse4u.info/:
来自http://datawarehouse4u.info/的图片:
Data mining is the computational process of discovering patterns in large data sets. Mined data can give you more insight view of business process or even forecast.
数据挖掘是在大型数据集中发现模式的计算过程。挖掘的数据可以为您提供更深入的业务流程视图甚至预测。
Analysis is a verb, which in BI world means simplicity of getting asked information from data. Multidimensional analysis actually says how system is slicing your data (with dimensions inside cube). Wikipedia said that analysis of data is a process of inspecting data with the goal of discovering useful information.
分析是动词,在BI世界中意味着从数据中获取信息的简单性。多维分析实际上说明了系统如何切割数据(维度在多维数据集内)。*说,数据分析是一个检查数据的过程,目的是发现有用的信息。
Analytics is a noun and it represent a result of analysis process.
Analytics是一个名词,它代表了分析过程的结果。
Don't get so much fuss about those two words.
不要对这两个词大惊小怪。
#2
-1
I can tell you about Data mining as i had project on Data mining. Data mining is not a tool ,Its a method of mining data and different tools used for data mining is WEKA ,RAPID MINER etc. Data mining follows many algorithms which are inbuilt in tools like Weka ,Rapid miner. Algorithms like Clustering algorithm , assosiation algorithm etc. A simple example i can give you of data mining . Teacher is teaching science subject in a class by using different methods of teaching like using chalkboard,presentation,Practical. So now our aim is to find which method is suitable for students. Then we do survey and take students opinion 40 students like chalk board ,30 likes presentation and 20 likes practical method. So with help of this data we can make the rules for example Science subject should be taught by chalk board method. To knw different algorithms you can use google :D.
我可以告诉你关于数据挖掘的事情,因为我有关于数据挖掘的项目。数据挖掘不是一种工具,它是一种挖掘数据的方法和用于数据挖掘的不同工具是WEKA,RAPID MINER等。数据挖掘遵循许多算法,这些算法内置在Weka,Rapid miner等工具中。像聚类算法,协同算法等算法。一个简单的例子,我可以给你数据挖掘。教师通过使用不同的教学方法,如使用黑板,演示,实践,在课堂上教授科学科目。所以现在我们的目标是找到适合学生的方法。然后我们做了调查,并采取学生的意见40名学生喜欢粉笔板,30喜欢演示,20喜欢实用的方法。因此,在这些数据的帮助下,我们可以制作例如科学主题的规则,应该用粉笔板方法来教授。要使用不同的算法,您可以使用谷歌:D。