[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:

时间:2022-02-06 02:29:25

[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:

mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
.option("dbtable","accounts").option("user","training").option("password","training").load()

In [10]: mydf001=sqlContext.read.format("jdbc").option("url","jdbc:mysql://localhost/loudacre")\
....: .option("dbtable","accounts").option("user","training").option("password","training").load()
17/10/03 05:59:53 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
17/10/03 05:59:53 INFO hive.HiveContext: Initializing metastore client version 1.1.0 using Spark classes.
17/10/03 05:59:53 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0-cdh5.7.0
17/10/03 05:59:53 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0-cdh5.7.0
17/10/03 05:59:56 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost.localdomain:9083
17/10/03 05:59:56 INFO hive.metastore: Opened a connection to metastore, current connections: 1
17/10/03 05:59:56 INFO hive.metastore: Connected to metastore.
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/c2d22d09-7425-4bb3-94c3-39cb32267c7d_resources
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created local directory: /tmp/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d
17/10/03 05:59:56 INFO session.SessionState: Created HDFS directory: /tmp/hive/training/c2d22d09-7425-4bb3-94c3-39cb32267c7d/_tmp_space.db
17/10/03 05:59:56 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.

In [11]:

In [11]: type(mydf001)
Out[11]: pyspark.sql.dataframe.DataFrame

In [12]: mydf001.count()
17/10/03 06:00:29 INFO spark.SparkContext: Starting job: count at NativeMethodAccessorImpl.java:-2
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Registering RDD 2 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Got job 0 (count at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
17/10/03 06:00:29 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 11.0 KB, free 11.0 KB)
17/10/03 06:00:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 5.2 KB, free 16.1 KB)
17/10/03 06:00:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:36793 (size: 5.2 KB, free: 208.8 MB)
17/10/03 06:00:31 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:31 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[2] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:31 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/10/03 06:00:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 1911 bytes)
17/10/03 06:00:31 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
17/10/03 06:00:32 INFO codegen.GenerateMutableProjection: Code generated in 425.82589 ms
17/10/03 06:00:32 INFO codegen.GenerateUnsafeProjection: Code generated in 78.278589 ms
17/10/03 06:00:33 INFO codegen.GenerateMutableProjection: Code generated in 84.676206 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeRowJoiner: Code generated in 60.144399 ms
17/10/03 06:00:33 INFO codegen.GenerateUnsafeProjection: Code generated in 95.977074 ms
17/10/03 06:00:34 INFO jdbc.JDBCRDD: closed connection
17/10/03 06:00:34 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1334 bytes result sent to driver
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3081 ms on localhost (1/1)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/10/03 06:00:34 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (count at NativeMethodAccessorImpl.java:-2) finished in 3.163 s
17/10/03 06:00:34 INFO scheduler.DAGScheduler: looking for newly runnable stages
17/10/03 06:00:34 INFO scheduler.DAGScheduler: running: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
17/10/03 06:00:34 INFO scheduler.DAGScheduler: failed: Set()
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 12.1 KB, free 28.3 KB)
17/10/03 06:00:34 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 5.6 KB, free 33.9 KB)
17/10/03 06:00:34 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:36793 (size: 5.6 KB, free: 208.8 MB)
17/10/03 06:00:34 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/10/03 06:00:34 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[5] at count at NativeMethodAccessorImpl.java:-2)
17/10/03 06:00:34 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/10/03 06:00:34 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, partition 0,NODE_LOCAL, 1999 bytes)
17/10/03 06:00:34 INFO executor.Executor: Running task 0.0 in stage 1.0 (TID 1)
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/10/03 06:00:34 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 32 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 52.636353 ms
17/10/03 06:00:35 INFO codegen.GenerateMutableProjection: Code generated in 49.757505 ms
17/10/03 06:00:35 INFO executor.Executor: Finished task 0.0 in stage 1.0 (TID 1). 1666 bytes result sent to driver
17/10/03 06:00:35 INFO scheduler.DAGScheduler: ResultStage 1 (count at NativeMethodAccessorImpl.java:-2) finished in 0.795 s
17/10/03 06:00:35 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 789 ms on localhost (1/1)
17/10/03 06:00:35 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/10/03 06:00:35 INFO scheduler.DAGScheduler: Job 0 finished: count at NativeMethodAccessorImpl.java:-2, took 6.451521 s
Out[12]: 129761

In [13]:

[Spark][Python]Spark 访问 mysql , 生成 dataframe 的例子:的更多相关文章

  1. [Spark][Python]spark 从 avro 文件获取 Dataframe 的例子

    [Spark][Python]spark 从 avro 文件获取 Dataframe 的例子 从如下地址获取文件: https://github.com/databricks/spark-avro/r ...

  2. Spark(Python) 从内存中建立 RDD 的例子

    Spark(Python) 从内存中建立 RDD 的例子: myData = ["Alice","Carlos","Frank",&quot ...

  3. [Spark][Python]Spark Python 索引页

    Spark Python 索引页 为了查找方便,建立此页 === RDD 基本操作: [Spark][Python]groupByKey例子

  4. [spark][python]Spark map 处理

    map 就是对一个RDD的各个元素都施加处理,得到一个新的RDD 的过程 [training@localhost ~]$ cat names.txtYear,First Name,County,Sex ...

  5. crontab定时运行python脚本访问MySQL遇到问题

    最近写了一个python脚本来定时备份MySQL数据库.具体实现如下: 1)python脚本中使用os.system("mysqldump -h127.0.0.1 -uroot -ppass ...

  6. python+pymysql访问mysql数据库

    今天跟大家分享两种场景的python连接MySQL方法: 场景一:连接远程MySQL 首先,安装pymysql:在命令行执行pip install pymysql指令. 然后,导入pymysql: i ...

  7. [Spark][Python]Spark Join 小例子

    [training@localhost ~]$ hdfs dfs -cat people.json {"name":"Alice","pcode&qu ...

  8. 今天看到可以用sqlalchemy在python*问Mysql

    from sqlalchemy import create_engine, MetaData, and_ 具体的还没有多看.

  9. 基础 ADO.NET 访问MYSQL 与 MSSQL 数据库例子

    虽然实际开发时都是用 Entity 了,但是基础还是要掌握和复习的 ^^ //set connection string, server,database,username,password MySq ...

随机推荐

  1. Hexo的coney主题的一些补充说明

    title: Hexo的coney主题的一些补充说明 date: 2014-12-14 14:10:44 categories: Hexo tags: [hexo,技巧] --- Coney是一个非常 ...

  2. MyBatis知多少(2)

    MyBatis从目前最流行的关系数据库访问方法中吸收了大量的优秀特征和思想,并找出其中的协同增效作用.下图展示了MyBatis框架是如何吸收我们在多年使用不同方式进行数据库集成的 开发过程中所学到的知 ...

  3. sublime简单配置

    Preferences------->settings user { "font_face": "Courier New", "font_siz ...

  4. 《Pointers On C》读书笔记(第三章 数据)

    1.在C语言中,仅有4种基本数据类型:整型.浮点型.指针和聚合类型(如数组和结构等). 整型家族包括字符.短整型.整型和长整型,它们都分为有符号和无符号两种. 标准规定整型值相互之间大小的规则:长整型 ...

  5. iOS 用户密码 数字字母特殊符号设置 判断

    //直接调用这个方法就行 -(int)checkIsHaveNumAndLetter:(NSString*)password{ //数字条件 NSRegularExpression *tNumRegu ...

  6. JAVA函数的重载和重写

    一.什么是重载(overlording) 在JAVA中,可以在同一个类中存在多个函数,函数名称相同但参数列表不同.这就是函数的重载(overlording).这是类的多太性表现之一. 二.重载的作用: ...

  7. Go的并发调度原理

    Go语言是为并发而生的语言,Go语言是为数不多的在语言层面实现并发的语言:也正是Go语言的并发特性,吸引了全球无数的开发者.   并发(concurrency)和并行(parallellism) 并发 ...

  8. Android为TV端助力 切换fragment的两种方式

    使用add方法切换时:载入Fragment1Fragment1 onCreateFragment1 onCreateViewFragment1 onStartFragment1 onResume用以下 ...

  9. C# 内存理论与实践

    The C# Memory Model in Theory and Practice Best Practices All code you write should rely only on the ...

  10. BootStrap------之模态框1

    <!DOCTYPE html> <html lang="zh-cn"> <head> <meta charset="utf-8& ...