Python source code for each chapter of 《Spark编程基础(Python版)》 (Spark Programming Basics, Python Edition), written by 林子雨 (Lin Ziyu)
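As a representative sample, the Chapter 4 script Combine.py below uses combineByKey to compute each company's total and average income from (company, income) pairs: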
# /usr/local/spark/mycode/rdd/Combine.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("Combine")
sc = SparkContext(conf=conf)
data = sc.parallelize([("company-1",88),("company-1",96),("company-1",85),
                       ("company-2",94),("company-2",86),("company-2",74),
                       ("company-3",86),("company-3",88),("company-3",92)], 3)
res = data.combineByKey(
    lambda income: (income, 1),                                # createCombiner: start a (sum, count) accumulator for a new key
    lambda acc, income: (acc[0] + income, acc[1] + 1),         # mergeValue: fold another value into the accumulator
    lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])  # mergeCombiners: merge accumulators across partitions
).map(lambda x: (x[0], x[1][0], x[1][0] / float(x[1][1])))     # (company, total, average)
res.repartition(1).saveAsTextFile("file:///usr/local/spark/mycode/pairrdd/result")
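For the sample data above, the saved result should hold one (company, total, average) triple per company, i.e. lines like ('company-1', 269, 89.67), ('company-2', 254, 84.67), and ('company-3', 266, 88.67), with the averages shown here rounded.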
【Source Code Directory】
代码 (code)
├── 第3章 Spark环境搭建和使用方法 (Chapter 3: Spark Environment Setup and Usage)
│ └── WordCount.py
├── 第4章 RDD编程 (Chapter 4: RDD Programming)
│ ├── Combine.py
│ ├── FileSort.py
│ ├── SecondarySortApp.py
│ ├── SparkOperateHBase.py
│ ├── SparkWriteHBase.py
│ ├── TestPartitioner.py
│ ├── TopN.py
│ ├── file0.txt
│ ├── file1.txt
│ ├── file2.txt
│ ├── file3.txt
│ └── file4.txt
├── 第5章 Spark SQL (Chapter 5)
│ └── InsertStudent.py
├── 第6章 Spark Streaming (Chapter 6)
│ ├── DataSourceSocket.py
│ ├── FileStreaming.py
│ ├── NetworkWordCount.py
│ ├── NetworkWordCountStateful.py
│ ├── NetworkWordCountStatefulDB.py
│ ├── NetworkWordCountStatefulText.py
│ └── WindowedNetworkWordCount.py
└── 第7章 Structured Streaming (Chapter 7)
├── StructuredNetworkWordCount.py
├── StructuredNetworkWordCountFileSink.py
├── StructuredNetworkWordCountWithMonitor.py
├── spark_ss_filesource.py
├── spark_ss_filesource_generate.py
├── spark_ss_kafka_consumer.py
├── spark_ss_kafka_producer.py
├── spark_ss_rate.py
└── spark_ss_test_delay.py
5 directories, 30 files
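Chapter 3's WordCount.py is the simplest script in the collection. As a rough guide to what it covers, here is a minimal word-count sketch; this is not the book's exact code, and the input path and file name are illustrative assumptions:

# Word count sketch; the input path below is an assumed example, not the book's.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf=conf)
lines = sc.textFile("file:///usr/local/spark/mycode/wordcount/word.txt")  # assumed input file
counts = lines.flatMap(lambda line: line.split(" ")) \
              .map(lambda word: (word, 1)) \
              .reduceByKey(lambda a, b: a + b)
for word, count in counts.collect():  # print each (word, count) pair
    print(word, count)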