Basic Information
Source package: Spark Programming Fundamentals (Python Edition) by Lin Ziyu (林子雨《Spark编程基础(Python版)》), complete source code
Size: 0.02 MB
File format: .zip
Language: Python
Last updated: 2024-07-28
Source Code Overview
Python source code for each chapter of Spark Programming Fundamentals (Python Edition), authored by Lin Ziyu.

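As a sample of what the package contains, the excerpt below is Combine.py from Chapter 4. It uses combineByKey to build a (sum, count) accumulator for each company and then maps every entry to (company, total, average).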
#/usr/local/spark/mycode/rdd/Combine.py
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("Combine")
sc = SparkContext(conf=conf)
# Nine (company, value) pairs, spread across 3 partitions
data = sc.parallelize([("company-1",88),("company-1",96),("company-1",85),
                       ("company-2",94),("company-2",86),("company-2",74),
                       ("company-3",86),("company-3",88),("company-3",92)], 3)
# combineByKey builds a (sum, count) accumulator per key:
#   1st argument creates the accumulator from a key's first value,
#   2nd argument folds further values into a partition-local accumulator,
#   3rd argument merges accumulators from different partitions.
res = data.combineByKey(
    lambda income: (income, 1),
    lambda acc, income: (acc[0] + income, acc[1] + 1),
    lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])
).map(lambda x: (x[0], x[1][0], x[1][0] / float(x[1][1])))
res.repartition(1).saveAsTextFile("file:///usr/local/spark/mycode/pairrdd/result")
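
Assuming Spark is installed under /usr/local/spark as in the book's setup, the script can be run with spark-submit (e.g. /usr/local/spark/bin/spark-submit Combine.py). The saved part file should then contain, in some order:

('company-1', 269, 89.66666666666667)
('company-2', 254, 84.66666666666667)
('company-3', 266, 88.66666666666667)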

【Source Directory】

代码 (code)

├── Chapter 3  Spark Environment Setup and Usage
│   └── WordCount.py
├── Chapter 4  RDD Programming
│   ├── Combine.py
│   ├── FileSort.py
│   ├── SecondarySortApp.py
│   ├── SparkOperateHBase.py
│   ├── SparkWriteHBase.py
│   ├── TestPartitioner.py
│   ├── TopN.py
│   ├── file0.txt
│   ├── file1.txt
│   ├── file2.txt
│   ├── file3.txt
│   └── file4.txt
├── Chapter 5  Spark SQL
│   └── InsertStudent.py
├── Chapter 6  Spark Streaming
│   ├── DataSourceSocket.py
│   ├── FileStreaming.py
│   ├── NetworkWordCount.py
│   ├── NetworkWordCountStateful.py
│   ├── NetworkWordCountStatefulDB.py
│   ├── NetworkWordCountStatefulText.py
│   └── WindowedNetworkWordCount.py
└── Chapter 7  Structured Streaming
    ├── StructuredNetworkWordCount.py
    ├── StructuredNetworkWordCountFileSink.py
    ├── StructuredNetworkWordCountWithMonitor.py
    ├── spark_ss_filesource.py
    ├── spark_ss_filesource_generate.py
    ├── spark_ss_kafka_consumer.py
    ├── spark_ss_kafka_producer.py
    ├── spark_ss_rate.py
    └── spark_ss_test_delay.py

5 directories, 30 files
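
For orientation, Chapter 3's WordCount.py (listed above) is the classic RDD word count. A minimal sketch in that style, where the input path is a hypothetical placeholder rather than necessarily the one the book uses:

# Minimal RDD word count in the style of the Chapter 3 example.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("WordCount")
sc = SparkContext(conf=conf)
# Hypothetical input file; substitute any local text file.
logFile = "file:///usr/local/spark/mycode/wordcount/word.txt"
wordCount = (sc.textFile(logFile)
             .flatMap(lambda line: line.split(" "))  # split each line into words
             .map(lambda word: (word, 1))            # pair each word with a count of 1
             .reduceByKey(lambda a, b: a + b))       # sum the counts per word
print(wordCount.collect())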