PySpark in Action: Python data analysis at scale

Author: Jonathan Rioux

Publisher: Manning Publications



File format: PDF

Tags: software engineering, computer science, distributed, Python, 2020

PySpark in Action is a carefully engineered tutorial that helps you use PySpark to deliver your data-driven applications at any scale. This clear, hands-on guide shows you how to expand your processing capabilities across multiple machines with data from any source, ranging from Hadoop-based clusters to Excel worksheets. You’ll learn how to break down big analysis tasks int...

