Categories: Programming Languages, Artificial Intelligence

Overview

Hadoop: The Definitive Guide: 4th Edition

Rating: 8.9

Resource last updated: 2020-11-26 07:31:14

Author: Tom White

Publisher: O'Reilly Media

Publication date: 2015-01

ISBN: 9781491901632

File format: PDF

Tags: Hadoop, Big Data, BigData, Computers, Distributed Systems, hadoop, Machine Learning, O'Reilly

Description

Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters.

Using Hadoop 2 exclusively, author Tom Wh...


Table of Contents

Hadoop Fundamentals
Chapter 1 Meet Hadoop
Data!
Data Storage and Analysis
Querying All Your Data
Beyond Batch
Comparison with Other Systems
A Brief History of Apache Hadoop
What’s in This Book?
Chapter 2 MapReduce
A Weather Dataset
Analyzing the Data with Unix Tools
Analyzing the Data with Hadoop
Scaling Out
Hadoop Streaming
Chapter 3 The Hadoop Distributed Filesystem
The Design of HDFS
HDFS Concepts
The Command-Line Interface
Hadoop Filesystems
The Java Interface
Data Flow
Parallel Copying with distcp
Chapter 4 YARN
Anatomy of a YARN Application Run
YARN Compared to MapReduce 1
Scheduling in YARN
Further Reading
Chapter 5 Hadoop I/O
Data Integrity
Compression
Serialization
File-Based Data Structures
MapReduce
Chapter 1 Developing a MapReduce Application
The Configuration API
Setting Up the Development Environment
Writing a Unit Test with MRUnit
Running Locally on Test Data
Running on a Cluster
Tuning a Job
MapReduce Workflows
Chapter 2 How MapReduce Works
Anatomy of a MapReduce Job Run
Failures
Shuffle and Sort
Task Execution
Chapter 3 MapReduce Types and Formats
MapReduce Types
Input Formats
Output Formats
Chapter 4 MapReduce Features
Counters
Sorting
Joins
Side Data Distribution
MapReduce Library Classes
Hadoop Operations
Chapter 1 Setting Up a Hadoop Cluster
Cluster Specification
Cluster Setup and Installation
Hadoop Configuration
Security
Benchmarking a Hadoop Cluster
Chapter 2 Administering Hadoop
HDFS
Monitoring
Maintenance
Related Projects
Chapter 1 Avro
Avro Data Types and Schemas
In-Memory Serialization and Deserialization
Avro Datafiles
Interoperability
Schema Resolution
Sort Order
Avro MapReduce
Sorting Using Avro MapReduce
Avro in Other Languages
Chapter 2 Parquet
Data Model
Parquet File Format
Parquet Configuration
Writing and Reading Parquet Files
Parquet MapReduce
Chapter 3 Flume
Installing Flume
An Example
Transactions and Reliability
The HDFS Sink
Fan Out
Distribution: Agent Tiers
Sink Groups
Integrating Flume with Applications
Component Catalog
Further Reading
Chapter 4 Sqoop
Getting Sqoop
Sqoop Connectors
A Sample Import
Generated Code
Imports: A Deeper Look
Working with Imported Data
Importing Large Objects
Performing an Export
Exports: A Deeper Look
Further Reading
Chapter 5 Pig
Installing and Running Pig
An Example
Comparison with Databases
Pig Latin
User-Defined Functions
Data Processing Operators
Pig in Practice
Further Reading
Chapter 6 Hive
Installing Hive
An Example
Running Hive
Comparison with Traditional Databases
HiveQL
Tables
Querying Data
User-Defined Functions
Further Reading
Chapter 7 Crunch
An Example
The Core Crunch API
Pipeline Execution
Crunch Libraries
Further Reading
Chapter 8 Spark
Installing Spark
An Example
Resilient Distributed Datasets
Shared Variables
Anatomy of a Spark Job Run
Executors and Cluster Managers
Further Reading
Chapter 9 HBase
HBasics
Concepts
Installation
Clients
Building an Online Query Application
HBase Versus RDBMS
Praxis
Further Reading
Chapter 10 ZooKeeper
Installing and Running ZooKeeper
An Example
The ZooKeeper Service
Building Applications with ZooKeeper
ZooKeeper in Production
Further Reading
Case Studies
Chapter 1 Composable Data at Cerner
From CPUs to Semantic Integration
Enter Apache Crunch
Building a Complete Picture
Integrating Healthcare Data
Composability over Frameworks
Moving Forward
Chapter 2 Biological Data Science: Saving Lives with Software
The Structure of DNA
The Genetic Code: Turning DNA Letters into Proteins
Thinking of DNA as Source Code
The Human Genome Project and Reference Genomes
Sequencing and Aligning DNA
ADAM, A Scalable Genome Analysis Platform
From Personalized Ads to Personalized Medicine
Join In
Chapter 3 Cascading
Fields, Tuples, and Pipes
Operations
Taps, Schemes, and Flows
Cascading in Practice
Flexibility
Hadoop and Cascading at ShareThis
Summary
Appendix Installing Apache Hadoop
Prerequisites
Installation
Configuration
Appendix Cloudera’s Distribution Including Apache Hadoop
Appendix Preparing the NCDC Weather Data
Appendix The Old and New Java MapReduce APIs