This training session covers the system administration aspects of Hadoop, from installation and configuration, through load balancing and tuning, to diagnosing and solving problems in your deployment. At the end of the course.

[box]

  • Introduction
    • About This Course
    • About Z Data
    • Course logistics/administration
  • An Introduction To Hadoop And HDFS
    • Why Hadoop?
    • HDFS
    • MapReduce
    • Hive, Pig, HBase and other sub-projects
  • Planning Your Hadoop Cluster
    • General Planning Considerations
    • Choosing The Right Hardware
    • Node Topologies
    • Choosing The Right Software
  • Deploying Your Cluster
    • Installing Hadoop
    • Typical Configuration Parameters
    • Hands-On Exercise: Install a pseudo-distributed Hadoop Cluster
  • Cluster Maintenance
    • Starting and stopping MapReduce jobs
    • Hands-On Exercise: Using the JobTracker UI to start and kill jobs
    • Checking HDFS with fsck
    • Copying data with distcp
    • Rebalancing cluster nodes
    • Demo
    • Adding and removing cluster nodes
    • Backup And Restore
    • Upgrading and Migrating
  • Scheduling Jobs
    • The FIFO Scheduler
    • The Fair Scheduler
    • Hands-On Exercise: Using Fair Scheduler
  • Cluster Monitoring and Troubleshooting
    • General system profiling
    • Using the NameNode UI to inspect the filesystem
    • Monitoring with Ganglia
    • Demo
    • Other monitoring tools
    • Hadoop Log Files
    • Benchmarking Your Cluster
    • Typical problems
    • Useful alerts
    • Dealing with a corrupt NameNode
  • Installing And Managing Other Hadoop Projects
    • Hive
    • HBase
    • Pig
  • Populating HDFS From Databases Using Sqoop
    • What is Sqoop?
    • Sqoop command-line options
    • Hands-On Exercise: Importing data from MySQL
  • Conclusion

[/box]