Course Details
Course Outline
1 - Module 1: Overview of Big Data
What is big data?
The big data pipeline
Big data architectural principles
2 - Module 2: Big Data ingestion and transfer
Overview: Data ingestion
Transferring data
3 - Module 3: Big data streaming and Amazon Kinesis
Stream processing of big data
Amazon Kinesis
Amazon Kinesis Data Firehose
Amazon Kinesis Video Streams
Amazon Kinesis Data Analytics
Hands-on lab 1: Streaming and Processing Apache Server Logs Using Amazon Kinesis
4 - Module 4: Big data storage solutions
AWS data storage options
Storage solutions concepts
Factors in choosing a data store
5 - Module 5: Big data processing and analytics
Big data processing and analytics
Amazon Athena
Hands-on lab 2: Using Amazon Athena to Analyze Log Data
6 - Module 6: Apache Hadoop and Amazon EMR
Introduction to Amazon EMR and Apache Hadoop
Best practices for ingesting data
Amazon EMR
Amazon EMR architecture
Hands-on lab 3: Storing and Querying Data on Amazon DynamoDB
7 - Module 7: Using Amazon EMR
Developing and running your application
Launching your cluster
Handling output from your completed jobs
8 - Module 8: Hadoop programming frameworks
Hadoop frameworks
Other frameworks for use on Amazon EMR
Hands-on lab 4: Processing Server Logs with Hive on Amazon EMR
9 - Module 9: Web interfaces on Amazon EMR
Hue on Amazon EMR
Monitoring your cluster
Hands-on lab 5: Running Pig Scripts in Hue on Amazon EMR
10 - Module 10: Apache Spark on Amazon EMR
Apache Spark
Using Spark
Hands-on lab 6: Processing NY Taxi Data Using Apache Spark
11 - Module 11: Using AWS Glue to automate ETL workloads
What is AWS Glue?
AWS Glue: Job orchestration
12 - Module 12: Amazon Redshift and big data
Data warehouses vs. traditional databases
Amazon Redshift
Amazon Redshift architecture
13 - Module 13: Securing your Amazon deployments
Securing your Amazon deployments
Amazon EMR security overview
AWS Identity and Access Management (IAM) overview
Securing data
Amazon Kinesis security overview
Amazon DynamoDB security overview
Amazon Redshift security overview
14 - Module 14: Managing big data costs
Total cost considerations for Amazon EMR
Amazon EC2 pricing models
Amazon Kinesis pricing models
Cost considerations for Amazon DynamoDB
Cost considerations and pricing models for Amazon Redshift
Optimizing cost with AWS
15 - Module 15: Visualizing and orchestrating big data
Visualizing big data
Amazon QuickSight
Orchestrating a big data workflow
Hands-on lab 7: Using TIBCO Spotfire to visualize data
16 - Module 16: Big data design patterns
Common architectures
17 - Module 17: Course wrap-up
What’s next?
Actual course outline may vary depending on offering center. Contact your sales representative for more information.
Who is it For?
Target Audience
This course is intended for:
Individuals responsible for designing and implementing big data solutions, namely Solutions Architects and SysOps Administrators.
Data Scientists and Data Analysts interested in learning about big data solutions on AWS.
Other Prerequisites
We recommend that attendees of this course have:
Basic familiarity with big data technologies, including Apache Hadoop, HDFS, and SQL/NoSQL querying
Completed the free Data Analytics Fundamentals digital training or have equivalent experience
Working knowledge of core AWS services and public cloud implementation
Completed the AWS Technical Essentials classroom training or have equivalent experience
Basic understanding of data warehousing, relational database systems, and database design