Course Objectives
By the end of the course, participants will be able to:
- Design big data implementation plans and create strategies for data-driven solutions
- Explain the challenges of big data and the limits of traditional technologies like Excel
- Discuss the main challenges and advantages of the Hadoop ecosystem and other distributed big data architectures
- Demonstrate and discuss key technologies for big data storage and compute, such as PostgreSQL and MongoDB
- Discuss popular machine learning algorithms and the importance of ethics in data analytics and artificial intelligence
- Deliver an architectural diagram for analytics-focused use cases
Course Content:
Introduction to Big Data Analytics
- What is Big Data?
- 5 “V’s” of big data
- How big data relates to data analytics
- Big data impact on technologies
- Open source revolution
- Key big data concepts and data types
- Text, audio, images
- Big data professional roles
- How big data projects can meet organizational needs
- Big data examples:
- Netflix, LinkedIn, Facebook, Google, Orbitz, Dell, and others
- Best practices in project design
- Assessing the current state of your organization
Storing Big Data
- Big data architectures and paradigms
- The Hadoop Ecosystem
- Overview of Hadoop
- Hadoop Distributed File System (HDFS)
- Massively parallel processing (MPP) versus distributed in-memory applications
- RDBMSs versus NoSQL databases
- PostgreSQL, MongoDB, Cassandra
- Streaming data
- Data warehouse versus data mart
Computing Big Data
- How to access big data
- Role of cloud computing
- Data movement risk
- Networking and co-location
- Big data extract, transform, load (ETL)
- Big data compute technologies
- Hadoop continued
- MapReduce and beyond
- Distributed compute
- High-performance clusters
- Spark
- Streaming: Storm, Spark Structured Streaming
- Other big data technologies: Kafka, etc.
- Cloud applications for big data
Big Data Projects
- Basics of data analytics
- Roles and objectives
- Key math and statistics concepts
- Supervised versus unsupervised learning
- Key technologies and applications
- Getting value out of big data
- 5 P’s of data science
- Importance of ethics
- Programmability
Architecting Big Data Solutions
- Identify analytical opportunities
- Define and assess the problem
- Describe the impact and use of data to address the problem
- Identify potential data sources
- Brainstorm an analytics strategy to implement
- Storage and compute
- Identify a cloud environment strategy
- Brainstorm key storage systems and compute environments
Target Competencies
- Big data hands-on labs
- Big data analytics structures and technologies
- Ethics and integrity for big data analytics
- Big data storage and compute system implementation
- Architecture diagram design
Target Audience
This course is ideal for data analysts, data engineers, and data scientists, as well as technically inclined management and administrative professionals seeking to understand big data strategies, technologies, and use cases. Recommended prior knowledge includes basic programming experience, experience analyzing data in Python, knowledge of basic database technologies, and awareness of analytics-driven business initiatives.