andkret / Cookbook
The Data Engineering Cookbook
If You Like This Book & Need More Help
Check out my Data Engineering Academy at LearnDataEngineering.com, trusted by almost 2,000 students!
Visit learndataengineering.com: Click Here
- Learn Data Engineering with our online Academy
- Perfect for becoming a Data Engineer or adding Data Engineering to your skillset
- Proven process based on years of experience and hundreds of hours of personal coaching
- Over 30 prepared courses on the most important techniques, fundamental tools, and platforms, plus our Associate Data Engineer Certification
- Academy Discord server with over 1,000 members
Support This Book For Free!
- Amazon: Click Here and buy whatever you like from Amazon using this link* (also check out my complete podcast gear and books)
Here's what's new:
Find the change log with all recent updates here: SEE UPDATES
Contents:
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Free Hands On Courses / Tutorials
- Case Studies
- Best Practices Cloud Platforms
- 130+ Data Sources For Data Science
- 1001 Interview Questions
- Recommended Books, Courses, and Podcasts
- Updates
Full Table Of Contents:
Introduction
- What is this Cookbook
- Data Engineers
- My Data Science Platform Blueprint
- Who Companies Need
- How to Learn Data Engineering
- Data Engineers Skills Matrix
- How to Become a Senior Data Engineer
Basic Engineering Skills
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- Docker
- The Cloud
- Security Zone Design
Advanced Engineering Skills
- Data Science Platform
- 81 Platform & Pipeline Design Questions
- Connect
- Buffer
- Processing Frameworks
- Lambda and Kappa Architecture
- Batch Processing
- Stream Processing
- Should You do Stream or Batch Processing
- Is ETL still relevant for Analytics?
- MapReduce
- Apache Spark
- What is the Difference to MapReduce?
- How Spark Fits to Hadoop
- Spark vs Hadoop
- Spark and Hadoop a Perfect Fit
- Spark on YARN
- My Simple Rule of Thumb
- Available Languages
- Spark Driver Executor and SparkContext
- Spark Batch vs Stream processing
- How Spark uses Data From Hadoop
- What are RDDs and How to Use Them
- SparkSQL How and Why to Use It
- What are Dataframes and How to Use Them
- Machine Learning on Spark (TensorFlow)
- MLlib
- Spark Setup
- Spark Resource Management
- AWS Lambda
- Apache Flink
- Elasticsearch
- Apache Drill
- StreamSets
- Store
- Visualize
- Machine Learning
- How to do Machine Learning in production
- Why machine learning in production is harder than you think
- Models Do Not Work Forever
- Where are The Platforms That Support Machine Learning
- Training Parameter Management
- How to Convince People That Machine Learning Works
- No Rules No Physical Models
- You Have The Data. Use It!
- Data is Stronger Than Opinions
- AWS Sagemaker
Hands On Course
- Free Data Engineering Course with AWS, TDengine, Docker and Grafana
- Monitor your data in dbt & detect quality issues with Elementary
- Solving Engineers 4 Biggest Airflow Problems
- The best alternative to Airflow? Mage.ai
Case Studies
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @Ebay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @Paypal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
Best Practices Cloud Platforms
130+ Free Data Sources For Data Science
- Student Favorites
- General And Academic
- Content Marketing
- Crime
- Drugs
- Education
- Entertainment
- Environmental And Weather Data
- Financial And Economic Data
- Government And World
- Health
- Human Rights
- Labor And Employment Data
- Politics
- Retail
- Social
- Travel And Transportation
- Various Portals
- Source Articles and Blog Posts
- Free Data Sources For Data Science
1001 Interview Questions
Recommended Books, Courses, and Podcasts
How To Contribute
If you have some cool links or topics for the cookbook, please become a contributor.
Simply pull the repo, add your ideas, and create a pull request. You can also open an issue and share your thoughts there.
Please use the "Issues" function for comments.
Important Links
Subscribe to my YouTube channel for regular updates: Link to YouTube
I have a Medium publication where you can publish your data engineer articles to reach more people: Medium publication
*(As an Amazon Associate I earn from qualifying purchases from Amazon. This is free of charge for you, but super helpful for supporting this channel.)