Applied Big Data Analytics
George Washington University
Spring 2026
Course Description
Applied Big Data Analytics (DATS 6450.13) teaches practical, hands‑on methods for building analytics pipelines that start on a single machine and scale to distributed clusters. Students learn to develop local analytical workflows with DuckDB, translate and scale them with Apache Spark, and evaluate performance tradeoffs through comparative benchmarking, query tuning, and partitioning and memory strategies. The course covers modern tooling (Polars, Ray, RAPIDS), the Spark SQL and DataFrame APIs, Spark NLP and MLlib, and efficient visualization of very large datasets with Datashader. It emphasizes reproducible end‑to‑end workflows and includes a final project in which students demonstrate their design and performance decisions.