DSA 440 Introduction to Apache Spark Using Big Datasets
Apache Spark has become the prevalent Big Data framework in industry and is the core engine in Databricks. Students will take an introductory, hands-on approach to processing datasets up to 6 TB in size with this framework at NC State's High Performance Computing center. Spark components and applications such as Natural Language Processing, Structured Streaming, SQL, MLlib, PySpark, SparkR, and GraphX will be covered. Participants will also accelerate Spark NLP on GPUs. Skill-based prerequisites: basic programming experience and familiarity with R or Python.
Typically offered in Fall, Spring, and Summer