Cognite Spark Data Source

About this course

Welcome to Cognite Spark Data Source!
Spark is software that runs on a cluster of computers to enable large-scale data processing. Databricks is a platform that makes it easy to set up and manage these clusters and supports collaboration through hosted notebooks. Cognite Spark Data Source is a tool developed by Cognite that enables Spark to read from and write to Cognite Data Fusion (CDF).

By the end of this course, you will be able to:

Describe what Spark and Databricks are and when to use them.
Work in a Databricks notebook (import a notebook, set up a secret scope).
Use different features of Cognite Spark Data Source.
Display, aggregate, and analyze Open Industrial Data (OID) in Cognite Data Fusion (CDF) with Cognite Spark Data Source.

Who should take this course?
Anyone who wants to understand Cognite Spark Data Source.

Instructor
Cognite Academy has developed this course with:

Joel Wilsson
Principal Architect

Håkon Trømborg
Tech Lead - Data Integration

Knowledge prerequisites
Basic understanding of coding (Python, PySpark).

Technical prerequisites
Have a Databricks Enterprise account.

Curriculum (1 hr 30 min)

Introduction
Introduction and how to learn from the practical examples
Spark and Databricks
Spark, Databricks, and Cognite Spark Data Source
Access and set up
Get access to the Open Industrial Data project
Generate client secret
Access Databricks
Importing Notebook
Cluster
Secret scope
Work with assets, events, and time series metadata (Part 1)
Introduction
DataFrames
Displaying data
Caching Data
Time Series Metadata
Aggregations
Filtering
Column objects
Joins
Work with data points and files (Part 2)
Data points
Plotting data
Joins with data points
Files metadata
Summary
Key takeaways
Share your feedback
