Cognite Spark Data Source

Use Spark in Databricks to read data from and write data to Cognite Data Fusion.

About this course

Welcome to Cognite Spark Data Source!
Spark is software that runs on a cluster of one or more computers to enable large-scale data processing. Databricks is software that makes it easy to set up and manage these clusters and supports collaboration through hosted notebooks. Cognite Spark Data Source is a tool developed by Cognite that enables Spark to read from and write to Cognite Data Fusion (CDF).

By the end of this course, you will be able to:

  • Describe what Spark and Databricks are and when to use them.
  • Work in a Databricks notebook (import a notebook, set up a secret scope).
  • Use different features of Cognite Spark Data Source.
  • Display, aggregate, and analyze the Open Industrial Data (OID) in Cognite Data Fusion (CDF) with Cognite Spark Data Source.

Who should take this course?
Anyone who wants to understand Cognite Spark Data Source.

Instructors
Cognite Academy has developed this course with:

Joel Wilsson
Principal Architect

Håkon Trømborg
Tech Lead - Data Integration

Knowledge prerequisites
Basic understanding of coding (Python, PySpark).

Technical prerequisites
Have a Databricks Enterprise account.

Curriculum (1 hr 30 min)

  • Introduction
      • Introduction and how to learn from the practical examples
  • Spark and Databricks
      • Spark, Databricks and Cognite Spark Data Source
  • Access and set up
      • Get access to Open Industrial Data project
      • Generate client secret
      • Access Databricks
      • Importing Notebook
      • Cluster
      • Secret scope
  • Work with assets, events, and time series metadata (Part 1)
      • Introduction
      • DataFrames
      • Displaying data
      • Caching Data
      • Time Series Metadata
      • Aggregations
      • Filtering
      • Column objects
      • Joins
  • Work with data points and files (Part 2)
      • Data points
      • Plotting data
      • Joins with data points
      • Files metadata
  • Summary
      • Key takeaways
      • Share your feedback
