Effective URL: https://docs.databricks.com/introduction/index.html

Updated Jan 06, 2023

WHAT IS DATABRICKS?

November 03, 2022

The Databricks Lakehouse Platform provides a unified set of tools for building,
deploying, sharing, and maintaining enterprise-grade data solutions at scale.
Databricks integrates with cloud storage and security in your cloud account, and
manages and deploys cloud infrastructure on your behalf.

In this article:
 * Managed integration with open source
 * How does Databricks work with AWS?
 * What is Databricks used for?
 * What are common use cases for Databricks?
 * Build an enterprise data lakehouse
 * ETL and data engineering
 * Machine learning, AI, and data science
 * Data warehousing, analytics, and BI
 * Data governance and secure data sharing
 * DevOps, CI/CD, and task orchestration
 * Real-time and streaming analytics


MANAGED INTEGRATION WITH OPEN SOURCE

Databricks has a strong commitment to the open source community. Databricks
manages updates of open source integrations in the Databricks Runtime releases.
The following technologies are open source projects founded by Databricks
employees:

 * Delta Lake

 * Delta Sharing

 * MLflow

 * Apache Spark and Structured Streaming

 * Redash

Databricks maintains a number of proprietary tools that integrate and expand
these technologies to add optimized performance and ease of use, such as the
following:

 * Workflows

 * Unity Catalog

 * Delta Live Tables

 * Databricks SQL

 * Photon


HOW DOES DATABRICKS WORK WITH AWS?

The Databricks platform architecture is composed of two primary parts: the
infrastructure used by Databricks to deploy, configure, and manage the platform
and services, and the customer-owned infrastructure managed in collaboration by
Databricks and your company.

Unlike many enterprise database companies, Databricks does not force you to
migrate your data into proprietary storage systems in order to use the platform.
Instead, you configure a Databricks workspace by configuring secure integrations
between the Databricks platform and your cloud account, and then Databricks
deploys ephemeral compute clusters using cloud resources in your account to
process and store data in object storage and other integrated services you
control.

Unity Catalog further extends this relationship, allowing you to manage
permissions for accessing data using familiar SQL syntax from within Databricks.

Databricks has deployed workspaces that meet the security and networking
requirements of some of the world’s largest and most security-minded companies.
Databricks makes it easy for new users to get started on the platform, and
removes many of the burdens and concerns of working with cloud infrastructure
from end users, but does not limit the customizations and control experienced
data, operations, and security teams require.


WHAT IS DATABRICKS USED FOR?

Our customers use Databricks to process, store, clean, share, analyze, model,
and monetize their datasets with solutions from BI to machine learning. You can
use the Databricks platform to build many different applications spanning data
personas. Customers who fully embrace the lakehouse take advantage of our
unified platform to build and deploy data engineering workflows, machine
learning models, and analytics dashboards that power innovations and insights
across an organization.

The Databricks workspace provides user interfaces for many core data tasks,
including tools for the following:

 * Interactive notebooks

 * Workflows scheduler and manager

 * SQL editor and dashboards

 * Data ingestion and governance

 * Data discovery, annotation, and exploration

 * Compute management

 * Machine learning (ML) experiment tracking

 * ML model serving

 * A feature store

 * Source control with Git

In addition to the workspace UI, you can interact with Databricks
programmatically with the following tools:

 * REST API

 * CLI

 * Terraform
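
As a minimal sketch of programmatic access, the following Python builds (but does not send) an authenticated request to the Clusters API list endpoint using only the standard library. The workspace URL and token are hypothetical placeholders, and the helper name `build_list_clusters_request` is an assumption made for illustration.

```python
import urllib.request


def build_list_clusters_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Clusters API list endpoint."""
    url = f"{workspace_url.rstrip('/')}/api/2.0/clusters/list"
    return urllib.request.Request(
        url,
        # Personal access tokens are passed as a bearer token.
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Hypothetical workspace URL and token, for illustration only.
request = build_list_clusters_request(
    "https://example.cloud.databricks.com/", "dapi-EXAMPLE"
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without a workspace.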


WHAT ARE COMMON USE CASES FOR DATABRICKS?

Use cases on Databricks are as varied as the data processed on the platform and
the many personas of employees who work with data as a core part of their job.
The following use cases highlight how users throughout your organization can
leverage Databricks to accomplish tasks essential to processing, storing, and
analyzing the data that drives critical business functions and decisions.


BUILD AN ENTERPRISE DATA LAKEHOUSE

The data lakehouse combines the strengths of data warehouses and data lakes to
accelerate, simplify, and unify enterprise data solutions. Data engineers, data
scientists, analysts, and production systems can all leverage the data lakehouse
as a single source of truth, allowing timely access to consistent data and
reducing the complexities of building, maintaining, and syncing many distributed
data systems. See What is the Databricks Lakehouse?.


ETL AND DATA ENGINEERING

Whether you’re generating dashboards or powering artificial intelligence
applications, data engineering provides the backbone for data-centric companies
by making sure data is available, clean, and stored in data models that allow
for efficient discovery and use. Databricks combines the power of Apache Spark
with Delta Lake and custom tools to provide an unrivaled ETL (extract,
transform, load) experience. You can use SQL, Python, and Scala to compose ETL
logic and then orchestrate scheduled job deployment with just a few clicks.

Delta Live Tables simplifies ETL even further by intelligently managing
dependencies between datasets and automatically deploying and scaling production
infrastructure to ensure timely and accurate delivery of data per your
specifications.
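
To illustrate what managing dependencies between datasets involves, here is a small sketch that orders datasets so each one is refreshed only after the datasets it reads from. It uses Python's standard `graphlib` module and is not how Delta Live Tables is implemented; the dataset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical datasets: each key depends on the datasets in its value set.
dependencies = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "orders_by_customer": {"clean_orders", "raw_customers"},
}


def refresh_order(deps):
    """Return an order in which every dataset comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = refresh_order(dependencies)
print(order)
```

A cycle in the dependency map raises `graphlib.CycleError`, which is the kind of problem a pipeline manager has to detect before anything runs.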

Databricks provides a number of custom tools for data ingestion, including Auto
Loader, an efficient and scalable tool for incrementally and idempotently
loading data from cloud object storage and data lakes into the data lakehouse.
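
The idea of incremental, idempotent loading can be sketched in plain Python: keep a checkpoint of files already ingested and skip them on later runs. This is a conceptual illustration under simplifying assumptions (in-memory "storage" and "table"), not Auto Loader's implementation; all names are hypothetical.

```python
def ingest_new_files(available_files, checkpoint, target):
    """Load only files not recorded in the checkpoint; return how many were loaded.

    available_files: dict mapping file path -> list of records
    checkpoint: set of paths already ingested (mutated in place)
    target: list standing in for the destination table (mutated in place)
    """
    loaded = 0
    for path in sorted(available_files):
        if path in checkpoint:
            continue  # already ingested on an earlier run
        target.extend(available_files[path])
        checkpoint.add(path)
        loaded += 1
    return loaded


# Hypothetical landing-zone contents.
files = {"landing/a.json": [{"id": 1}], "landing/b.json": [{"id": 2}]}
checkpoint, table = set(), []

ingest_new_files(files, checkpoint, table)  # first run loads both files
ingest_new_files(files, checkpoint, table)  # rerun loads nothing new
files["landing/c.json"] = [{"id": 3}]
ingest_new_files(files, checkpoint, table)  # only the new file is loaded
print(len(table))
```

Because the checkpoint records what has been loaded, rerunning the function never duplicates records, which is the property "idempotent" refers to.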


MACHINE LEARNING, AI, AND DATA SCIENCE

Databricks machine learning expands the core functionality of the platform with
a suite of tools tailored to the needs of data scientists and ML engineers,
including MLflow and the Databricks Runtime for Machine Learning. See the
Databricks Machine Learning guide.


DATA WAREHOUSING, ANALYTICS, AND BI

Databricks combines user-friendly UIs with cost-effective compute resources and
infinitely scalable, affordable storage to provide a powerful platform for
running analytic queries. Administrators configure scalable compute clusters as
SQL warehouses, allowing end users to execute queries without worrying about any
of the complexities of working in the cloud. SQL users can run queries against
data in the lakehouse using the SQL query editor or in notebooks. Notebooks
support Python, R, and Scala in addition to SQL, and allow users to embed the
same visualizations available in dashboards alongside links, images, and
commentary written in markdown.


DATA GOVERNANCE AND SECURE DATA SHARING

Unity Catalog provides a unified data governance model for the data lakehouse.
Cloud administrators configure and integrate coarse access control permissions
for Unity Catalog, and then Databricks administrators can manage permissions for
teams and individuals. Privileges are managed with access control lists (ACLs)
through either user-friendly UIs or SQL syntax, making it easier for database
administrators to secure access to data without needing to scale cloud-native
identity and access management (IAM) and networking.
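
As a hedged sketch of managing privileges with SQL syntax, the helper below assembles a `GRANT` statement for a table. The helper name, the three-level table name, and the group name are hypothetical, and the privilege list is deliberately limited to a few table privileges rather than the full set.

```python
def grant_statement(privilege: str, table: str, principal: str) -> str:
    """Build a GRANT statement for a table; the principal is backtick-quoted."""
    allowed = {"SELECT", "MODIFY", "ALL PRIVILEGES"}
    privilege = privilege.upper()
    if privilege not in allowed:
        raise ValueError(f"unsupported privilege: {privilege}")
    if "`" in principal:
        # This sketch does no escaping, so reject names that would break quoting.
        raise ValueError("principal must not contain a backtick")
    return f"GRANT {privilege} ON TABLE {table} TO `{principal}`"


# Hypothetical three-level table name (catalog.schema.table) and group name.
statement = grant_statement("select", "main.sales.orders", "analysts")
print(statement)
```

In a Databricks notebook, the resulting string could be run with `spark.sql(statement)`; the same statement can also be typed directly into the SQL editor.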

Unity Catalog makes running secure analytics in the cloud simple, and provides a
division of responsibility that helps limit the reskilling or upskilling
necessary for both administrators and end users of the platform. See What is
Unity Catalog?.

The lakehouse makes data sharing within your organization as simple as granting
query access to a table or view. For sharing outside of your secure environment,
Unity Catalog features a managed version of Delta Sharing.


DEVOPS, CI/CD, AND TASK ORCHESTRATION

The development lifecycles for ETL pipelines, ML models, and analytics
dashboards each present their own unique challenges. Databricks allows all of
your users to leverage a single data source, which reduces duplicate efforts and
out-of-sync reporting. Databricks also provides a suite of common tools for
versioning, automating, scheduling, and deploying code and production resources,
which simplifies your overhead for monitoring, orchestration, and operations.
Workflows schedule Databricks notebooks, SQL queries, and other arbitrary code.
Repos let you sync Databricks projects with a number of popular Git providers.
For a complete overview of tools, see Developer tools and guidance.
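
Scheduling can also be expressed as data. The sketch below builds a job definition in the shape accepted by the Jobs API 2.1 create endpoint: one notebook task on an existing cluster, run daily on a Quartz cron schedule. The job name, notebook path, and cluster ID are hypothetical, and the payload is only printed, not submitted.

```python
import json


def nightly_notebook_job(name, notebook_path, cluster_id):
    """Return a job definition that runs one notebook every day at 02:00 UTC."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            # Quartz cron fields: seconds minutes hours day-of-month month day-of-week
            "quartz_cron_expression": "0 0 2 * * ?",
            "timezone_id": "UTC",
        },
    }


# Hypothetical values, for illustration only.
payload = nightly_notebook_job(
    "nightly-etl", "/Repos/team/project/etl", "0000-000000-example"
)
print(json.dumps(payload, indent=2))
```

Keeping the definition in code means it can be versioned in Git and reviewed like any other change.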


REAL-TIME AND STREAMING ANALYTICS

Databricks leverages Apache Spark Structured Streaming to work with streaming
data and incremental data changes. Structured Streaming integrates tightly with
Delta Lake, and these technologies provide the foundations for both Delta Live
Tables and Auto Loader. See What is Apache Spark Structured Streaming?.
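
Structured Streaming by default processes a stream as a series of small batches. The plain-Python sketch below mimics that model by folding each micro-batch into running totals, so results are updated incrementally instead of being recomputed from scratch. It is a conceptual illustration only and does not use Spark; the event names are hypothetical.

```python
from collections import Counter


def process_micro_batch(state: Counter, batch) -> Counter:
    """Fold one micro-batch of (key, amount) events into the running totals."""
    for key, amount in batch:
        state[key] += amount
    return state


# Hypothetical stream, delivered as three small batches of (key, amount) events.
batches = [
    [("clicks", 3), ("views", 10)],
    [("clicks", 1)],
    [("views", 5), ("clicks", 2)],
]

totals = Counter()
for batch in batches:
    totals = process_micro_batch(totals, batch)
    print(dict(totals))
```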


© Databricks 2023. All rights reserved. Apache, Apache Spark, Spark, and the
Spark logo are trademarks of the Apache Software Foundation.
