Questions tagged [databricks]
Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.
databricks
8,114
questions
35
votes
3
answers
89k
views
Exploding nested Struct in Spark dataframe
I'm working through a Databricks example. The schema for the dataframe looks like:
> parquetDF.printSchema
root
|-- department: struct (nullable = true)
| |-- id: string (nullable = true)
| |-...
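If the department column is a plain struct (not an array), it can be flattened by selecting its fields with star syntax; explode is only needed for array columns. A minimal PySpark sketch, assuming the schema above (field names beyond id are hypothetical):

# Promote every field of the struct to a top-level column
flat_df = parquetDF.select("department.*")
# Or pick individual fields explicitly
flat_df = parquetDF.select("department.id", "department.name")  # "name" is a hypothetical field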
35
votes
7
answers
73k
views
How to delete all files from folder with Databricks dbutils
Can someone let me know how to use the Databricks dbutils to delete all files from a folder?
I have tried the following but unfortunately, Databricks doesn't support wildcards.
dbutils.fs.rm('adl://...
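One common workaround, since dbutils.fs.rm does not expand wildcards: pass recurse=True to remove the whole folder, then recreate it if the empty folder is still needed. A sketch with a placeholder path:

# Remove the folder and everything under it
dbutils.fs.rm('adl://my_store/my_folder', recurse=True)  # placeholder path
# Recreate the empty folder if it is still needed
dbutils.fs.mkdirs('adl://my_store/my_folder')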
32
votes
8
answers
70k
views
Databricks drop a delta table?
How can I drop a Delta table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the 'delta' folder with a magic command or dbutils:
%fs ...
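A sketch of the usual two-step approach, with placeholder names: DROP TABLE removes both metadata and data for a managed table, while an external (path-based) table also needs its files removed.

spark.sql("DROP TABLE IF EXISTS my_schema.my_delta_table")   # placeholder name
# For an external table, clear the underlying files as well
dbutils.fs.rm("dbfs:/delta/my_delta_table", recurse=True)    # placeholder path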
31
votes
6
answers
105k
views
Databricks: Download a dbfs:/FileStore File to my Local Machine?
I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result.
I can access the different "part-xxxxx" files using the web browser, but I would like to ...
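Files under dbfs:/FileStore are served over HTTP by the workspace, so one common route, assuming the results are in a dataframe (the question uses an RDD's saveAsTextFile), is to coalesce the output into a single part file and download it from the /files/ URL. A sketch with placeholder names:

# Rewrite the result as a single CSV part file under /FileStore
df.coalesce(1).write.csv("dbfs:/FileStore/my_result_csv", header=True)
# Then download it in a browser from:
#   https://<databricks-instance>/files/my_result_csv/<part-file-name>.csv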
28
votes
4
answers
39k
views
How to handle an AnalysisException on Spark SQL?
I am trying to execute a list of queries in Spark, but if a query does not run correctly, Spark throws the following error:
AnalysisException: "ALTER TABLE CHANGE COLUMN is not supported for ...
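PySpark exposes the exception class in pyspark.sql.utils, so one way to keep a batch of queries running is to catch it per query. A sketch, where queries is a hypothetical list of SQL strings:

from pyspark.sql.utils import AnalysisException

for q in queries:
    try:
        spark.sql(q)
    except AnalysisException as e:
        print(f"Skipping failed query: {e}")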
28
votes
3
answers
76k
views
How to list all the mount points in Azure Databricks?
I tried %fs ls dbfs:/mnt, but I want to know: does this give me all the mount points?
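%fs ls dbfs:/mnt only lists that one directory; dbutils.fs.mounts() returns the actual mount table. A sketch:

# Each entry exposes the mount point and its backing source
for m in dbutils.fs.mounts():
    print(m.mountPoint, "->", m.source)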
27
votes
7
answers
31k
views
Databricks: Issue while creating spark data frame from pandas
I have a pandas data frame which I want to convert into a Spark data frame. Usually I use the code below to create a Spark data frame from pandas, but all of a sudden I started getting the error below, and I am ...
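The error text is cut off above, but a frequent cause of createDataFrame suddenly failing is type inference hitting mixed-type pandas columns; supplying an explicit schema is a common workaround. A sketch with hypothetical column names:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

schema = StructType([
    StructField("name", StringType(), True),   # hypothetical columns
    StructField("age", IntegerType(), True),
])
sdf = spark.createDataFrame(pandas_df, schema=schema)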
27
votes
5
answers
63k
views
Databricks: How do I get path of current notebook?
Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help.
It suggests:
%scala
dbutils.notebook.getContext.notebookPath
res1: ...
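The same context is reachable from Python through the dbutils entry point; a sketch that assumes it runs inside a Databricks notebook:

notebook_path = (dbutils.notebook.entry_point.getDbutils()
                 .notebook().getContext().notebookPath().get())
print(notebook_path)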
26
votes
7
answers
71k
views
How to drop a column from a Databricks Delta table?
I have recently started exploring Databricks and faced a situation where I need to drop a certain column of a Delta table. When I worked with PostgreSQL it was as easy as
ALTER TABLE main....
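Older Delta versions have no direct DROP COLUMN (newer ones support it once column mapping is enabled), so the classic workaround is rewriting the table without the column. A sketch with placeholder names:

df = spark.table("main.my_table")                 # placeholder table
(df.drop("column_to_drop")                        # placeholder column
   .write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("main.my_table"))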
26
votes
6
answers
49k
views
Azure Databricks - Can not create the managed table The associated location already exists
I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:
SomeData_df.write.mode('overwrite').saveAsTable("SomeData")
I get the following error:
...
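That error usually means an earlier drop left files behind at the managed table's location. One common fix is clearing both the metadata and the leftover files before writing again; the warehouse path below is the usual default but should be verified for your workspace:

spark.sql("DROP TABLE IF EXISTS SomeData")
dbutils.fs.rm("dbfs:/user/hive/warehouse/somedata", recurse=True)  # verify this location
SomeData_df.write.mode('overwrite').saveAsTable("SomeData")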
26
votes
4
answers
11k
views
How to detect Databricks environment programmatically
I'm writing a spark job that needs to be runnable locally as well as on Databricks.
The code has to be slightly different in each environment (file paths) so I'm trying to find a way to detect if the ...
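One lightweight check is the DATABRICKS_RUNTIME_VERSION environment variable, which Databricks sets on its clusters. A sketch with placeholder paths:

import os

def is_databricks() -> bool:
    # Set by the Databricks Runtime, absent in a local Spark install
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

base_path = "dbfs:/mnt/data" if is_databricks() else "/tmp/data"  # placeholder paths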
25
votes
5
answers
53k
views
How to load databricks package dbutils in pyspark
I was trying to run the below code in pyspark.
dbutils.widgets.text('config', '', 'config')
It was throwing me an error saying
Traceback (most recent call last):
File "<stdin>", line 1, ...
24
votes
3
answers
31k
views
NameError: name 'dbutils' is not defined in pyspark
I am running a PySpark job in Databricks cloud. I need to write some CSV files to the Databricks filesystem (DBFS) as part of this job, and I also need to use some of the dbutils native commands ...
23
votes
2
answers
10k
views
Apache Spark + Delta Lake concepts
I have several questions related to Spark + Delta.
1) Databricks proposes 3 layers (bronze, silver, gold), but which layer is recommended for machine learning, and why? I suppose they propose to ...
23
votes
7
answers
32k
views
How to find size (in MB) of dataframe in pyspark?
How do I find the size (in MB) of a dataframe in PySpark?
df = spark.read.json("/Filestore/tables/test.json")
I want to find the size of df, or of test.json.
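Spark has no direct "size in MB" accessor for a dataframe; one blunt but reliable estimate is writing it out and summing the file sizes. A sketch with a placeholder scratch path (this measures compressed on-disk size, not in-memory size):

tmp = "dbfs:/tmp/df_size_probe"   # placeholder scratch path
df.write.mode("overwrite").parquet(tmp)
size_mb = sum(f.size for f in dbutils.fs.ls(tmp)) / (1024 * 1024)
print(f"~{size_mb:.1f} MB as compressed parquet")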
22
votes
3
answers
29k
views
Printing secret value in Databricks
Even though secrets exist to mask confidential information, I need to see the value of a secret in order to use it outside Databricks.
When I simply print the secret it shows [REDACTED].
print(dbutils....
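The [REDACTED] filter only masks exact matches of the secret string, so printing the characters with separators reveals the value; this deliberately defeats the masking, so treat it as a debugging step only. A sketch with placeholder scope and key names:

secret = dbutils.secrets.get(scope="my-scope", key="my-key")  # placeholders
print(" ".join(secret))   # no longer matches the redaction filter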
21
votes
3
answers
37k
views
Databricks - How to change a partition of an existing Delta table?
I have a table in Databricks delta which is partitioned by transaction_date. I want to change the partition column to view_date. I tried to drop the table and then create it with a new partition ...
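Delta cannot change partitioning in place; the usual route is a full rewrite with the new partitionBy (overwriteSchema permits the partition change). A sketch with a placeholder table name:

df = spark.table("my_delta_table")   # placeholder name
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .partitionBy("view_date")
   .saveAsTable("my_delta_table"))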
21
votes
1
answer
15k
views
Local instance of Databricks for development
I am currently working on a small team that is developing a Databricks based solution. For now we are small enough to work off of cloud instances of Databricks. As the group grows this will not ...
20
votes
3
answers
32k
views
lstm will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU
I am running the following code for LSTM on Databricks with GPU
model = Sequential()
model.add(LSTM(64, activation=LeakyReLU(alpha=0.05), batch_input_shape=(1, timesteps, n_features),
stateful=...
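Per the Keras documentation, the fused cuDNN kernel is used only with the layer defaults: activation='tanh', recurrent_activation='sigmoid', recurrent_dropout=0, unroll=False, use_bias=True. The LeakyReLU activation above disqualifies it. A sketch keeping the rest of the layer as in the question (the stateful value is truncated there, so it is an assumption here):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
# Default activations satisfy the cuDNN criteria
model.add(LSTM(64, batch_input_shape=(1, timesteps, n_features),
               stateful=True))   # assumption: keep your original setting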
20
votes
1
answer
24k
views
Databricks SQL - How to get all the rows (more than 1000) in the first run?
Currently, in Databricks if we run the query, it always returns 1000 rows in the first run. If we need all the rows, we need to execute the query again.
In the situations where we know that we need to ...
19
votes
3
answers
86k
views
How to export data from a dataframe to a file databricks
I'm currently doing the Introduction to Spark course at edX.
Is there a way to save dataframes from Databricks to my computer?
I'm asking this question, because this course provides Databricks ...
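For modest result sizes, one common route is collecting to pandas and writing through the /dbfs FUSE mount (where available), then downloading over the /files/ URL. A sketch with a placeholder filename:

# Small results: write via the FUSE mount into /FileStore
df.toPandas().to_csv("/dbfs/FileStore/my_export.csv", index=False)  # placeholder name
# Then download from https://<databricks-instance>/files/my_export.csv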
19
votes
7
answers
57k
views
How to slice a pyspark dataframe in two row-wise
I am working in Databricks.
I have a dataframe which contains 500 rows; I would like to create two dataframes, one containing 100 rows and the other containing the remaining 400 rows.
+----------------...
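limit plus exceptAll (Spark 2.4+) is one simple way, with the caveat that dataframes have no inherent row order, so the "first 100" is only deterministic after an orderBy. A sketch with a hypothetical sort key:

first_100 = df.orderBy("id").limit(100)   # "id" is a hypothetical sort column
rest_400 = df.exceptAll(first_100)        # unlike subtract, keeps duplicate rows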
19
votes
4
answers
42k
views
List the files of a directory and its subdirectories recursively in Databricks (DBFS)
Using Python/dbutils, how do I list the files of the current directory and its subdirectories recursively in the Databricks file system (DBFS)?
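dbutils.fs.ls is not recursive, so the usual answer is a small walker over its results. A sketch with a placeholder root path:

def list_recursive(path):
    # Depth-first walk; each entry exposes .path and .isDir()
    for entry in dbutils.fs.ls(path):
        if entry.isDir():
            yield from list_recursive(entry.path)
        else:
            yield entry.path

for p in list_recursive("dbfs:/mnt/data"):   # placeholder root
    print(p)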
19
votes
2
answers
55k
views
How to set environment variable in databricks?
Simple question, but I can't find a simple guide on how to set the environment variable in Databricks. Also, is it important to set the environment variable on both the driver and executors (and would ...
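Setting os.environ in a notebook only affects the driver process; executors (and anything needed at JVM startup) get their variables from the cluster configuration instead. A driver-side sketch with a placeholder name:

import os
os.environ["MY_VAR"] = "my_value"   # placeholder; visible on the driver only
# For executors, set MY_VAR under the cluster's Advanced options >
# Spark > Environment variables, which applies to every node.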
19
votes
4
answers
47k
views
How to move files of the same extension in the Databricks file system?
I am getting a file-not-found exception when I try to move files using a * wildcard in DBFS. Both the source and destination directories are in DBFS. I have a source file named "test_sample.csv" ...
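dbutils.fs does not expand wildcards, so the usual workaround is to list the directory and filter in Python. A sketch with placeholder directories:

src, dst = "dbfs:/source_dir/", "dbfs:/target_dir/"   # placeholder dirs
for f in dbutils.fs.ls(src):
    if f.name.endswith(".csv"):
        dbutils.fs.mv(f.path, dst + f.name)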
19
votes
1
answer
2k
views
PySpark and Protobuf Deserialization UDF Problem
I'm getting this error
Can't pickle <class 'google.protobuf.pyext._message.CMessage'>: it's not found as google.protobuf.pyext._message.CMessage
when I try to create a UDF in PySpark. ...
18
votes
2
answers
103k
views
Read/Write single file in DataBricks
I have a simple text file containing a list of names, one name per row. Now I need to programmatically append a new name to this file based on a user's input.
For the ...
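On clusters with the /dbfs FUSE mount, DBFS paths behave like local files, so plain Python file APIs handle the append. A sketch with a placeholder path:

# Append one name per line through the FUSE mount
with open("/dbfs/FileStore/names.txt", "a") as fh:   # placeholder path
    fh.write(new_name + "\n")   # new_name: the user's input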
18
votes
2
answers
17k
views
What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?
When running simple SQL commands in Databricks, sometimes I get the message:
Determining location of DBIO file fragments. This operation can take
some time.
What does this mean, and how do I ...
17
votes
3
answers
47k
views
Ways to Plot Spark Dataframe without Converting it to Pandas
Is there any way to plot information from a Spark dataframe without converting the dataframe to pandas?
Did some online research but can't seem to find a way. I need to automatically save these plots ...
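The usual compromise is doing the heavy aggregation in Spark and converting only the small result for plotting, so the full dataframe never leaves the cluster. A sketch with a hypothetical column name and placeholder output path:

import matplotlib.pyplot as plt

counts = df.groupBy("category").count().toPandas()   # "category" is hypothetical
counts.plot(kind="bar", x="category", y="count")
plt.savefig("/dbfs/FileStore/plots/category_counts.png")   # placeholder path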
16
votes
9
answers
142k
views
How to read xlsx or xls files as spark dataframe
Can anyone let me know how to read xlsx or xls files as a Spark dataframe without converting them first?
I have already tried reading with pandas and then converting to a Spark dataframe, but got ...
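One route that avoids pandas is the spark-excel library (com.crealytics:spark-excel, installed as a cluster Maven library); the option names below follow its documentation, but verify them against the version you install:

df = (spark.read.format("com.crealytics.spark.excel")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("dbfs:/mnt/data/report.xlsx"))   # placeholder path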
16
votes
3
answers
53k
views
How to rename a column in Databricks
How do you rename a column in Databricks?
The following does not work:
ALTER TABLE mySchema.myTable change COLUMN old_name new_name int
It returns the error:
ALTER TABLE CHANGE COLUMN is not ...
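On Delta, RENAME COLUMN only works after column mapping is enabled on the table; the protocol versions below follow the Delta documentation, so confirm them for your runtime:

spark.sql("""
  ALTER TABLE mySchema.myTable SET TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5')
""")
spark.sql("ALTER TABLE mySchema.myTable RENAME COLUMN old_name TO new_name")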
16
votes
1
answer
10k
views
Databricks Community Edition Cluster won't start
I am trying to start a cluster that was terminated in Community Edition. However, whenever I click 'start', the cluster won't start. It would appear I have to create a new cluster every time I want ...
16
votes
4
answers
33k
views
Databricks - is not empty but it's not a Delta table
I run a query on Databricks:
DROP TABLE IF EXISTS dublicates_hotels;
CREATE TABLE IF NOT EXISTS dublicates_hotels
...
I'm trying to understand why I receive the following error:
Error in SQL ...
16
votes
1
answer
29k
views
Error running Spark on Databricks: constructor public XXX is not whitelisted
I was using Azure Databricks and trying to run some example python code from this page.
But I get this exception:
py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml....
16
votes
1
answer
20k
views
Use of lit() in expr()
The line:
df.withColumn("test", expr("concat(lon, lat)"))
works as expected but
df.withColumn("test", expr("concat(lon, lit(','), lat)"))
produces the following exception:
org.apache.spark.sql....
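expr() parses SQL, where lit() does not exist as a function; a plain SQL string literal does the same job. A PySpark sketch:

from pyspark.sql.functions import expr, concat, col, lit

# SQL literal inside expr()
df = df.withColumn("test", expr("concat(lon, ',', lat)"))
# Equivalent DSL form, where lit() belongs
df = df.withColumn("test", concat(col("lon"), lit(","), col("lat")))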
16
votes
3
answers
34k
views
Unity catalog not enabled on cluster in Databricks
We are trying out Unity catalog in Azure Databricks. We connected a pre-existing workspace to the new metastore.
I created a new catalog. When I run a notebook and try to write to table "...
16
votes
1
answer
13k
views
Spark: Read an inputStream instead of File
I'm using SparkSQL in a Java application to do some processing on CSV files using Databricks for parsing.
The data I am processing comes from different sources (Remote URL, local file, Google Cloud ...
16
votes
2
answers
3k
views
Switching between Databricks Connect and local Spark environment
I am looking to use Databricks Connect for developing a pyspark pipeline. DBConnect is really awesome because I am able to run my code on the cluster where the actual data resides, so it's perfect for ...
15
votes
3
answers
52k
views
How to write pandas dataframe into Databricks dbfs/FileStore?
I'm new to Databricks and need help writing a pandas dataframe to the Databricks local file system.
I searched Google but could not find any similar case; I also tried the help guid ...
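pandas only understands local paths, so the usual trick is writing through the /dbfs FUSE mount (available on standard clusters). A sketch with a placeholder filename:

# pandas writes through the FUSE mount; the file lands in DBFS
pandas_df.to_csv("/dbfs/FileStore/my_data.csv", index=False)   # placeholder
# Sanity-check from the Spark side
spark.read.csv("dbfs:/FileStore/my_data.csv", header=True).show()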
15
votes
2
answers
13k
views
How to properly access dbutils in Scala when using Databricks Connect
I'm using Databricks Connect to run code in my Azure Databricks cluster locally from IntelliJ IDEA (Scala).
Everything works fine. I can connect, debug, inspect locally in the IDE.
I created a ...
15
votes
3
answers
62k
views
How to solve this error org.apache.spark.sql.catalyst.errors.package$TreeNodeException
I have two processes; each process does:
1) connect to an Oracle DB and read a specific table
2) form a dataframe and process it
3) save the df to Cassandra.
If I run both processes in parallel, both try to ...
14
votes
1
answer
21k
views
ArrowTypeError: Did not pass numpy.dtype object', 'Conversion failed for column X with type int32
Problem
I am trying to save a data frame as a parquet file on Databricks, getting the ArrowTypeError.
Databricks Runtime Version:
7.6 ML (includes Apache Spark 3.0.1, Scala 2.12)
Log Trace
...
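The full trace is cut off above, but for Arrow conversion failures one blunt workaround is disabling Arrow on the pandas conversion path (config name per Spark 3.x), at the cost of speed; alternatively, cast the offending column explicitly first:

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "false")
# Or keep Arrow and fix the column named in the error, e.g.:
df = df.withColumn("X", df["X"].cast("long"))   # "X" per the error message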
14
votes
1
answer
38k
views
How to create an empty folder in Azure Blob from Azure Databricks
I have a scenario where I want to list all the folders inside a directory in Azure Blob. If no folders are present, I create a new folder with a certain name.
I am trying to list the folders using dbutils.fs.ls(...
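Blob storage has no true directories, but dbutils.fs.mkdirs creates the placeholder so later listings and writes see it. A sketch with placeholder mount paths:

parent = "dbfs:/mnt/my_container/parent/"   # placeholder path
entries = dbutils.fs.ls(parent)
if not any(e.isDir() for e in entries):
    dbutils.fs.mkdirs(parent + "new_folder")   # placeholder folder name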
14
votes
3
answers
30k
views
Python Version in Azure Databricks
I am trying to find out the Python version I am using in Databricks.
To find out I tried
import sys
print(sys.version)
And I got the output as 3.7.3
However when I went to Cluster --> SparkUI --> ...
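sys.version in a notebook reports the driver's interpreter; the executors can be probed with a tiny job, which is one way to explain a mismatch against what the Spark UI shows:

import sys

print("driver:", sys.version)
executor_versions = (sc.parallelize(range(2), 2)
                     .map(lambda _: sys.version)
                     .distinct().collect())
print("executors:", executor_versions)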
14
votes
2
answers
29k
views
Reading data from a URL using Spark on the Databricks platform
I am trying to read data from a URL using Spark on the Databricks Community Edition platform.
I tried to use spark.read.csv and SparkFiles, but I am still missing some simple point.
url = "https://raw....
14
votes
6
answers
38k
views
Databricks display() function equivalent or alternative to Jupyter
I'm in the process of migrating current Databricks Spark notebooks to Jupyter notebooks. Databricks provides the convenient and beautiful display(data_frame) function to visualize Spark ...
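There is no drop-in display() outside Databricks, but in Jupyter the closest cheap equivalent is rendering a truncated slice through pandas, which notebooks show as an HTML table. A sketch:

# Rich-table preview of the first rows in Jupyter
df.limit(20).toPandas()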
14
votes
4
answers
9k
views
How to log in via SSH on an Azure Databricks cluster
I used the following Ubuntu command for SSH login:
ssh user@hostname_or_IP
I can see the master node hostname,
but I am not able to get the username from the Azure Databricks cluster.
Refer to this ...
14
votes
1
answer
36k
views
How can I convert a pyspark.sql.dataframe.DataFrame back to a sql table in databricks notebook
I created a dataframe of type pyspark.sql.dataframe.DataFrame by executing the following line:
dataframe = sqlContext.sql("select * from my_data_table")
How can I convert this back to a sparksql ...
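Registering the dataframe as a temporary view makes it queryable from SQL again; this is the standard PySpark API (the view name is a placeholder):

dataframe.createOrReplaceTempView("my_data_transformed")
spark.sql("select count(*) from my_data_transformed").show()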
13
votes
2
answers
50k
views
Adding constant value column to spark dataframe
I am using Spark version 2.1 in Databricks. I have a data frame named wamp to which I want to add a column named region which should take the constant value NE. However, I get an error saying ...
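withColumn expects a Column, not a bare Python string, which is the usual cause of that error; lit() wraps the constant. A sketch:

from pyspark.sql.functions import lit

wamp = wamp.withColumn("region", lit("NE"))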
13
votes
3
answers
37k
views
check if delta table exists on a path or not in databricks
I need to delete certain data from a Delta Lake table before I load it. I am able to delete the data from the Delta table if it exists, but it fails when the table does not exist.
Databricks scala code ...
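The Delta library ships a direct check with the same shape in Scala and Python (DeltaTable.isDeltaTable). A PySpark sketch with a placeholder path and a hypothetical delete predicate:

from delta.tables import DeltaTable

path = "dbfs:/mnt/delta/events"   # placeholder path
if DeltaTable.isDeltaTable(spark, path):
    spark.sql(f"DELETE FROM delta.`{path}` WHERE load_date = '2024-01-01'")  # hypothetical predicate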