  1. scala - What is RDD in spark - Stack Overflow

    Dec 23, 2015 · An RDD is, essentially, the Spark representation of a set of data, spread across multiple machines, with APIs to let you act on it. An RDD could come from any datasource, …

  2. Difference between DataFrame, Dataset, and RDD in Spark

    Feb 18, 2020 · I'm just wondering what the difference is between an RDD and a DataFrame (in Spark 2.0.0, DataFrame is a mere type alias for Dataset[Row]) in Apache Spark? Can you convert …

  3. scala - How to print the contents of RDD? - Stack Overflow

    But I think I know where this confusion comes from: the original question asked how to print an RDD to the Spark console (= shell), so I assumed he would run a local job, in which case …

  4. Spark: Best practice for retrieving big data from RDD to local …

    Feb 11, 2014 · Update: the RDD.toLocalIterator method, which appeared after the original answer was written, is a more efficient way to do the job. It uses runJob to evaluate only a single …

  5. How do I split an RDD into two or more RDDs? - Stack Overflow

    Oct 6, 2015 · I'm looking for a way to split an RDD into two or more RDDs. The closest I've seen is "Scala Spark: Split collection into several RDD?", which still yields a single RDD. If you're familiar …

  6. (Why) do we need to call cache or persist on a RDD

    Mar 11, 2015 · When a resilient distributed dataset (RDD) is created from a text file or collection (or from another RDD), do we need to call "cache" or "persist" explicitly to store the …

  7. Difference and use-cases of RDD and Pair RDD - Stack Overflow

    May 6, 2016 · I am new to Spark and trying to understand the difference between a normal RDD and a pair RDD. What are the use-cases where a pair RDD is used as opposed to a normal …

  8. How to check the number of partitions of a Spark DataFrame …

    Jan 19, 2019 · There are a number of questions about how to obtain the number of partitions of an RDD and/or a DataFrame: the answers invariably are: rdd.getNumPartitions or …

  9. RDD in Spark: where and how are they stored? - Stack Overflow

    Jun 8, 2021 · Q2: if RDD_THREE depends on RDD_TWO and this in turn depends on RDD_ONE (lineage), and I didn't use the cache() method on RDD_THREE, Spark should recalculate …

  10. What's the difference between RDD and Dataframe in Spark?

    Aug 20, 2019 · RDD stands for Resilient Distributed Dataset. It is a read-only, partitioned collection of records. RDD is the fundamental data structure of Spark. It allows a programmer to perform …