rovr138 | 3 days ago
In Spark, don't PySpark and SQL both still get translated to Scala?
orochimaaru | 3 days ago | parent
Yes. But with PySpark there is a Python gateway (Py4J), while SQL is parsed and planned natively by Spark. And when you create a DataFrame in Spark, its schema needs to be defined, or, in the SQL case, it takes the shape of the columns returned. Using Python can create hotspots wherever data has to cross between Spark's JVM and the Python gateway; Python UDFs are a common culprit, since each row gets serialized out to a Python worker and back. Either way, my point is that there are architectural and design decisions in your data solution that can cause many more problems than the choice of language.