If you are a data scientist, odds are that you spend most of your time working with either R or Python. The former was designed from the get-go as a programming language for math and statistics, while the latter started as a friendly all-purpose programming language and became a safe haven for those disgruntled by R’s syntax and limitations.
In this day and age, there is no question about which of the two is more popular: Python has taken the crown as the most popular programming language for data science. It’s not that R is bad, quite the opposite; it’s extremely powerful and boasts one of the most academically oriented communities in the field.
Unfortunately, R is also well known for its frankly abhorrent syntax, non-standardized solutions, and unoptimized memory management. It’s a technology that shows its age, but its plethora of scientific libraries and toolkits makes it a necessary addition to any data scientist’s toolbox.
That’s not to say that Python doesn’t have its fair share of problems as well. First and foremost, Python is slow, which might not mean much for small projects, but when you are dealing with gigabytes or terabytes of data, those extra seconds start to add up.
Pandas offers a great solution by adding methods that are faster than standard Python loops, but that’s a band-aid at best. As one developer puts it, the fastest way to loop in Python is to not loop at all.
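To make the "don’t loop" advice concrete, here is a minimal sketch that uses only the standard library. The function names are illustrative and do not come from any library. It compares an explicit Python loop with the built-in sum (whose loop runs inside the interpreter’s C code) and with a closed-form formula that avoids looping altogether. Timings vary from machine to machine, so none are quoted here.

```python
import timeit


def total_with_loop(n):
    """Sum the integers 0..n-1 with an explicit Python-level loop."""
    total = 0
    for i in range(n):
        total += i
    return total


def total_with_builtin(n):
    """Let the built-in sum() iterate; the loop runs in the interpreter's C code."""
    return sum(range(n))


def total_with_formula(n):
    """No loop at all: closed form for 0 + 1 + ... + (n - 1)."""
    return n * (n - 1) // 2


if __name__ == "__main__":
    n = 1_000_000
    for fn in (total_with_loop, total_with_builtin, total_with_formula):
        seconds = timeit.timeit(lambda: fn(n), number=10)
        print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```

Pandas and NumPy apply the same idea to whole columns and arrays: the iteration still happens, but in compiled code rather than in the Python interpreter.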
Another point of contention is multithreading. Python does have threads, but in the standard CPython interpreter the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. Some libraries allow code to run concurrently, but in the end, those solutions will only get you so far.
That might not seem like a big issue, but think about it: we’ve had multicore processors on the market for a long time and we have GPUs that are amazing at doing calculations, yet all that power is barely getting used.
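The effect of the GIL is easy to observe with a small experiment. The sketch below is only an illustration, and its helper names (count_primes, run_in_pool) are made up for it. It runs the same CPU-bound function in a thread pool and in a process pool. On a standard CPython build with the GIL, the threaded run typically takes about as long as running the tasks one after another, while the process pool can spread the work across cores. Exact timings depend on the machine, so none are shown.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound work: count the primes below `limit` by trial division."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_in_pool(executor_cls, limits, workers=4):
    """Run count_primes over `limits` in the given pool type and time it."""
    start = time.perf_counter()
    with executor_cls(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    limits = [50_000] * 4
    for executor_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        results, seconds = run_in_pool(executor_cls, limits)
        print(f"{executor_cls.__name__}: {seconds:.2f} s, results={results}")
```

Processes sidestep the GIL because each one has its own interpreter, but they pay for it with startup time and the cost of copying data between processes.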
So, has Julia come to save the day?
Enter Julia
Julia is a programming language released in 2012 by Jeff Bezanson and collaborators. It’s, as they put it, a product of greed: the desire to create a high-level language with performance similar to C. In short, they wanted to have it all in a single package.
The growth Julia has experienced as a math-friendly programming language is nothing short of staggering. To date, Julia has been downloaded over 29 million times and is used at over 10,000 companies worldwide. That might not seem like much, but for such a young language, it’s impressive.
To put it in perspective, Julia entered the top 50 programming languages in 2018 according to the TIOBE index. In 2021 it sits at 35th place, and TIOBE predicts that it will break into the top 20 by the end of the year.
Julia’s features
Julia’s founders wanted to create a technology with a liberal license, the speed of C, the dynamism of Ruby, the math syntax of MATLAB, the usability of Python, and the applicability to statistics of R. That’s quite the high bar they set for themselves. What features did they design to achieve their goal?
- Julia is just-in-time compiled. For faster runtime performance, Julia uses just-in-time (JIT) compilation built on the LLVM compiler framework. In the hands of an expert, Julia can approach the speed of C without sacrificing readability.
- Julia is interactive. Julia has an interactive command line, similar to what Python offers. You can create one-off scripts or try bits of code with a few key presses.
- Julia combines the benefits of dynamic typing and static typing. You can specify types for variables or create hierarchies of types to design general cases for handling variables of a specific type.
- Julia can call Python, C, R, Java, and Fortran libraries. Julia has foreign function interfaces for the most popular programming languages. Likewise, Julia can be called from other languages through its embedding API.
- Julia has straightforward syntax. While not as simple as Python or JavaScript, Julia has one of the easiest syntaxes on the market.
- Julia has a full-featured debugger. Like Python, Julia has one of the friendliest debugging tools out there, which helps you trace back the source of any error.
- Julia supports metaprogramming. Julia programs can create other programs, or rewrite their own code, in a way that is reminiscent of languages like Lisp.
- Julia is optimized for parallel computing. Julia is designed for parallelism and provides built-in primitives for parallel computing at every level: instruction-level parallelism, multi-threading, GPU computing, and distributed computing.
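The point in the list above about type hierarchies and general cases is easier to see in code. The sketch below is a rough Python analogue, not Julia: functools.singledispatch picks an implementation based on the type of the first argument only, whereas Julia’s dispatch considers the types of all arguments. The describe function is hypothetical and exists only for this illustration.

```python
from functools import singledispatch
from numbers import Real


@singledispatch
def describe(value):
    """Fallback used when no more specific handler is registered."""
    return f"something else: {value!r}"


@describe.register(Real)
def _(value):
    """General case for any real number (int, float, Fraction, ...)."""
    return f"real number {value}"


@describe.register(int)
def _(value):
    """More specific case that takes priority over Real for integers."""
    return f"integer {value}"


if __name__ == "__main__":
    for item in (3, 2.5, "three"):
        print(describe(item))
```

The general handler covers the whole family of real numbers, and the more specific one overrides it for integers only, so callers never have to write type checks themselves.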
On top of that, Julia was built from the ground up with machine learning in mind, and already has a rather powerful suite of libraries for developing artificial intelligence. Examples include MLJ for classic machine learning algorithms like generalized linear models, decision trees, and clustering; Flux for deep learning; and TextAnalysis for natural language processing.
Julia is like the love-child of R’s math-centric paradigm and Python’s smooth learning curve and all-purpose functionalities. Imagine if Python already came with SciPy, NumPy, and Pandas as part of its core and you’ll have a pretty good idea of what Julia is.
Data scientists like Julia because it’s very easy to pick up. It blends well with other programming languages, you can add it to your projects with very little effort, and it creates a bridge between different programming languages. It’s an all-in-one solution for data analysis.
Julia Is Not Perfect
Yes, we are Julia fans, but we are also Python and R fans, and we are well aware of some of Julia’s limitations. For example, a minor quibble of mine is that Julia starts its array indices at 1 instead of 0, unlike most mainstream programming languages.
That’s a deliberate decision, mind you, one aimed at people who are coming from other math-oriented languages, but one that’s going to cause no end of headaches for people who are used to 0 as the index of the first element of an array.
To be fair, Julia does support arrays with custom indices, including 0-based ones (the OffsetArrays.jl package is one way to get them), but it’s one example of what happens when you create a language for everyone and everything: some of the parts end up feeling clunky.
Finally, Julia’s biggest issue is that it’s young. Its libraries and repositories pale in comparison to those of Python or even R, so working with Julia means having to do a lot of programming yourself. Of course, this is something that’s going to become less of an issue as the community grows, but it’s important to keep in mind.
Having said that, Julia is growing, and fast, and people seem to love the language. Fortunately, this isn’t a competition, and the more tools you have at your disposal, the better you will be at your job as a data scientist. I’m really glad that we can have Julia alongside Python and R.