Service offer

Jolin.io is the consultancy for enabling Julia technology within the data and analysis pipelines of your business, with a focus on machine learning, big data, and real-time pipelines.

Building a Julia proof-of-concept

Julia is quick to set up and prototype in, which makes it an excellent foundation for developing a proof-of-concept (PoC).

In just 2 weeks, an initial solution can be developed, including a minimal dashboard for visualization and reporting purposes.

2-week plan
1st week
  • 2 days data setup
  • 3 days first prototype
2nd week
  • 1 day improvements
  • 1 day finalizing
  • 2 days dashboard
  • presentation & future steps

Using the prototype, you can assess the concrete performance, development speed, and simplicity of the code, and see for yourself the benefits of Julia.

From PoC to production

Because the Julia language is fast by default, there is a high reuse factor when moving from PoC to production. There is no need to rewrite everything in C or Java to meet performance targets, as you may have to do when using R or Python.

When you want to move the prototype to production, we support you with everything you need.

We support you with
Development
  • code versioning
  • code packaging
  • containerization (Docker)
  • extensive test suite
  • extensive documentation
  • CI continuous integration
  • CD continuous deployment
Operations
  • parameterization
  • resource monitoring
  • alerting
  • operations documentation
  • high availability
  • failure recovery plans
Connectivity
  • real-time requirements
  • database connections
  • dashboard
  • caching
GDPR
  • user consent
  • encryption
  • anonymization / pseudonymization
  • right to be forgotten
  • data versioning
  • data tracing
Science
  • experiment management
  • secure flexible compute environments
  • computation cost transparency
Machine Learning
  • model versioning
  • model packaging
  • model deployment
  • model re-training automation
  • model evaluation pipeline
  • performance monitoring
  • alerting

Migrating parts to Julia

You probably already have a data science team of 2 to 20 developers and scientists. Most commonly, such projects are run in Python or R, or they use proprietary software like Matlab or SAS. If you have higher performance requirements, your team may instead use Fortran, C++, or Java directly.

We support you with
Moving performance critical parts to Julia

Maintaining a Julia component is far easier than maintaining C++ or Fortran code.

You can seamlessly port a single part of your system for performance improvements. Bridging Julia and the original language is done via Julia's excellent foreign function interface (C, Fortran, Python, R, Matlab, ...).
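As a sketch of what this bridging looks like in practice, here is Julia's built-in `ccall` interface calling straight into the system C library (this assumes a Unix-like system where the libc symbols are already loaded into the process):

```julia
# Call the C standard library's strlen directly from Julia.
# No wrapper code, no build step; the symbol, return type,
# argument types, and arguments are all that is needed.
len = ccall(:strlen, Csize_t, (Cstring,), "hello")
println(len)  # 5 on a Unix-like system
```

The same zero-overhead mechanism underlies the higher-level bridges to Python, R, and Matlab.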

Building new components in Julia

Try out Julia for your new component. Julia is as easy to use as Python or R, or even simpler.

Creating wrappers for C, Python & R is straightforward. See also our section about PoC.
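For example, reusing an existing Python library from Julia is a sketch like the following with the community package PyCall.jl (this assumes PyCall.jl and the referenced Python package are installed; RCall.jl plays the same role for R):

```julia
using PyCall  # community package bridging Julia and Python

np = pyimport("numpy")     # import any installed Python module
x = np.linspace(0, 1, 5)   # call Python functions with Julia syntax;
                           # numpy arrays convert to Julia arrays
println(sum(x))            # 2.5 — mix Python results with Julia code freely
```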

Migrating entire projects to Julia

Migrating a whole project needs to be planned in great detail.

We support you with all stages

  • requirements assessment
  • feasibility evaluation
  • dependency tracking
  • time & resource planning
  • training
  • development
  • deployment
  • operations

Python, R, ...

You use Python or R?

In summary, you can plug in Julia everywhere you would use Python or R. Whether the concrete package you use has a counterpart in Julia needs to be checked, but chances are very high that you will find something even better. Many Julia packages are best-in-class.

                                        Python              R       Julia
interactive repl                          ✓                 ✓         ✓
virtual environments                      ✓                 ✓         ✓
debugging and profiling                   ✓                 ✓         ✓
dashboards                                ✓                 ✓         ✓
statistics libraries                      ✓                 ✓         ✓
machine learning                          ✓                 ✓         ✓
mathematical optimization                 ✓                 ✓         ✓
Spark · big data · streaming              ✓                 ✓         ✓
general purpose programming language      ✓                 ✗         ✓
meta programming                          ✗                 ✓         ✓
fast                                      ✗                 ✗         ✓
memory efficient                          ✗                 ✗         ✓
libraries mainly written in               Python/Cython/C++ R/C++     Julia

R was the first scripting language for statistics and data science. It is still widely used today because of its interactivity and very good support for statistical tools.

Python has been adopted by some teams, mainly because of its better general purpose programming tools. Even today, Python's statistical support is not as good as R's, but for machine learning, the Python ecosystem is one of the best.

Julia is the latest language in this series, outperforming the other two by combining general purpose programming and meta programming together with C-like performance.
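Meta programming in particular deserves a quick illustration. The sketch below defines a hypothetical `@elapsed_ms` macro (an illustrative name, not a library macro); at compile time the macro rewrites the expression into plain timed Julia code:

```julia
# A macro receives the expression unevaluated and returns new code.
macro elapsed_ms(ex)
    quote
        t0 = time_ns()              # hygienic: t0 cannot clash with user code
        result = $(esc(ex))         # splice in the user's expression
        println("elapsed: ", (time_ns() - t0) / 1e6, " ms")
        result
    end
end

s = @elapsed_ms sum(1:1_000_000)    # prints the timing, returns the sum
```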

Matlab, SAS, ...

You are using a proprietary software solution for your data analytics?

There are many good reasons why you in particular would benefit from switching to Julia.


  • no license fees
  • 100% open source
  • full flexibility & hackability
  • access to the newest algorithms from academia
  • general purpose programming language
  • best performance, also for custom methods
  • easy onboarding of new employees
  • knowledge sharing & community support

We know how big the project of transitioning from proprietary software to open source can be. That's why we provide end-to-end support from planning to development and training.

For example, it is also possible to start small by migrating or adding a single sample component first.

Fortran, C++, Java, ...

You already focus on performance, and no high-level language has met your requirements?

Many companies with high demands on processing speed and memory avoid Python, R, or Matlab and prefer low-level languages like C++ or Fortran, simply because they are fast.

Julia is the language for you, finally.

Julia is as fast as C, Rust, or Fortran.

Calling C and Fortran from Julia is first class (documentation).

You can also embed Julia into your C/C++/Fortran code (documentation).

Julia is truly high-level.

Julia was made to be the new Fortran, designed specifically for applied mathematics.

Whether you calculate advanced financial forecasts or critical uncertainty estimates, whether you model a large power grid or fold complex molecules: Julia makes such scientific computations performant and easy to use.
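As a small taste, solving an ordinary differential equation takes only a few lines with the DifferentialEquations.jl package (a sketch assuming the package is installed):

```julia
using DifferentialEquations

# Exponential growth: du/dt = 1.01 * u with u(0) = 0.5 on t in [0, 1].
f(u, p, t) = 1.01 * u
prob = ODEProblem(f, 0.5, (0.0, 1.0))
sol = solve(prob)       # an adaptive solver is picked automatically

println(sol(1.0))       # ≈ 0.5 * exp(1.01) ≈ 1.373
```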

Big data, real time, & high performance computing

Your business already has large amounts of data, and the trend is rising?

Today, data pipelines are not only compute-intensive, but should also be scalable in size and speed to always keep up with your company's standards.

Julia enables you to build scalable data transformations without sacrificing performance for customizability or ease of use.

We support you with
High performance computing
  • setting up HPC clusters
  • developing & configuring Julia jobs
  • deployment & scheduling
Streaming
  • setting up Kafka
  • developing Julia consumers & producers
  • deploying Kafka processors
Datalakes & big data
  • setting up a data lake
  • centralized data storage
  • Julia batch jobs
  • job scheduling
Small data
  • creating Julia data pipelines
  • very light and efficient
  • possibility to easily scale later on
for all
  • machine learning
  • GDPR
  • security
  • cloud deployment
  • infrastructure as code
  • monitoring & alerting
  • integration with dashboards

Big data

Traditional databases are tied to a single machine and hard to scale for really large data.

Open source solutions that distribute storage and computation across multiple machines are still rather rare today. Many of the existing distributed frameworks can be quite complex to set up and maintain.

Julia was designed with distributed computation in mind from the start, which makes it much easier to work with. One can reuse the existing small data ecosystem and, as usual in Julia, have first-class support for efficient custom functions.

In addition there is already a large set of dedicated tools for distributed computation:

  • core components for distributed computation
  • works out-of-the-box with high performance clusters like Slurm
  • distributed Arrays
  • distributed Tables
  • Actor model
  • distributed directed-acyclic-graphs (DAGs)
  • distributed GPUs
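The core components ship with the language itself. As a minimal sketch, Julia's Distributed standard library spreads work over local worker processes with the same API a cluster would use:

```julia
using Distributed

addprocs(4)              # start 4 local workers; on a cluster a
                         # manager (e.g. for Slurm) adds remote ones

# pmap distributes the chunks across workers and balances the load.
results = pmap(1:8) do chunk
    lo = (chunk - 1) * 1_000 + 1
    hi = chunk * 1_000
    sum(i^2 for i in lo:hi)
end

println(sum(results))    # identical to the serial sum of squares
```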

Real time

While batch processing is often the cost-effective solution for throughput-optimized data pipelines, latency requirements are best solved with streaming architectures.

Julia's performance is so good that you can build business critical components with very low latencies.

This is very useful if you are using applied mathematics in any form, whether for optimization, simulation, or modern machine learning algorithms. One great example is robot control:

"One area in which we see significant advantages using Julia is in developing online controllers (that is, controllers which run in real time on the robot, typically with control rates of 100-1000 Hz). Modern controllers for walking robots typically involve much more complicated computation than a simple linear feedback controller, and most humanoid robots are controlled by solving mathematical optimizations at these high rates. Even setting up these optimization problems can be complex, so it is extremely useful to have a language like Julia that combines excellent support for mathematical programming, useful optimization libraries such as JuMP.jl, and highly performant code when developing new robot controllers."

Source: MIT Robotics

Julia comes with a garbage collector, which can cause runtime pauses in the range of milliseconds. In most streaming applications, milliseconds don't matter at all, but if your real-time requirements are that strict, Julia offers ways to make your code garbage-collector-free.

Pre-allocating resources is the most common way to deal with this scenario. In addition, Julia provides the ability to create custom data structures which are invisible to the garbage collector (called stack-allocated).
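A minimal sketch of both techniques (the struct and function names are illustrative, not from any particular library):

```julia
struct Sample               # immutable and concretely typed, so a
    t::Float64              # Vector{Sample} stores its elements inline,
    value::Float64          # invisible to the garbage collector
end

# Write results into a pre-allocated buffer instead of allocating a new one.
function scale_into!(out::Vector{Float64}, samples::Vector{Sample})
    @inbounds for i in eachindex(samples)
        out[i] = 2.0 * samples[i].value
    end
    return out
end

samples = [Sample(0.01 * i, sin(i)) for i in 1:1_000]
out = Vector{Float64}(undef, length(samples))
scale_into!(out, samples)                      # warm-up triggers compilation
println(@allocated scale_into!(out, samples))  # bytes allocated by the hot call
```

After the warm-up call has compiled the function, the hot loop performs no heap allocation at all.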

"Given the constraint of avoiding jitter due to dynamic allocation and online JIT [Just-In-Time] compilation, we find Julia to be more productive than Java. First, Julia provides immutable, stack-allocated user data types [...] Second, JIT compilation is also better than in Java."

Source: MIT Robotics

Besides these super-low-latency applications, you can use Julia with streaming platforms such as Apache Kafka or RabbitMQ.

Small data

Small data is everywhere. Both big data and real time frameworks use small data libraries for their local computations. And then, of course, there are all those data pipelines which don't need to scale (yet).

That's why it is critical to be able to work with small data efficiently and in a production-ready way.

Dataframes

The most common tool for such pipelines is the dataframe, which is simply a table.

Comparing dataframes
                                                      Python   R       Scala   Julia
                                                      pandas   tibble  Spark   DataFrames.jl
select, filter, groupby, sort, join, ...                ✓        ✓       ✓       ✓
import/export to csv, parquet, ...                      ✓        ✓       ✓       ✓
careful handling of missing data                        ✗        ✓       ✗*      ✓
general support for complex/custom types                ✓        ✓       ✓       ✓
correct type-representation for complex/custom types    ✗        ✓       ✓       ✓
optimal performance for complex/custom types            ✗        ✗       ✗**     ✓

* Only for Scala and R. Spark uses `null` for representing missing.
** Spark's performance for custom types holds only for Scala (not for Python or R).

Taking Apache Spark as a good baseline for a production-ready, industry-standard dataframe implementation, we see that Python's pandas in particular has some significant drawbacks. Spark's performance for custom types, however, only applies to Scala, not to Python or R.

Julia's Dataframe was designed with a focus on full flexibility and performance. This goes hand in hand with careful type-handling and results in a truly production-ready Dataframe.
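A minimal sketch of such a pipeline with DataFrames.jl (assuming the package is installed; the column names are made up for illustration):

```julia
using DataFrames

df = DataFrame(region  = ["EU", "EU", "US", "US"],
               revenue = [120.0, missing, 95.0, 110.0])

# Missing values propagate by default; skipping them is an explicit choice.
per_region = combine(groupby(df, :region),
                     :revenue => (r -> sum(skipmissing(r))) => :total)

println(per_region)   # one :total row per region, missings handled explicitly
```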

n-dimensional Arrays

The second most important tool is the n-dimensional array. This is a data structure with one, two, or more dimensions, in which every element has the same data type. For comparison, a dataframe can have a different data type for each column.

Comparing n-dimensional arrays
                                                      Python   R       Julia
                                                      numpy    array   Array
elementwise operations                                  ✓        ✓       ✓
broadcasting                                            ✓        ✗       ✓
careful handling of missing data                        ✗        ✓       ✓
general support for complex/custom types                ✓        ✓       ✓
correct type-representation for complex/custom types    ✗        ✓       ✓
optimal performance for complex/custom types            ✗        ✗       ✓

Since Scala Spark does not provide a generic n-dimensional array, we omitted Scala from this comparison.

Similar to the dataframes case, the Python and R array implementations have crucial disadvantages which should be considered when deciding on a production-ready data pipeline tool.

Julia's Array was designed with a focus on full flexibility and performance. This goes hand-in-hand with careful handling of data types, resulting in a truly production-ready n-dimensional Array implementation.
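A short sketch of what "optimal performance for custom types" means in practice: an array of a plain immutable struct is stored inline, and broadcasting works on it exactly as on built-in numbers (`Celsius` and `to_fahrenheit` are illustrative names):

```julia
struct Celsius              # a plain immutable type; a Vector{Celsius}
    deg::Float64            # is stored inline, not as boxed pointers
end

to_fahrenheit(c::Celsius) = c.deg * 9 / 5 + 32

temps = [Celsius(t) for t in 0.0:10.0:30.0]
println(to_fahrenheit.(temps))   # broadcast over the custom type:
                                 # [32.0, 50.0, 68.0, 86.0]
```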

Individual training & consultancy

If you have any questions or requirements that are not covered here, please contact us.

Julia is a very flexible and promising technology. Let's work together, and increase the value of your data processing.

email hello@jolin.io or call
+49 152 2406 7803

For organizing workshops and individual trainings, please contact us as well.

workshops · trainings | introductory · advanced
Programming languages for data processing
  • Julia
  • Python
  • R
  • Scala
  • Matlab
Challenges and solutions for scaling
  • processing very large amounts of data (big data).
  • real time processing
Infrastructure
  • cloud solutions
  • internal computer clusters
  • setup
  • maintenance
  • infrastructure as code
Product realization
  • versioning & packaging & registration
  • parameterization & reusability
  • automated rollout & testing
  • monitoring & alerting
  • automated execution of data processing flows
  • automated training & (re)evaluation of machine learning models
  • visualization and presentation of results
Special software packages
  • dataframes
  • graphs
  • mathematical optimization
  • machine learning
  • differential equation systems
  • ...
Visualization of data
  • dashboards
  • interactive data visualizations
  • customized solutions
Open to a brief exchange?
e-mail hello@jolin.io or call/whatsapp/signal
+49 152 2406 7803