Service offer

Jolin.io is the consultancy for enabling Julia technology within the data and analysis pipelines of your business, with a focus on machine learning, big data, and real-time pipelines.

Building a Julia proof-of-concept

Julia is quick to set up and prototype in, which makes it an excellent foundation for developing a proof-of-concept (PoC).

In just 2 weeks, an initial solution can be developed, including a minimal dashboard for visualization and reporting purposes.

2-week plan
1st week
  • 2 days data setup
  • 3 days first prototype
2nd week
  • 1 day improvements
  • 1 day finalizing
  • 2 days dashboard
  • presentation & future steps

Using the prototype, you can assess the concrete performance, development speed, and simplicity of the code, and see for yourself the benefits of Julia.

From PoC to production

Because the Julia language is fast by default, there is a high reuse factor when moving from PoC to production. There is no need to rewrite everything in C or Java to meet performance targets, as you may have to do when using R or Python.

When you want to move the prototype to production, we support you with everything you need.

We support you with
Development
  • code versioning
  • code packaging
  • containerization (Docker)
  • extensive test suite
  • extensive documentation
  • CI continuous integration
  • CD continuous deployment
Operations
  • parameterization
  • resource monitoring
  • alerting
  • operations documentation
  • high availability
  • failure recovery plans
Connectivity
  • real-time requirements
  • database connections
  • dashboard
  • caching
GDPR
  • user consent
  • encryption
  • anonymization / pseudonymization
  • right to be forgotten
  • data versioning
  • data tracing
Science
  • experiment management
  • secure flexible compute environments
  • computation cost transparency
Machine Learning
  • model versioning
  • model packaging
  • model deployment
  • model re-training automation
  • model evaluation pipeline
  • performance monitoring
  • alerting

Migrating parts to Julia

You probably already have a data science team of 2 to 20 developers and scientists. Most commonly, such projects are run in Python or R, or they use proprietary software like Matlab or SAS. If you have higher performance requirements, your team may instead use Fortran, C++, or Java directly.

We support you with
Moving performance critical parts to Julia

Maintaining a Julia component is far easier than maintaining C++ or Fortran code.

You can seamlessly port a single part of your system for performance improvements. Bridging Julia and the original language is done via Julia's excellent foreign function interface (C, Fortran, Python, R, Matlab, ...).
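As a sketch of what this bridging looks like in practice, here is Julia's built-in `ccall` interface calling straight into the system C library (this assumes a Unix-like system where the libc symbols are already loaded into the process):

```julia
# Call the C standard library's strlen directly from Julia.
# No wrapper code, no build step; the symbol, return type,
# argument types, and arguments are all that is needed.
len = ccall(:strlen, Csize_t, (Cstring,), "hello")
println(len)  # 5 on a Unix-like system
```

The same zero-overhead mechanism underlies the higher-level bridges to Python, R, and Matlab.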

Building new components in Julia

Try out Julia for your new component. Julia is as easy to use as Python or R, or even simpler.

Creating wrappers for C, Python & R is straightforward. See also our section about PoC.
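For example, reusing an existing Python library from Julia is a sketch like the following with the community package PyCall.jl (this assumes PyCall.jl and the referenced Python package are installed; RCall.jl plays the same role for R):

```julia
using PyCall  # community package bridging Julia and Python

np = pyimport("numpy")     # import any installed Python module
x = np.linspace(0, 1, 5)   # call Python functions with Julia syntax;
                           # numpy arrays convert to Julia arrays
println(sum(x))            # 2.5 — mix Python results with Julia code freely
```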

Migrating entire projects to Julia

Migrating a whole project needs to be planned in great detail.

We support you with all stages

  • requirements assessment
  • feasibility evaluation
  • dependency tracking
  • time & resource planning
  • training
  • development
  • deployment
  • operations

Python, R, ...

You use Python or R?

In summary, you can plug in Julia everywhere you would use Python or R. Whether the concrete package you use has a counterpart in Julia needs to be checked, but chances are very high that you will find something even better. Many Julia packages are best-in-class.

                                        Python              R       Julia
interactive repl                          ✓                 ✓         ✓
virtual environments                      ✓                 ✓         ✓
debugging and profiling                   ✓                 ✓         ✓
dashboards                                ✓                 ✓         ✓
statistics libraries                      ✓                 ✓         ✓
machine learning                          ✓                 ✓         ✓
mathematical optimization                 ✓                 ✓         ✓
Spark · big data · streaming              ✓                 ✓         ✓
general purpose programming language      ✓                 ✗         ✓
meta programming                          ✗                 ✓         ✓
fast                                      ✗                 ✗         ✓
memory efficient                          ✗                 ✗         ✓
libraries mainly written in               Python/Cython/C++ R/C++     Julia

R was the first scripting language for statistics and data science. It is still widely used today because of its interactivity and very good support for statistical tools.

Python has been adopted by some teams, mainly because of its better general purpose programming tools. Even today, Python's statistical support is not as good as R's, but for machine learning, the Python ecosystem is one of the best.

Julia is the latest language in this series, outperforming the other two by combining general purpose programming and meta programming together with C-like performance.
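Meta programming in particular deserves a quick illustration. The sketch below defines a hypothetical `@elapsed_ms` macro (an illustrative name, not a library macro); at compile time the macro rewrites the expression into plain timed Julia code:

```julia
# A macro receives the expression unevaluated and returns new code.
macro elapsed_ms(ex)
    quote
        t0 = time_ns()              # hygienic: t0 cannot clash with user code
        result = $(esc(ex))         # splice in the user's expression
        println("elapsed: ", (time_ns() - t0) / 1e6, " ms")
        result
    end
end

s = @elapsed_ms sum(1:1_000_000)    # prints the timing, returns the sum
```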

Matlab, SAS, ...

You are using a proprietary software solution for your data analytics?

There are many good reasons why you in particular would benefit from switching to Julia.


  • no license fees
  • 100% open source
  • full flexibility & hackability
  • access to the newest algorithms from academia
  • general purpose programming language
  • best performance, also for custom methods
  • easy onboarding of new employees
  • knowledge sharing & community support

We know how big the project of transitioning from proprietary software to open source can be. That's why we provide end-to-end support from planning to development and training.

For example, it is also possible to start small by migrating or adding a single sample component first.

Fortran, C++, Java, ...

You already focus on performance, and no high-level language has met your requirements?

Many companies with high demands on processing speed and memory avoid Python, R, or Matlab and prefer low-level languages like C++ or Fortran, simply because they are fast.

Julia is the language for you, finally.

Julia is as fast as C, Rust, or Fortran.

Calling C and Fortran from Julia is first class (documentation).

You can also embed Julia into your C/C++/Fortran code (documentation).

Julia is truly high-level.

Julia was made to be the new Fortran, designed specifically for applied mathematics.

Whether you calculate advanced financial forecasts or critical uncertainty estimates, whether you model a large power grid or fold complex molecules: Julia makes such scientific computations performant and easy to use.
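As a small taste, solving an ordinary differential equation takes only a few lines with the DifferentialEquations.jl package (a sketch assuming the package is installed):

```julia
using DifferentialEquations

# Exponential growth: du/dt = 1.01 * u with u(0) = 0.5 on t in [0, 1].
f(u, p, t) = 1.01 * u
prob = ODEProblem(f, 0.5, (0.0, 1.0))
sol = solve(prob)       # an adaptive solver is picked automatically

println(sol(1.0))       # ≈ 0.5 * exp(1.01) ≈ 1.373
```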

Big data, real time, & high performance computing

Your business already has large amounts of data, and the trend is rising?

Today, data pipelines are not only compute-intensive, but should also be scalable in size and speed to always keep up with your company's standards.

Julia enables you to build scalable data transformations without sacrificing performance for customizability or ease of use.

We support you with
High performance computing
  • setting up HPC clusters
  • developing & configuring Julia jobs
  • deployment & scheduling
Streaming
  • setting up Kafka
  • developing Julia consumers & producers
  • deploying Kafka processors
Datalakes & big data
  • setting up a data lake
  • centralized data storage
  • Julia batch jobs
  • job scheduling
Small data
  • creating Julia data pipelines
  • very light and efficient
  • possibility to easily scale later on
for all
  • machine learning
  • GDPR
  • security
  • cloud deployment
  • infrastructure as code
  • monitoring & alerting
  • integration with dashboards

Big data

Traditional databases are tied to a single machine and hard to scale for really large data.

Open source solutions that distribute storage and computation across multiple machines are still rather rare today. Many of the existing distributed frameworks can be quite complex to set up and maintain.

Julia was designed with distributed computation in mind from the start, which makes it much easier to work with. One can reuse the existing small data ecosystem and, as usual in Julia, have first-class support for efficient custom functions.

In addition there is already a large set of dedicated tools for distributed computation:

  • core components for distributed computation
  • works out-of-the-box with high performance clusters like Slurm
  • distributed Arrays
  • distributed Tables
  • Actor model
  • distributed directed-acyclic-graphs (DAGs)
  • distributed GPUs
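The core components ship with the language itself. As a minimal sketch, Julia's Distributed standard library spreads work over local worker processes with the same API a cluster would use:

```julia
using Distributed

addprocs(4)              # start 4 local workers; on a cluster a
                         # manager (e.g. for Slurm) adds remote ones

# pmap distributes the chunks across workers and balances the load.
results = pmap(1:8) do chunk
    lo = (chunk - 1) * 1_000 + 1
    hi = chunk * 1_000
    sum(i^2 for i in lo:hi)
end

println(sum(results))    # identical to the serial sum of squares
```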

Real time

While batch processing is often the cost-effective solution for throughput-optimized data pipelines, latency requirements are best solved with streaming architectures.

Julia's performance is so good that you can build business critical components with very low latencies.

This is very useful if you are using applied mathematics in any form, whether for optimization, simulation, or modern machine learning algorithms. One great example is robot control:

"One area in which we see significant advantages using Julia is in developing online controllers (that is, controllers which run in real time on the robot, typically with control rates of 100-1000 Hz). Modern controllers for walking robots typically involve much more complicated computation than a simple linear feedback controller, and most humanoid robots are controlled by solving mathematical optimizations at these high rates. Even setting up these optimization problems can be complex, so it is extremely useful to have a language like Julia that combines excellent support for mathematical programming, useful optimization libraries such as JuMP.jl, and highly performant code when developing new robot controllers."

Source: MIT Robotics

Julia comes with a garbage collector, which can cause runtime pauses in the range of milliseconds. In most streaming applications, milliseconds don't matter at all, but if your real-time requirements are that strict, Julia offers ways to make your code garbage-collector-free.

Pre-allocating resources is the most common way to deal with this scenario. In addition, Julia provides the ability to create custom data structures which are invisible to the garbage collector (called stack-allocated).
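A minimal sketch of both techniques (the struct and function names are illustrative, not from any particular library):

```julia
struct Sample               # immutable and concretely typed, so a
    t::Float64              # Vector{Sample} stores its elements inline,
    value::Float64          # invisible to the garbage collector
end

# Write results into a pre-allocated buffer instead of allocating a new one.
function scale_into!(out::Vector{Float64}, samples::Vector{Sample})
    @inbounds for i in eachindex(samples)
        out[i] = 2.0 * samples[i].value
    end
    return out
end

samples = [Sample(0.01 * i, sin(i)) for i in 1:1_000]
out = Vector{Float64}(undef, length(samples))
scale_into!(out, samples)                      # warm-up triggers compilation
println(@allocated scale_into!(out, samples))  # bytes allocated by the hot call
```

After the warm-up call has compiled the function, the hot loop performs no heap allocation at all.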

"Given the constraint of avoiding jitter due to dynamic allocation and online JIT [Just-In-Time] compilation, we find Julia to be more productive than Java. First, Julia provides immutable, stack-allocated user data types [...] Second, JIT compilation is also better than in Java."

Source: MIT Robotics

Besides these super-low-latency applications, you can use Julia with streaming platforms such as Apache Kafka or RabbitMQ.

Small data

Small data is everywhere. Both big data and real time frameworks use small data libraries for their local computations. And then, of course, there are all those data pipelines which don't need to scale (yet).

That's why it is critical to be able to work with small data efficiently and in a production-ready way.

Dataframes

The most common tool for such pipelines is the dataframe, which is simply a table.

Comparing dataframes
                                                      Python   R       Scala   Julia
                                                      pandas   tibble  Spark   DataFrames.jl
select, filter, groupby, sort, join, ...                ✓        ✓       ✓       ✓
import/export to csv, parquet, ...                      ✓        ✓       ✓       ✓
careful handling of missing data                        ✗        ✓       ✗*      ✓
general support for complex/custom types                ✓        ✓       ✓       ✓
correct type-representation for complex/custom types    ✗        ✓       ✓       ✓
optimal performance for complex/custom types            ✗        ✗       ✗**     ✓

* Only for Scala and R. Spark uses `null` for representing missing.
** Spark's performance for custom types holds only for Scala (not for Python or R).

Taking Apache Spark as a good baseline for a production-ready, industry-standard dataframe implementation, we see that Python's pandas in particular has some significant drawbacks. Spark's performance for custom types, however, only applies to Scala, not to Python or R.

Julia's Dataframe was designed with a focus on full flexibility and performance. This goes hand in hand with careful type-handling and results in a truly production-ready Dataframe.
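A minimal sketch of such a pipeline with DataFrames.jl (assuming the package is installed; the column names are made up for illustration):

```julia
using DataFrames

df = DataFrame(region  = ["EU", "EU", "US", "US"],
               revenue = [120.0, missing, 95.0, 110.0])

# Missing values propagate by default; skipping them is an explicit choice.
per_region = combine(groupby(df, :region),
                     :revenue => (r -> sum(skipmissing(r))) => :total)

println(per_region)   # one :total row per region, missings handled explicitly
```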

n-dimensional Arrays

The second most important tool is the n-dimensional array. This is a data structure with one, two, or more dimensions, in which every element has the same data type. For comparison, a dataframe can have a different data type for each column.

Comparing n-dimensional arrays
                                                      Python   R       Julia
                                                      numpy    array   Array
elementwise operations                                  ✓        ✓       ✓
broadcasting                                            ✓        ✗       ✓
careful handling of missing data                        ✗        ✓       ✓
general support for complex/custom types                ✓        ✓       ✓
correct type-representation for complex/custom types    ✗        ✓       ✓
optimal performance for complex/custom types            ✗        ✗       ✓

Since Scala Spark does not provide a generic n-dimensional array, we omitted Scala from this comparison.

Similar to the dataframes case, the Python and R array implementations have crucial disadvantages which should be considered when deciding on a production-ready data pipeline tool.

Julia's Array was designed with a focus on full flexibility and performance. This goes hand-in-hand with careful handling of data types, resulting in a truly production-ready n-dimensional Array implementation.
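A short sketch of what "optimal performance for custom types" means in practice: an array of a plain immutable struct is stored inline, and broadcasting works on it exactly as on built-in numbers (`Celsius` and `to_fahrenheit` are illustrative names):

```julia
struct Celsius              # a plain immutable type; a Vector{Celsius}
    deg::Float64            # is stored inline, not as boxed pointers
end

to_fahrenheit(c::Celsius) = c.deg * 9 / 5 + 32

temps = [Celsius(t) for t in 0.0:10.0:30.0]
println(to_fahrenheit.(temps))   # broadcast over the custom type:
                                 # [32.0, 50.0, 68.0, 86.0]
```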

Individual training & consultancy

If you have any questions or requirements that are not covered here, please contact us.

Julia is a very flexible and promising technology. Let's work together, and increase the value of your data processing.

email hello@jolin.io or call
+49 152 2406 7803

For organizing workshops and individual trainings, please contact us as well.

workshops · trainings | introductory · advanced
Programming languages for data processing
  • Julia
  • Python
  • R
  • Scala
  • Matlab
Challenges and solutions for scaling
  • processing very large amounts of data (big data).
  • real time processing
Infrastructure
  • cloud solutions
  • internal computer clusters
  • setup
  • maintenance
  • infrastructure as code
Product realization
  • versioning & packaging & registration
  • parameterization & reusability
  • automated rollout & testing
  • monitoring & alerting
  • automated execution of data processing flows
  • automated training & (re)evaluation of machine learning models
  • visualization and presentation of results
Special software packages
  • dataframes
  • graphs
  • mathematical optimization
  • machine learning
  • differential equation systems
  • ...
Visualization of data
  • dashboards
  • interactive data visualizations
  • customized solutions
Open to a brief exchange?
e-mail hello@jolin.io or call/whatsapp/signal
+49 152 2406 7803