Working in data science, it can be difficult to share insights from complex datasets using only static figures. The facets that describe the shape and meaning of interesting data are not always captured in a handful of pre-generated figures. While we have powerful technologies for presenting interactive figures, where a viewer can rotate, filter, zoom, and explore complex data, they always come with tradeoffs.
Here I present my experience using a recently released Python library, Marimo, which opens up exciting new opportunities for publishing interactive visualizations across the field of data science.
Interactive data display
The tradeoffs to consider when selecting an approach for presenting data visualizations can be broken into three categories:
- Capabilities – What visualizations and interactivity can I present to the user?
- Publication cost – What resources are needed to show this visualization to users (for example, running servers, website hosting)?
- Ease of use – How much of a new skill set / code base do I need to learn in advance?
JavaScript is the foundation of portable interactivity. Every user has a web browser installed on their computer, and there are many frameworks available for displaying any degree of interactivity or visualization you can imagine (for example, this gallery of incredible things that people have made with three.js). Since the application runs on the user's computer, no expensive servers are needed. However, a significant drawback for the data science community is ease of use, since JavaScript does not have many of the high-level (that is, easy-to-use) libraries that data scientists rely on for data manipulation, plotting, and interactivity.
Python provides a useful point of comparison. Because of its continually growing popularity, some have called this the “era of Python.” For data scientists in particular, Python stands alongside R as one of the foundational languages for working with complex data quickly and effectively. While Python may be easier to use than JavaScript, there are fewer options for presenting interactive visualizations. Some popular projects providing interactivity and visualization have been Flask, Dash, and Streamlit (also worth mentioning: Bokeh, HoloViews, Altair, and Plotly). The biggest tradeoff of using Python has been the cost of publishing, that is, delivering the tool to users. In the same way that Shiny apps require a running computer to serve the visualization, these Python-based frameworks have been exclusively server-based. This is by no means prohibitive for authors with a budget to spend, but it limits the number of users who can take advantage of a particular project.
Pyodide is an intriguing middle ground: Python code that runs directly in the web browser using WebAssembly (WASM). There are resource limitations (only 1 thread and 2 GB of memory) that make it impractical for the heavy lifting of data science. However, this can be more than enough for building visualizations and updating them based on user input. Because it runs in the browser, no hosting servers are required. Tools that use Pyodide as a foundation are interesting to explore because they give data scientists the opportunity to write Python code that runs directly on users' computers without anything having to be installed or run outside the web browser.
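Because the same Python code may run either on a desktop interpreter or inside the browser, it can be handy to detect which one is in use and scale the workload accordingly. The sketch below relies on the fact that Pyodide reports `sys.platform` as `"emscripten"`; the helper names and the row limits are illustrative assumptions, not recommendations from Pyodide or Marimo.

```python
import sys


def running_in_pyodide() -> bool:
    """Return True when Python is the Emscripten/WebAssembly build used by Pyodide."""
    return sys.platform == "emscripten"


def choose_row_limit(browser_limit: int = 100_000,
                     desktop_limit: int = 10_000_000) -> int:
    """Pick how many rows to load; both defaults are hypothetical placeholders."""
    # In the browser we have a single thread and limited memory,
    # so a smaller working set is the safer choice.
    return browser_limit if running_in_pyodide() else desktop_limit


if __name__ == "__main__":
    print("In browser:", running_in_pyodide())
    print("Row limit:", choose_row_limit())
```

A visualization could call `choose_row_limit()` before loading data, downsampling only when it finds itself running under WebAssembly.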
As an aside, I have previously been interested in one project that tried this approach: stlite, an in-browser implementation of Streamlit that lets you deploy these flexible and powerful applications to a broad range of users. However, a core limitation is that Streamlit itself is distinct from stlite (the port of Streamlit to WASM), which means that not all features are supported and that the project's progress depends on two separate groups working along compatible lines.
Introducing: Marimo
This brings us to Marimo.
The first public announcements of Marimo were in January 2024, so the project is very new, and it has a unique combination of features:
- The interface resembles a Jupyter notebook, which will be familiar to users.
- Cell execution is reactive, so that updating a cell reruns all of the cells that depend on its output.
- User input can be captured with a flexible set of UI components.
- Notebooks can quickly be turned into applications by hiding the code and showing only the input/output elements.
- Applications can be run locally or converted into static web pages using WASM/Pyodide.
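To make the idea of reactive execution concrete, here is a toy model in plain Python. It is only a sketch of the concept and not how Marimo is implemented: the `ToyReactiveNotebook` class is hypothetical, dependencies are declared by hand, and a cell reached by two paths would be rerun twice.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ToyReactiveNotebook:
    """A hypothetical, minimal model of cells that rerun when their inputs change."""

    def __init__(self) -> None:
        # cell name -> (function, names of the cells whose values it reads)
        self.cells: Dict[str, Tuple[Callable[..., object], List[str]]] = {}
        self.values: Dict[str, object] = {}
        self.run_log: List[str] = []  # order in which cells were executed

    def add_cell(self, name: str, func: Callable[..., object],
                 deps: Sequence[str] = ()) -> None:
        """Register a cell and run it once; its dependencies must already exist."""
        self.cells[name] = (func, list(deps))
        self._run(name)

    def set_value(self, name: str, value: object) -> None:
        """Simulate a user changing an input element such as a slider."""
        self.cells[name] = (lambda: value, [])
        self._run(name)

    def _run(self, name: str) -> None:
        func, deps = self.cells[name]
        self.values[name] = func(*(self.values[d] for d in deps))
        self.run_log.append(name)
        # Rerun every cell that reads this cell's output.
        for other, (_, other_deps) in self.cells.items():
            if name in other_deps:
                self._run(other)


nb = ToyReactiveNotebook()
nb.add_cell("threshold", lambda: 10)
nb.add_cell("data", lambda: [3, 12, 25, 7])
nb.add_cell("filtered",
            lambda data, threshold: [x for x in data if x >= threshold],
            deps=["data", "threshold"])
nb.add_cell("count", lambda filtered: len(filtered), deps=["filtered"])

nb.set_value("threshold", 5)  # "filtered" and "count" rerun; "data" does not
print(nb.values["filtered"], nb.values["count"])
```

Changing the threshold triggers only the downstream cells, which is the behavior that lets a notebook double as an interactive application.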
Marimo balances the technology tradeoffs in a way that fits well with the skill set of a typical data scientist:
- Capabilities – The user input and visual display features are fairly extensive, supporting user input through Altair and Plotly plots.
- Publication cost – Deploying as static web pages is basically free; no servers are required.
- Ease of use – For users familiar with Python notebooks, Marimo will feel very familiar and will be easy to pick up.
Publishing Marimo applications on the web
The best place to start with Marimo is by reading its extensive documentation.
As a simple example of the kind of display that can be useful in data science, consisting of explanatory text interspersed with interactive displays, I have created a basic GitHub repository. Try it yourself here.
Using just a little code, users can:
- Attach source data sets
- Generate visualizations with flexible interactivity
- Write narrative text that describes their findings
- Publish to the web for free (that is, using GitHub Pages)
For more details, read the documentation on web publishing and the template repository for deploying to GitHub Pages.
Public application / private data
This new technology offers an exciting new opportunity for collaboration: publish the application publicly to the world, while users can see only the specific datasets that they have permission to access.
Instead of building a dedicated data backend for each application, user data can be stored in a generic backend that can be authenticated against and accessed securely using a Python client library, all contained within the user's web browser. For example, the user is given an OAuth login link that authenticates them with the backend and allows the application to temporarily access the input data.
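The login-link pattern described above can be sketched with an in-memory stand-in for the backend. Everything here is hypothetical: `HypotheticalBackend`, its methods, and the `backend.example` URL are invented for illustration and do not represent any real platform's client library or a complete OAuth implementation.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HypotheticalBackend:
    """In-memory stand-in for a generic data backend (illustration only)."""

    datasets: Dict[str, List[int]]
    token_ttl_seconds: float = 3600.0
    _pending: Dict[str, bool] = field(default_factory=dict)
    _tokens: Dict[str, float] = field(default_factory=dict)

    def start_login(self) -> Tuple[str, str]:
        """Return a one-time code and the login link shown to the user."""
        code = secrets.token_urlsafe(8)
        self._pending[code] = False
        return code, f"https://backend.example/login?code={code}"

    def approve(self, code: str) -> None:
        """Stands in for the user signing in through the login link."""
        self._pending[code] = True

    def exchange(self, code: str) -> Optional[str]:
        """Trade an approved code for a short-lived access token."""
        if not self._pending.get(code):
            return None  # not approved yet, or unknown code
        del self._pending[code]
        token = secrets.token_urlsafe(16)
        self._tokens[token] = time.monotonic() + self.token_ttl_seconds
        return token

    def read_dataset(self, token: str, name: str) -> List[int]:
        """Return data only while the token is valid."""
        expires = self._tokens.get(token)
        if expires is None or time.monotonic() >= expires:
            raise PermissionError("missing or expired token")
        return self.datasets[name]


backend = HypotheticalBackend(datasets={"counts": [1, 2, 3]})
code, link = backend.start_login()
print("Ask the user to open:", link)
backend.approve(code)           # the user signs in
token = backend.exchange(code)  # the application receives temporary access
print(backend.read_dataset(token, "counts"))
```

The point of the design is that the application never stores credentials: it holds only a token that expires, and the backend decides which datasets that token may read.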
As a proof of concept, I built a simple visualization application that connects to the Cirro data platform, which is used at my institution to manage scientific data. Full disclosure: I was part of the team that built this platform before it spun out as an independent company. In this way, users can:
- Load the public visualization application, hosted on GitHub Pages
- Securely connect to their private data store
- Load the appropriate dataset for visualization
- Share a link that directs authorized collaborators to the same data
Try it yourself here.
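One simple way to implement the shareable link from the list above is to carry only dataset identifiers in the URL query string, leaving access control to the backend. This is a minimal sketch using the standard library; the parameter names `project` and `dataset` and the example URL are assumptions, not the scheme used by the application linked above.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit


def build_share_link(app_url: str, project: str, dataset: str) -> str:
    """Encode which data to open into the query string of the public app URL."""
    return f"{app_url}?{urlencode({'project': project, 'dataset': dataset})}"


def parse_share_link(link: str) -> Dict[str, str]:
    """Recover the identifiers when a collaborator opens the link."""
    query = parse_qs(urlsplit(link).query)
    return {key: values[0] for key, values in query.items()}


link = build_share_link("https://example.github.io/app/", "lab notebook", "run-42")
print(link)
print(parse_share_link(link))
```

Because the link holds no data or credentials, a collaborator who opens it still has to authenticate before anything is loaded.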

As a data scientist, I find this approach to publishing free and open-source visualization applications that can interact with private datasets extremely exciting. Building and publishing a new application can take hours and days instead of weeks and years, letting researchers quickly share their insights with collaborators and then publish them to the world at large.