Check out the schedule of sessions for AbracaData 2025.


2 packages for ethnicity imputation

by Jesus Ramos

Imputing or inferring ethnicity, despite the privacy concerns it sometimes raises, is an important and frequently performed task in data and feature engineering for the public and social sectors. We will look at two Python packages for this task, compare their pros and cons, and discuss where each works best.

AI in BI: From Lane Assist to Full Self-Driving Analytics

by Maxime Beauchemin

The promise of AI in business intelligence is enticing—ask a question in natural language and receive an accurate chart or metric, no SQL required. While this works in controlled environments, real-world analytics demand precision where 90% accuracy isn’t sufficient. Drawing parallels to self-driving cars, we’re currently at a “lane assist” stage in BI, where AI can suggest and automate parts of the workflow but still requires human oversight.

In this talk, I’ll delve into the lessons learned from building AI features like text-to-SQL on top of Apache Superset at Preset. We’ll explore where AI excels in BI, where it falls short, and what it takes to build trust in AI-assisted analytics. Key topics include the importance of context-rich interfaces, seamless human-AI handoffs, and robust feedback loops. We’ll also discuss the challenges of real-world data environments and the necessity of transparency in AI outputs.

Airflow 3 is all you need (for your AI/ML flows)

by Jarek Potiuk

Are you considering productionizing your machine learning workflows and integrating them with the data processing pipelines you already have? Does your team struggle with multiple people inventing their own ways of running them, while your monitoring and SRE engineers scratch their heads over how to efficiently run and monitor all those loosely connected pieces of logic? Airflow 3 to the rescue. Released earlier this year as the result of years of learning from and listening to users, and the invested time and effort of literally hundreds of people, the shiny new Airflow 3 gives you the most popular data workflow engine to make all your production pipelines manageable and rock solid. If all you want to hear is the quiet hum of your data being processed, rather than the panicked screams of a business that needs things working, this talk is for you.

Building LLM-Powered Apps with Apache Airflow

by Ephraim Anierobi

In this session, we’ll unpack how Apache Airflow orchestrates the end-to-end workflows behind Ask Astro—an LLM-powered assistant designed to answer questions about Airflow itself. We’ll walk through how the system manages data ingestion, embedding generation, vector index updates, and prompt execution. Whether working with open-source models or commercial APIs, you’ll gain practical insights into building and scaling LLM applications with Airflow.

Causal Inference for Homelessness Research

by Brian Blackwell

Homelessness is the number one issue in public policy in Los Angeles, with over 75,000 people experiencing homelessness on any given night in Los Angeles County. Since its inception in 2017, the California Policy Lab (CPL) has been one of the leading academic research organizations studying homelessness in Los Angeles, working at the intersection of predictive modeling and causal inference. In this presentation, we will give a brief introduction to causal inference – a set of statistical methods for inferring relations of cause and effect in observational and experimental data – and describe some of CPL’s causal inference projects related to homelessness, including the impacts of time-limited rental subsidies and homelessness prevention.

Event-Driven AI: Building Intelligent Systems with Atlas Stream Processing

by Alejandro Mancilla

Discover how to build responsive AI systems that react to real-time data using event-driven architectures. This beginner-friendly session introduces Atlas Stream Processing as your foundation for creating dynamic AI applications that process continuous data streams and adapt to changing conditions. You’ll learn practical approaches to:

- Understanding event-driven design principles for AI applications
- Processing data streams for real-time ML predictions
- Building resilient pipelines that handle errors gracefully
- Implementing simple event patterns for common AI use cases
- Monitoring your AI systems effectively
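The core pattern the session covers can be sketched framework-agnostically: events flow through a pipeline of stages, a model stage attaches a prediction to each event, and failing events are routed to a dead-letter queue instead of crashing the stream. This is a minimal illustration in plain Python; all names (`Pipeline`, `stage`, the anomaly threshold) are hypothetical, and Atlas Stream Processing provides its own operators for these steps.

```python
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    handlers: list = field(default_factory=list)
    dead_letters: list = field(default_factory=list)

    def stage(self, fn):
        # Register a processing stage; returning fn lets this act as a decorator.
        self.handlers.append(fn)
        return fn

    def process(self, event):
        # Run one event through every stage, capturing failures gracefully.
        for fn in self.handlers:
            try:
                event = fn(event)
            except Exception as exc:
                self.dead_letters.append((event, str(exc)))
                return None
        return event

pipeline = Pipeline()

@pipeline.stage
def validate(event):
    if "value" not in event:
        raise ValueError("missing 'value' field")
    return event

@pipeline.stage
def predict(event):
    # Stand-in for a real-time ML prediction step.
    event["anomaly"] = event["value"] > 100
    return event

results = [pipeline.process(e) for e in [{"value": 42}, {"value": 250}, {}]]
```

Here the malformed third event lands in `pipeline.dead_letters` while the stream keeps flowing, which is the "handle errors gracefully" idea in miniature.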

Machine Learning Informed Financial Assistance for Homelessness Prevention

by Kasun Amarasinghe

Local governments implement Rental Assistance Programs that provide temporary financial assistance to renters struggling to pay their rent, keeping them stably housed and preventing homelessness. However, as the number of vulnerable households increases and funding for rental assistance programs remains limited, program administrators must prioritize which households to help. Although identifying and prioritizing the most vulnerable individuals is a top concern, the typical practice is to prioritize individuals using simple heuristics such as “first-come, first-served”. This talk will describe a collaborative effort between our team at Carnegie Mellon University and the Allegheny County Department of Human Services, which examined the utility of using Machine Learning (ML) models to inform the prioritization of rental assistance resources to minimize eviction-caused homelessness.

Open Models: Use cases, fine-tuning and tooling ecosystem

by Omar Sanseviero

Although many models work well out of the box, you often want to customize them with your own data or for a specific domain, adding new features and boosting performance. In this talk, Omar will explain how to fine-tune LLMs, covering topics such as RLHF (reinforcement learning from human feedback) and quantization. Omar will share some of the latest advances in open models, the ecosystem of tools to build your own AI solutions, and how to evaluate and choose the best models according to your needs.

Responsible Machine Learning with Post Modeling

by Liliana Millan

In real-world projects, modeling is not the most important aspect—it’s just one component that interacts with others to create a solution that should improve the outcomes of the people who will be affected by it. In this talk, we will review the different components that make up a complete and responsible deployed AI/ML solution and focus on post-modeling, a part that is rarely explored in both academic and industry settings.

Scaling Machine Learning in Enterprises

by Francys Lanza

Today, with the rapid advancements in technology, research, and tools, building machine learning models has become increasingly accessible. Delivering them consistently, reliably, and at scale in a real business environment? That’s where the real challenge begins.

In this talk, I’ll share how we designed and deployed an enterprise-grade, fully automated ML pipeline at Bankaya, built to enable continuous integration, delivery, and automated retraining of machine learning models across multiple business areas like risk, collections, and marketing.

Workshop: Power your data orchestration with Apache Airflow 3

by Kenten Danas

Apache Airflow is the world’s most popular open source data orchestration tool, used by companies all over the world and in every industry to manage their most critical data workflows. Airflow 3.0, released in April 2025, is the most significant release in Airflow’s history, and brings a better user experience, stronger security, and the ability to run tasks anywhere, at any time. In this workshop, you’ll get hands-on experience with the new release and learn how to leverage new features like DAG versioning, backfills, data assets, and a new React-based UI.

Workshop: Your pipeline writes prompts, how to use Gemini for inference with Apache Beam

by Israel Herraiz

In this workshop, you will use Apache Beam, Google Colab, and Google AI Studio to call Gemini from your Beam pipeline, enriching your data processing with the output Gemini provides. This tutorial is an example of how to leverage Beam ML and RunInference to easily integrate any AI or ML model into your pipelines.

To join this workshop you will need a Gmail account and access to Google Colab and Google AI Studio (both available for free with any Gmail account).