Introduction to Apache Pinot

A presentation at the StarTree Meetup in April 2023 in the United States by David G. Simmons

Slide 1

Real-Time Analytics: Going beyond stream processing with Apache Pinot
David G. Simmons, Head of Developer Advocacy
davidgsIoT

Slide 2

What is Real-Time Analytics?
Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly.

Slide 3

Events -> Insight -> Action

Slide 4

The value of data over time
Chart labels: Value, Time, Real-Time
Who’s interested in this data?
● Analysts
● Management
● Users

Slide 5

Real-Time Analytics Quadrant
Axes: Machine Facing / Human Facing, Internal / External
Examples: Observability, Recommendation Engine, Fraud Detection, Real-Time Dashboard, Order Tracking Service

Slide 6

Examples of Real-Time Analytics
● Total users: 700 million
● QPS: 10,000+
● Latency SLA: < 100 ms (p99)
● Freshness: seconds
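
The latency SLA on this slide is stated at the 99th percentile. As a rough illustration of what that means, here is a minimal Python sketch that checks a batch of query latencies against a 100 ms p99 target. The sample latencies and the function name `percentile_nearest_rank` are hypothetical, and the nearest-rank method is only one of several common percentile definitions.

```python
import math

def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest rank: the smallest rank covering at least pct percent of samples.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical query latencies in milliseconds (not from the talk).
latencies_ms = [12, 15, 18, 22, 25, 31, 40, 55, 80, 140]

SLA_MS = 100  # the "< 100 ms" target from the slide
p99 = percentile_nearest_rank(latencies_ms, 99)
meets_sla = p99 < SLA_MS
print(f"p99 = {p99} ms, meets SLA: {meets_sla}")
```

With these made-up numbers a single slow query is enough to miss the target, which is why a p99 SLA is much stricter than an average-latency goal.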

Slide 7

Examples of Real-Time Analytics
Metrics shown: Missed orders, Inaccurate orders, Top selling items, Menu item Feedback, Downtime
● Total users: 500,000+
● QPS: 100s
● Latency SLA: < 100 ms (p99)
● Freshness: seconds to minutes

Slide 8

Examples of Real-Time Analytics
Financial
Source: Peter Bakkum, Engineering Manager @Stripe

Slide 9

Properties of Real-Time Analytics Systems

Slide 10

Building a User-facing Real-Time Analytics System
● Seconds Freshness
● Real-Time Ingestion
● High Dimensionality
● 1000s of QPS
● Velocity of ingestion
● Milliseconds Latency
● Highly Available
● Scalable
● Cost Effective

Slide 11

What is Apache Pinot?

Slide 12

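Apache Pinot Architecture
Components shown: Pinot Controller, Pinot Broker, Zookeeper, Pinot Servers (S1 to S4)
Segment-to-server mapping:
● Seg1 -> S1, S4
● Seg2 -> S2, S3
● Seg3 -> S3, S1
● Seg4 -> S4, S2
Segments hosted on each server:
● S1: 3, 1
● S2: 2, 4
● S3: 2, 3
● S4: 4, 1
Query sent to the Pinot Broker: select count(*) from X where country = 'us'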
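
The segment-to-server mapping on this slide hints at how a broker can fan a query out to servers and merge the partial results. The following is a minimal Python sketch of that scatter-gather idea for the slide's `count(*)` query. It assumes each segment has the two replicas read from the slide (Seg1 on S1 and S4, Seg2 on S2 and S3, Seg3 on S3 and S1, Seg4 on S4 and S2). The row data, the random replica choice, and all function names are hypothetical; this is not Pinot's actual routing code.

```python
import random

# Segment -> replica servers, as read from the slide's mapping.
SEGMENT_REPLICAS = {
    "Seg1": ["S1", "S4"],
    "Seg2": ["S2", "S3"],
    "Seg3": ["S3", "S1"],
    "Seg4": ["S4", "S2"],
}

# Hypothetical rows per segment (invented for illustration only).
SEGMENT_ROWS = {
    "Seg1": [{"country": "us"}, {"country": "de"}],
    "Seg2": [{"country": "us"}, {"country": "us"}],
    "Seg3": [{"country": "fr"}],
    "Seg4": [{"country": "us"}, {"country": "in"}, {"country": "us"}],
}

def plan_routing(segment_replicas, rng):
    """Pick one replica per segment, then group segments by chosen server."""
    plan = {}
    for segment, servers in segment_replicas.items():
        plan.setdefault(rng.choice(servers), []).append(segment)
    return plan

def server_count(segments, country):
    """Partial count one server computes over the segments routed to it."""
    return sum(
        1
        for segment in segments
        for row in SEGMENT_ROWS[segment]
        if row["country"] == country
    )

def broker_count(country, rng=None):
    """Scatter the query to the selected servers and sum their partial counts."""
    if rng is None:
        rng = random.Random()
    plan = plan_routing(SEGMENT_REPLICAS, rng)
    return sum(server_count(segments, country) for segments in plan.values())

# Equivalent of: select count(*) from X where country = 'us'
print(broker_count("us"))  # 5, whichever replicas are picked
```

Because every segment is queried exactly once, the merged count is the same no matter which replica serves each segment; the replicas only affect load distribution and availability.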

Slide 13

<insert any user-facing real-time analytics use case here>

Slide 14

Powered by Apache Pinot
Community
● 100+ Companies
● 2400+ Slack Users
● 3.9k GitHub Stars
Performance
● 1M+ Events/sec
● 200k+ Peak QPS
● ms Query Latency
pinot.apache.org

Slide 15

Takeaways
● Real-time analytics lets us create applications that give users actionable insights
● Properties of these systems: fresh data, fast querying, at scale
● Kafka + Pinot is the perfect combination to achieve this

Slide 16

Thank you!
● davidgs@startree.ai
● @davidgs@tty0.social
● @davidgsIoT
● in/davidgsimmons
● davidgs.com
● dev.startree.ai