Real-Time Analytics: Going beyond stream processing with Apache Pinot
David G. Simmons, Head of Developer Advocacy
davidgsIoT
Slide 2
What is Real-Time Analytics?
Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly.
The value of data over time
[Figure: value of data plotted against time, with the real-time region marked]
Who’s interested in this data? ● Analysts ● Management ● Users
Slide 5
Real-Time Analytics Quadrant
● Machine facing, internal: Observability
● Machine facing, external: Recommendation Engine, Fraud Detection
● Human facing, internal: Real-Time Dashboard
● Human facing, external: Order Tracking Service
Slide 6
Examples of Real-Time Analytics
● Total users: 700 million
● QPS: 10,000+
● Latency SLA: < 100 ms (p99)
● Freshness: seconds
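The latency SLA above is a p99 target: 99% of queries must finish under the limit. As a minimal sketch, assuming the nearest-rank definition of a percentile (monitoring tools may interpolate differently), checking a batch of measured query latencies against a < 100 ms p99 SLA could look like this. The function names and sample data are hypothetical.

```python
import math


def percentile_nearest_rank(samples: list[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least p% of samples at or below it.
    # Multiply before dividing to avoid floating-point drift in the rank.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_sla(latencies_ms: list[float], p: float = 99.0,
                      limit_ms: float = 100.0) -> bool:
    """True if the p-th percentile latency is strictly below limit_ms."""
    return percentile_nearest_rank(latencies_ms, p) < limit_ms


if __name__ == "__main__":
    # Hypothetical sample: 98 fast queries and two slow outliers.
    latencies = [20.0] * 98 + [150.0, 400.0]
    print(percentile_nearest_rank(latencies, 99))  # 150.0
    print(meets_latency_sla(latencies))            # False
```

With 100 samples, two slow outliers are enough to break a p99 target, while a single outlier is not; that is why p99 SLAs are sensitive to tail latency rather than to the average.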
Slide 7
Examples of Real-Time Analytics
Metrics shown: missed orders, inaccurate orders, top-selling items, menu item feedback, downtime
● Total users: 500,000+
● QPS: 100s
● Latency SLA: < 100 ms (p99)
● Freshness: seconds to minutes
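Metrics such as top-selling items are aggregations over a recent time window of order events. The sketch below computes one in plain Python over an in-memory list, purely to show the shape of the computation; the event fields (`item`, `ts`), the window length, and the function name are hypothetical. In a system like the one described here, the same aggregation would typically be expressed as a query against the analytics store rather than computed in application code.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderEvent:
    item: str   # hypothetical field: menu item name
    ts: float   # hypothetical field: event time, seconds since epoch


def top_selling_items(events: list[OrderEvent], now: float,
                      window_s: float, k: int) -> list[tuple[str, int]]:
    """Return the k most-ordered items among events in (now - window_s, now]."""
    start = now - window_s
    counts = Counter(e.item for e in events if start < e.ts <= now)
    # most_common orders by count, highest first; ties keep first-seen order.
    return counts.most_common(k)


if __name__ == "__main__":
    events = [
        OrderEvent("burger", 100.0),
        OrderEvent("fries", 110.0),
        OrderEvent("burger", 115.0),
        OrderEvent("salad", 30.0),  # outside a 60 s window ending at t=120
    ]
    print(top_selling_items(events, now=120.0, window_s=60.0, k=2))
    # [('burger', 2), ('fries', 1)]
```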
Slide 8
Examples of Real-Time Analytics
Source: Peter Bakkum, Engineering Manager @Stripe Financial
Slide 9
Properties of Real-Time Analytics Systems
Slide 10
Building a User-facing Real-Time Analytics System
● Freshness: seconds
● Real-time ingestion
● Velocity of ingestion
● High dimensionality
● 1000s of QPS
● Latency: milliseconds
● Highly available
● Scalable
● Cost effective
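Freshness measures how far the newest queryable data lags behind the present, and ingestion velocity measures how many events arrive per second. A minimal sketch, assuming each ingested event carries a timestamp in seconds and that the newest timestamp seen is also queryable (both simplifying assumptions), computes the two numbers for a batch of events. The names and the 5-second freshness bound are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestStats:
    freshness_lag_s: float  # seconds between "now" and the newest event
    events_per_s: float     # events in the trailing window divided by its length


def ingest_stats(event_ts: list[float], now: float, window_s: float) -> IngestStats:
    """Compute freshness lag and ingestion rate over the window (now - window_s, now]."""
    if not event_ts:
        raise ValueError("need at least one event")
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    newest = max(event_ts)
    recent = sum(1 for t in event_ts if now - window_s < t <= now)
    return IngestStats(freshness_lag_s=now - newest, events_per_s=recent / window_s)


def is_fresh(stats: IngestStats, max_lag_s: float = 5.0) -> bool:
    """Hypothetical 'seconds freshness' check: lag must not exceed max_lag_s."""
    return stats.freshness_lag_s <= max_lag_s


if __name__ == "__main__":
    stats = ingest_stats([99.0, 99.5, 100.0, 100.5], now=101.0, window_s=2.0)
    print(stats)            # lag 0.5 s, 1.5 events/s
    print(is_fresh(stats))  # True
```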
Slide 11
What is Apache Pinot?
Slide 12
Apache Pinot Architecture
Components: Pinot Controller, Zookeeper, Pinot Broker, Pinot Servers S1 to S4
Segment-to-server assignment (each segment is replicated on two servers):
Seg1 -> S1, S4
Seg2 -> S2, S3
Seg3 -> S3, S1
Seg4 -> S4, S2
Example query sent to the Pinot Broker: select count(*) from X where country = 'us'
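The diagram shows each segment replicated on two servers and a count query arriving at the broker. The following is a simplified scatter-gather sketch of how a broker can answer that query: for every segment it picks one replica, asks each chosen server to count matching rows in the segments routed to it, and sums the partial counts. The in-memory rows, the round-robin replica choice, and all function names are assumptions for illustration; Pinot's actual routing and query execution are more involved.

```python
from collections import defaultdict
from itertools import count

# Segment-to-server replica map from the diagram.
SEGMENT_REPLICAS = {
    "Seg1": ["S1", "S4"],
    "Seg2": ["S2", "S3"],
    "Seg3": ["S3", "S1"],
    "Seg4": ["S4", "S2"],
}

# Hypothetical segment contents: each row is a dict of column values.
SEGMENT_ROWS = {
    "Seg1": [{"country": "us"}, {"country": "ca"}],
    "Seg2": [{"country": "us"}, {"country": "us"}],
    "Seg3": [{"country": "de"}],
    "Seg4": [{"country": "us"}],
}

_round_robin = count()


def route(segment_replicas: dict[str, list[str]]) -> dict[str, list[str]]:
    """Pick one replica per segment (simple round-robin) and group segments by server."""
    plan: dict[str, list[str]] = defaultdict(list)
    for segment, replicas in segment_replicas.items():
        server = replicas[next(_round_robin) % len(replicas)]
        plan[server].append(segment)
    return plan


def server_count(segments: list[str], column: str, value: str) -> int:
    """Work done on one server: count matching rows in its assigned segments."""
    return sum(1 for seg in segments for row in SEGMENT_ROWS[seg] if row[column] == value)


def broker_count(column: str, value: str) -> int:
    """Scatter the query to the chosen servers, then gather and merge partial counts."""
    plan = route(SEGMENT_REPLICAS)
    partials = [server_count(segs, column, value) for segs in plan.values()]
    return sum(partials)


if __name__ == "__main__":
    # Equivalent of: select count(*) from X where country = 'us'
    print(broker_count("country", "us"))  # 4
```

Because every segment is routed to exactly one replica, each row is counted once no matter which replicas are chosen; replication adds availability and spreads load without changing the result.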
Slide 13
<insert any user-facing real-time analytics use case here>
Slide 14
Powered by Apache Pinot
Performance
● 1M+ events/sec
● 200k+ peak QPS
● Millisecond query latency
Community
● 100+ companies
● 2400+ Slack users
● 3.9k GitHub stars
pinot.apache.org
Slide 15
Takeaways
● Real-time analytics lets us build applications that give users actionable insights
● Properties of these systems: fresh data, fast queries, at scale
● Kafka + Pinot is the perfect combination to achieve this: Kafka streams the events, and Pinot ingests them in real time and serves low-latency queries