Real-Time Analytics: Going beyond stream processing with Apache Pinot
David G. Simmons, Head of Developer Advocacy
davidgsIoT
What is Real-Time Analytics?
Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly.
The value of data over time
[Figure: value of data (vertical axis) plotted against time (horizontal axis), with the "Real-Time" region labeled.]
Who’s interested in this data?
● Analysts
● Management
● Users
Real-Time Analytics Quadrant
[Figure: quadrant chart with two axes, Internal to External and Machine Facing to Human Facing. Use cases plotted: Observability, Recommendation Engine, Fraud Detection, Real-Time Dashboard, Order Tracking Service.]
Examples of Real-Time Analytics
Total users: 700 Million
QPS: 10,000+
Latency SLA: < 100 ms at p99
Freshness: Seconds
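A latency SLA such as "< 100 ms at p99" can be checked against measured query latencies. The following is a minimal sketch, assuming the nearest-rank definition of a percentile; the function names and the sample data are hypothetical and are not taken from the talk.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # 1-based rank of the percentile in the sorted list.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_sla(latencies_ms, sla_ms=100.0, pct=99):
    """True if the pct-th percentile latency is strictly below sla_ms."""
    return percentile(latencies_ms, pct) < sla_ms


# Hypothetical samples: 99 fast queries and one slow outlier.
latencies = [10.0] * 99 + [500.0]
print(meets_latency_sla(latencies))
```

With a p99 target, one slow query in a hundred still passes, while two in a hundred would not. That is why the SLA is stated as a percentile rather than as an average.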
Examples of Real-Time Analytics
● Missed orders
● Inaccurate orders
● Top selling items
● Menu item feedback
● Downtime
Total users: 500,000+
QPS: 100s
Latency SLA: < 100 ms at p99
Freshness: Seconds to minutes
Examples of Real-Time Analytics
Source: Peter Bakkum, Engineering Manager @Stripe Financial
Properties of Real-Time Analytics Systems
Building a User-facing Real-Time Analytics System
● Freshness in seconds
● Real-time ingestion
● High dimensionality
● 1000s of QPS
● Velocity of ingestion
● Latency in milliseconds
● Highly available
● Scalable
● Cost effective
What is Apache Pinot?
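Apache Pinot Architecture
[Diagram: a Pinot Controller, Zookeeper, a Pinot Broker, and four Pinot Servers (S1 to S4).]
Segment assignment (each segment is hosted on two servers):
Seg1 -> S1, S4
Seg2 -> S2, S3
Seg3 -> S3, S1
Seg4 -> S4, S2
Segments held by each Pinot Server:
S1: 3, 1
S2: 2, 4
S3: 2, 3
S4: 4, 1
Example query sent to the Pinot Broker:
select count(*) from X where country = 'us'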
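The diagram suggests a scatter-gather flow: the broker picks one server per segment, sends the query to the chosen servers, and merges their partial results. The sketch below illustrates that idea using the segment assignment from the slide. It is a simplified illustration, not Pinot's actual routing code; the function names, the random replica selection, and the per-segment counts are all assumptions made for the example.

```python
import random

# Segment-to-server assignment as shown on the slide (two replicas each).
SEGMENT_REPLICAS = {
    "Seg1": ["S1", "S4"],
    "Seg2": ["S2", "S3"],
    "Seg3": ["S3", "S1"],
    "Seg4": ["S4", "S2"],
}


def plan_routing(segment_replicas, rng=random):
    """Pick one replica for every segment and group segments by server."""
    plan = {}
    for segment, servers in segment_replicas.items():
        server = rng.choice(servers)  # assumption: uniform random choice
        plan.setdefault(server, []).append(segment)
    return plan


def scatter_gather_count(plan, count_on_server):
    """Ask each chosen server for a partial count and sum the results."""
    return sum(count_on_server(server, segments)
               for server, segments in plan.items())


# Hypothetical number of rows matching country = 'us' in each segment.
MATCHING_ROWS = {"Seg1": 5, "Seg2": 7, "Seg3": 11, "Seg4": 13}


def fake_server_count(server, segments):
    """Stand-in for a server executing count(*) over its assigned segments."""
    return sum(MATCHING_ROWS[segment] for segment in segments)


plan = plan_routing(SEGMENT_REPLICAS, random.Random(7))
print(plan)
print(scatter_gather_count(plan, fake_server_count))
```

Because every segment is read from exactly one replica, the merged count is the same whichever replicas are chosen. The second copy of each segment gives the broker an alternative server to route to.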
<insert any user-facing real-time analytics use case here>
Powered by Apache Pinot
Performance
● 1M+ events/sec
● 200k+ peak QPS
● Query latency in ms
Community
● 100+ companies
● 2400+ Slack users
● 3.9k GitHub stars
pinot.apache.org
Takeaways
● Real-time analytics lets us create applications that give users actionable insights
● Properties of these systems: fresh data, fast querying, at scale
● Kafka + Pinot is the perfect combination to achieve this