Presto @ Facebook
Martin Traverso and Dain Sundstrom
Presto @ Facebook
• Ad-hoc/interactive queries for warehouse
• Batch processing for warehouse
• Analytics for user-facing ...
Analytics for Warehouse
Architecture
UI CLI Dashboards Other tools
Gateway
Presto Presto
Warehouse
Cluster
Warehouse
Cluster
Deployment
Presto
HDFS
Datanode
MR
HDFS
Datanode
MR
HDFS
Datanode
Presto
HDFS
Datanode
MR
HDFS
Datanode
Stats
• 1000s of internal daily active users
• Millions of queries each month
• Scan PBs of data every day
• Process trill...
Features
• Pipelined partition/split enumeration
• Streaming
• Admission control
• Resource management
• System reliability
Batch workloads
Batch Requirements
• INSERT OVERWRITE
• More data types
• UDFs
• Physical properties (partitioning, etc)
Analytics for
User-facing Products
Requirements
• Hundreds of ms to seconds latency, low variability
• Availability
• Update semantics
• 10 - 15 way joins
Architecture
Loader
Presto
Worker
Presto
Worker
Presto
Worker
MySQL
Loader
MySQL
MySQL
Client
Stats
• > 99.99% query success rate
• 100% system availability
• 25 - 200 concurrent queries
• 1 - 20 queries per second
•...
Presto Raptor
Requirements
• Large data sets
• Seconds to minutes latency
• Predictable performance
• 5-15 minute load latency
• Reliabl...
Basic Architecture
Coordinator
MySQL
Worker Flash
Worker Flash
Worker Flash
Client
But isn’t that exactly what
Hive does?
Additional Features
• Full featured and atomic DDL
• Table statistics
• Tiered storage
• Atomic data loads
• Physical orga...
Table Statistics
• Table is divided into shards
• Each shard is stored in a separate replication unit (i.e., file)
• Typica...
Table Schema in MySQL
Tables
id name
1 orders
2 line_items
3 parts
table1 shards
uuid nodes c1_min c1_max c2_min c2_max c3...
Tiered Storage
Coordinator
MySQL
Worker Flash
Worker Flash
Worker Flash
Client Backup
Tiered Storage
• One copy in local, expensive, flash
• Backup copy in cheap durable backup tier
• Currently Gluster interna...
Atomic Data Loads
• Import data periodically from streaming event system
• Internally a Scribe based system similar to Kaf...
Atomic Data Loads
INSERT INTO target
SELECT *
FROM source_stream
WHERE token BETWEEN ${last_token} AND ${next_token}
Loader Process
1. Record new job with “now” token in MySQL
2. Execute INSERT from last committed token to “now” token with...
Failure Recovery
• Loader crash
• Check status of jobs using external batch id
• INSERT hang
• Cancel query and rollback j...
Physical Organization
• Temporal organization
• Assure files don’t cross temporal boundaries
• Common filter clause
• Eases ...
Unorganized Data
Sort Columns
Time
Organized Data
Sort Columns
Time
Background Organization
• Compaction
• Balance data
• Eager data recover (from backup)
• Garbage collection
• Junk created...
Future Use Cases
• Hot data cache for Hadoop data
• 0-N local copies of “backup” tier
• Query results cache
• Raw, not rol...
of 31

Presto at Facebook - Presto Meetup @ Boston (10/6/2015)

Outline of major use cases of Presto at Facebook and overview of Raptor architecture and design.
Published on: Mar 6, 2016
Published in: Software      
Source: www.slideshare.net


Transcripts - Presto at Facebook - Presto Meetup @ Boston (10/6/2015)

  • 1. Presto @ Facebook Martin Traverso and Dain Sundstrom
  • 2. Presto @ Facebook • Ad-hoc/interactive queries for warehouse • Batch processing for warehouse • Analytics for user-facing products • Analytics over various specialized stores
  • 3. Analytics for Warehouse
  • 4. Architecture UI CLI Dashboards Other tools Gateway Presto Presto Warehouse Cluster Warehouse Cluster
  • 5. Deployment Presto HDFS Datanode MR HDFS Datanode MR HDFS Datanode Presto HDFS Datanode MR HDFS Datanode
  • 6. Stats • 1000s of internal daily active users • Millions of queries each month • Scan PBs of data every day • Process trillions of rows every day • 10s of concurrent queries
  • 7. Features • Pipelined partition/split enumeration • Streaming • Admission control • Resource management • System reliability
  • 8. Batch workloads
  • 9. Batch Requirements • INSERT OVERWRITE • More data types • UDFs • Physical properties (partitioning, etc)
  • 10. Analytics for User-facing Products
  • 11. Requirements • Hundreds of ms to seconds latency, low variability • Availability • Update semantics • 10 - 15 way joins
  • 12. Architecture Loader Presto Worker Presto Worker Presto Worker MySQL Loader MySQL MySQL Client
  • 13. Stats • > 99.99% query success rate • 100% system availability • 25 - 200 concurrent queries • 1 - 20 queries per second • <100ms - 5s latency
  • 14. Presto Raptor
  • 15. Requirements • Large data sets • Seconds to minutes latency • Predictable performance • 5-15 minute load latency • Reliable data loads (no duplicates, no missing data) • 10s of concurrent queries
  • 16. Basic Architecture Coordinator MySQL Worker Flash Worker Flash Worker Flash Client
  • 17. But isn’t that exactly what Hive does?
  • 18. Additional Features • Full featured and atomic DDL • Table statistics • Tiered storage • Atomic data loads • Physical organization
  • 19. Table Statistics • Table is divided into shards • Each shard is stored in a separate replication unit (i.e., file) • Typically 1 to 10 million rows • Node assignment and stats stored in MySQL
  • 20. Table Schema in MySQL Tables id name 1 orders 2 line_items 3 parts table1 shards uuid nodes c1_min c1_max c2_min c2_max c3_min c3_max 43a5 A 30 90 cat dog 2014 2014 6701 C 34 45 apple banana 2005 2015 9c0f A,D 25 26 cheese cracker 1982 1994 df31 B 23 71 tiger zebra 1999 2006
  • 21. Tiered Storage Coordinator MySQL Worker Flash Worker Flash Worker Flash Client Backup
  • 22. Tiered Storage • One copy in local, expensive, flash • Backup copy in cheap durable backup tier • Currently Gluster internally, but can be anything durable • Only assumes GET and PUT with client assigned ID methods
  • 23. Atomic Data Loads • Import data periodically from streaming event system • Internally a Scribe based system similar to Kafka or Kinesis • Provides continuation tokens • Loads performed using SQL
  • 24. Atomic Data Loads INSERT INTO target SELECT * FROM source_stream WHERE token BETWEEN ${last_token} AND ${next_token}
  • 25. Loader Process 1. Record new job with “now” token in MySQL 2. Execute INSERT from last committed token to “now” token with external batch id 3. Wait for INSERT to commit (check external batch status) 4. Record job complete 5. Repeat
  • 26. Failure Recovery • Loader crash • Check status of jobs using external batch id • INSERT hang • Cancel query and rollback job (verify status to avoid race) • Duplicate loader processes • Process guarantees only one job can complete • Monitor for lack of progress (catches no loaders also)
  • 27. Physical Organization • Temporal organization • Assure files don’t cross temporal boundaries • Common filter clause • Eases retention policies • Sorted files • Can reduce file sections processed (local stats) • Can reduce shards processed
  • 28. Unorganized Data Sort Columns Time
  • 29. Organized Data Sort Columns Time
  • 30. Background Organization • Compaction • Balance data • Eager data recover (from backup) • Garbage collection • Junk created by compaction, delete, balance, recovery
  • 31. Future Use Cases • Hot data cache for Hadoop data • 0-N local copies of “backup” tier • Query results cache • Raw, not rolled-up, data store for Sharded MySql customers • Materialized view store

Related Documents