Presto - Analytical Database. Overview and use cases.

Presented at allegro.tech Data Science meet-up in Warsaw on Dec 16th 2015. http://www.meetup.com/allegrotech/events/227110112
Published on: Mar 4, 2016
Published in: Data & Analytics      
Source: www.slideshare.net


Transcripts - Presto - Analytical Database. Overview and use cases.

  • 1. Presto - Analytical Database. Wojciech Biela, Łukasz Osipiuk. https://prestodb.io
  • 2. Who are we? Center for Hadoop
  • 3. History of Presto
      ➔ FALL 2008: Facebook open sources Hive
      ➔ FALL 2012: 6 developers start Presto development
      ➔ SPRING 2013: Presto rolled out within Facebook
      ➔ FALL 2013: Facebook open sources Presto
      ➔ FALL 2014: 88 releases, 41 contributors, 3943 commits
      ➔ FALL 2015: 132 releases, 105 contributors, 6300 commits; Teradata part of Presto community & offers support
  • 4. What is Presto?
      ➔ 100% open source distributed ANSI SQL engine for Big Data
      ➔ Optimized for low latency, interactive querying
        ◆ Cross platform query capability, not only SQL on Hadoop
        ◆ Distributed under the Apache license, now supported by Teradata
        ◆ Used by a community of well known, well respected technology companies
        ◆ Modern code base
        ◆ Proven scalability
  • 5. High level architecture (diagram): Client; Coordinator (Parser/analyzer, Planner, Scheduler); Workers; pluggable Metadata API, Data location API, and Data stream API
  • 6. Plan execution (diagram comparing Hive and Presto). Labels: map, reduce, task, I/O
  • 7. Presto Extensibility – connector interfaces (diagram): Coordinator (Parser/analyzer, Planner, Scheduler) and Worker; Metadata API, Data location API, and Data stream API, each implemented for Hive, Cassandra, Kafka, MySQL, …
  • 8. Presto Extensibility – plugins
      ➔ Connectors
      ➔ Data types
      ➔ Extra functions
      ➔ Security providers
  • 9. Some usage facts
      ➔ Facebook
        ◆ Multiple production clusters (100s of nodes total)
          ● Including 300PB Hadoop data warehouse
          ● Single cluster size order of 10s of nodes
        ◆ 1000s of internal daily active users
        ◆ Millions of queries each month
        ◆ Multiple PBs scanned every day
        ◆ Trillions of rows a day
        ◆ ORC format
      ➔ Netflix
        ◆ Over 250-node production cluster on EC2
        ◆ Over 15 PB in S3 (Parquet format)
        ◆ Over 300 users and 2.5K queries daily
        ◆ presto-cli, R, Python, BI tools
        ◆ 50% of queries under 4s
  • 10. Netflix Data Pipeline (diagram). Labels: TVs, mobile, laptop; Suro / Kafka; Cassandra; Ursula; Aegisthus; events; dimensions; Amazon S3; TD
  • 11. Presto use-cases at Facebook
      ➔ Three use cases
        ◆ Data warehouse - big data
        ◆ User facing - small data
        ◆ User facing - medium data
  • 12. Presto use-cases at Facebook (data warehouse): HDFS data warehouse
  • 13. Presto use-cases at Facebook (data warehouse)
      ➔ Multiple clusters
      ➔ O(10^3) users
      ➔ O(10^6) queries per month
      ➔ Petabytes of data scanned every day
      ➔ 100s of concurrent queries
  • 14. Presto use-cases at Facebook (data warehouse) (diagram). Labels: Loader, Client, Hive, Presto, M/R, Data Node
  • 15. Presto use-cases at Facebook (data warehouse) (diagram). Labels: Client, Dispatcher, Presto
  • 16. Presto use-cases at Facebook (realtime): Real time user facing
  • 17. Presto use-cases at Facebook (realtime): Requirements
      ➔ User facing
      ➔ 0.1-5 seconds latency
      ➔ Support for data updates
      ➔ Highly available
      ➔ 10-15 way joins
  • 18. Presto use-cases at Facebook (realtime) (diagram). Labels: Loader, Client, Presto, mysql
  • 19. Presto use-cases at Facebook (semi realtime): Requirements
      ➔ Large data sets (smaller than warehouse)
      ➔ Seconds to minutes latency
      ➔ Predictable performance
      ➔ 5-15 minutes load latency
      ➔ 100s of concurrent queries
  • 20. Presto use-cases at Facebook (semi realtime): Raptor
  • 21. Presto use-cases at Facebook (semi realtime): Raptor (diagram). Labels: Loader, Client, Presto, Flash, mysql, Kafka, Gluster, backup tier
  • 22. Presto use-cases at Facebook (semi realtime): Raptor (diagram). Labels: Loader, Client, Presto, Flash, mysql, Kafka, Gluster, backup tier
      ➔ Loader statement: INSERT INTO raptor_table SELECT * FROM kafka_table WHERE token BETWEEN ${last_token} AND ${next_token}
      ➔ Mark load in progress in MySQL
  • 23. Presto use-cases at Facebook (semi realtime): Extra features
      ➔ Physical data reorganization
      ➔ Fully fledged and atomic DDL
      ➔ Atomic data loading
      ➔ Tiered architecture
  • 24. Presto = Performance
      ➔ Data stays in memory during execution and is pipelined across nodes MPP-style
      ➔ Vectorized columnar processing
      ➔ Presto is written in highly tuned Java
        ◆ Efficient in-memory data structures
        ◆ Very careful coding of inner loops
        ◆ Bytecode generation
      ➔ Optimized ORC reader
      ➔ Predicate push-down
      ➔ Query optimizer
  • 25. How can I contribute?
      ➔ www.github.com/facebook/presto
      ➔ www.github.com/prestodb
      ➔ Certified Distro: www.teradata.com/presto
      ➔ Website: www.prestodb.io
      ➔ Presto User’s Group: www.groups.google.com/group/presto-users
      ➔ Interested in joining Teradata?
        ● Presto development
        ● Other Hadoop related development and consulting
        Contact our Recruitment Partner: Renata Rosłoniec (VBC), tel. 514 035 237, renata.rosloniec@vbconsulting.pl
  • 26. Wojciech.Biela@teradata.com, Lukasz.Osipiuk@teradata.com
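
Slides 5 and 7 name three connector-facing APIs: Metadata, Data location, and Data stream. A toy in-memory connector can show how the three roles divide the work. This is only a Python sketch under stated assumptions: it is not Presto's actual Java SPI, and every class and method name here (InMemoryConnector, Split, list_tables, columns, splits, read, scan) is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

Row = Tuple  # one row of column values


@dataclass(frozen=True)
class Split:
    """A slice of a table that one worker could read independently (hypothetical)."""
    table: str
    start: int
    end: int


class InMemoryConnector:
    """Toy connector; names are illustrative and do not mirror Presto's SPI."""

    def __init__(self, tables: Dict[str, Tuple[List[str], List[Row]]]):
        # table name -> (column names, rows)
        self._tables = tables

    # Metadata API role: which tables exist and what columns they have.
    def list_tables(self) -> List[str]:
        return sorted(self._tables)

    def columns(self, table: str) -> List[str]:
        return list(self._tables[table][0])

    # Data location API role: cut a table into independently readable splits.
    def splits(self, table: str, split_size: int) -> List[Split]:
        n = len(self._tables[table][1])
        return [Split(table, i, min(i + split_size, n))
                for i in range(0, n, split_size)]

    # Data stream API role: produce the rows of one split.
    def read(self, split: Split) -> Iterator[Row]:
        yield from self._tables[split.table][1][split.start:split.end]


def scan(connector: InMemoryConnector, table: str, split_size: int = 2) -> List[Row]:
    """Read a whole table split by split, as a single worker would."""
    rows: List[Row] = []
    for split in connector.splits(table, split_size):
        rows.extend(connector.read(split))
    return rows
```

In this sketch the planner side would only need list_tables and columns, the scheduler side would only need splits, and the worker side would only need read, which is one way to read the separation drawn on slide 7.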

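The loader step on slide 22 (copy one token window from Kafka into Raptor, and mark the load as in progress in MySQL) can be sketched as a small loop. This is a hedged illustration, not the loader Facebook runs: the function names, the dictionary standing in for the MySQL state table, the status values, and the assumption that tokens are integers are all hypothetical. Only the SQL shape comes from the slide.

```python
from typing import Callable, Dict


def build_load_sql(last_token: int, next_token: int) -> str:
    """Fill the ${last_token} / ${next_token} placeholders from slide 22."""
    return (
        "INSERT INTO raptor_table "
        "SELECT * FROM kafka_table "
        f"WHERE token BETWEEN {int(last_token)} AND {int(next_token)}"
    )


def load_window(state: Dict[str, object],
                next_token: int,
                run_query: Callable[[str], None]) -> None:
    """Load one token window and advance the bookmark.

    `state` stands in for the MySQL bookkeeping row (hypothetical layout).
    """
    low = state["last_token"]
    state["status"] = "IN_PROGRESS"  # stands in for "mark load in progress in MySQL"
    run_query(build_load_sql(low, next_token))
    # BETWEEN is inclusive on both ends, so the next window starts one past
    # the upper bound to avoid loading the boundary token twice (assumption).
    state["last_token"] = next_token + 1
    state["status"] = "DONE"
```

A caller would pass a function that submits the statement to Presto as run_query. If the process dies while the status is still IN_PROGRESS, the bookmark has not advanced, so the same window can be retried, which relies on the atomic data loading listed on slide 23.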