Polyglottany Is Not A Sin

Published on: Mar 4, 2016
Source: www.slideshare.net


Transcripts - Polyglottany Is Not A Sin

  • 1. Polyglottany Is Not A Sin. Eric Lubow, @elubow, elubow@simplereach.com, #MongoBoston
  • 2. Overview
      • SimpleReach
      • Definitions and Data Stores
      • Evolution to Polyglottany
      • Tie It Together
      • Final Thoughts
      • Questions
  • 3. Socially Intelligent
  • 4. Size
      • 150m events recorded per day and growing
      • 600m pageviews per month and growing
  • 5. Polyglot Persistence: "Polyglot Persistence, like polyglot programming, is all about choosing the right persistence option for the task at hand." http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence
  • 6. Right Tool For The Job
  • 7. Decisions. Decisions.
      • Data
          • What are my query patterns?
          • Are my display requirements for realtime data?
          • Is my data ingestion high volume/high velocity?
          • Do I need to aggregate data on the fly?
          • Am I batch loading data?
          • Am I write heavy or read heavy?
          • Are data relationships important?
          • Does my data need to be distributed?
          • Is my data structured or unstructured?
          • Does my data lend itself to a specific design pattern?
          • Will the data need to be immediately available everywhere?
      • Tech
          • Is the encryption/authentication/authorization support sufficient for my needs?
          • How fault tolerant is the system?
          • What supporting tools do I need?
          • Is there support for my language?
          • Are there monitoring architectures already built?
          • Are there best practices guides already?
      • Financial
          • Am I cloud based?
          • Am I hardware based?
          • Am I a cloud/iron hybrid?
          • How much am I willing to spend?
          • How much am I willing to spend if something goes wrong?
      • Other
          • Do I have legal requirements (HIPAA/FIPS/Sarbanes-Oxley/PII)?
          • What kind of enterprise support is available?
          • What is the community like?
          • Does the product roadmap pertain to my roadmap?
  • 8. No One Size Fits All
  • 9. Tools C*
  • 10. Free vs. Cost
  • 11. Languages
  • 12. Pre-Scale
  • 13. SimpleReach Pre-Scale
  • 14. Scale
  • 15. SimpleReach C*
  • 16. Mongo Conference
  • 17. Cassandra (C*)
      • Large data volume ingestion at high velocity
      • Really fast writes to many locations (eventual consistency)
      • Query by column groups within rows (slicing)
      • OpsCenter
      • Data toolkit: more than a data storage layer
      • TTLs for small group aggregation
      • Wrote Helenus, a Node.js driver for Cassandra
  • 18. MongoDB
      • Fast atomic increments (Node.js is native JSON)
      • Sharding
      • Solid ORM for Rails (Mongoid)
      • Fast access for pub/sub of durable/persisted documents
      • B-tree indexes
      • Document based via JSON
      • TTLs for ephemeral data
  • 19. Redis
      • Supports hundreds of thousands of transactions per second
      • Great caching engine
      • Supports useful variable types like sets, sorted sets, and lists
      • Everything is guaranteed to be memory mapped (mmap)
      • Transactional and supports bulk operations
      • Centralized queueing and locking system
  • 20. Infobright
      • Works with the standard MySQL driver
      • Column stores for ad-hoc analytics queries in SQL
      • Databases built for business intelligence
      • Heavy compression of data
      • Pre-aggregated data (Knowledge Grid)
  • 21. Ruby, Node.js, Python
      • Polyglottany doesn’t only apply to data stores
      • Each language has its own benefit to each data storage layer
      • Each language has its own individual benefits
      • JSON, APIs, performance
  • 22. Choice
  • 23. Cons
      • Redis: can only utilize a single core. SerDe price.
      • MySQL column store: DELETE/UPDATEs are VERY expensive
      • Cassandra: no B-tree indexes
      • Mongo: indexes must fit in memory. Forced replica ping times.
      • Python: whitespace. Community.
      • Ruby: not high performance enough for our standards
      • JavaScript (Node.js): bad for CPU or IO intensive workloads
  • 24. Tying It Together: "Even with the right tools, 80% of the work of building a big data system is acquiring and refining the raw data into usable data."
  • 25. Tying It Together
  • 26. Tying It Together
      • Service Oriented Architecture (Internal API)
      • Data accuracy checks: visual and programmatic
      • Built framework for testing out storage engines
      • Access to many toolsets (for all languages and DBs)
  • 27. Service Architecture (diagram labels: Analytics, C*, Real-time, C*, Internal API)
  • 28. Distributed Architecture (diagram labels)
      • Availability zones: US-EAST-1a, US-EAST-1b, US-EAST-1e
      • Cassandra: CASSANDRA-0001, CASSANDRA-0002, CASSANDRA-0003, CASSANDRA-0010, CASSANDRA-0011, CASSANDRA-0012
      • Redis: REDIS-0001A, REDIS-0001B
      • MySQL: MYSQL-0001, MYSQL-0002
      • MongoDB: MONGO-SHARD-0000-A, MONGO-SHARD-0000-B, MONGO-SHARD-0001-A, MONGO-SHARD-0001-B, MONGO-SHARD-0002-A, MONGO-SHARD-0002-B
      • Internal API: iAPI-0001, iAPI-0002, iAPI-0003
  • 29. Points To Consider
      • Data consistency: same in all data stores
      • How important is data durability?
      • Managing many servers (Chef, AWS, CSSH)
      • Managing and learning many different applications and tuning for them
      • Expertise
  • 30. Expertise
      • What happens when you need help?
      • How do you become experts?
      • What happens when you need more experts?
  • 31. Summary
      • Polyglottany is not a sin
      • Know your data read/write patterns
      • Know the tools available to you
      • Know your compromises
      • Expertise
  • 32. We’re Hiring
  • 33. Questions are guaranteed in life. Answers aren’t. Eric Lubow, @elubow, elubow@simplereach.com, #MongoBoston. Thank you.
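
Slides 26 and 29 mention programmatic data accuracy checks and keeping data the same in all data stores. The talk does not show how this was done, so the following is only a minimal sketch of one way such a check could look. It assumes the counters from each store have already been read into plain dictionaries; the function name find_mismatches, the store names, and the sample keys and values are all hypothetical.

```python
from typing import Dict, Mapping, Optional


def find_mismatches(
    stores: Mapping[str, Mapping[str, int]]
) -> Dict[str, Dict[str, Optional[int]]]:
    """Return every key whose value is missing from, or differs between, stores.

    `stores` maps a store name to a snapshot of its counters. In a real
    system each snapshot would come from that store's own client library;
    here they are plain dicts so the sketch stays self-contained.
    """
    # Collect every key seen in any store.
    all_keys = set()
    for snapshot in stores.values():
        all_keys.update(snapshot)

    mismatches: Dict[str, Dict[str, Optional[int]]] = {}
    for key in sorted(all_keys):
        # None marks a key that a store does not have at all.
        seen = {name: snapshot.get(key) for name, snapshot in stores.items()}
        if len(set(seen.values())) > 1:
            mismatches[key] = seen
    return mismatches


# Hypothetical snapshots standing in for three of the data stores.
cassandra = {"page:1": 10, "page:2": 5}
mongo = {"page:1": 10, "page:2": 4}
redis = {"page:1": 10}

result = find_mismatches({"cassandra": cassandra, "mongo": mongo, "redis": redis})
print(result)
```

With these sample snapshots, only "page:2" is reported, because one store disagrees on its value and another is missing it. Comparing snapshots rather than live reads keeps the check simple, but with eventually consistent stores a short-lived mismatch may not indicate a real problem.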