SQL, NoSQL, NewSQL?Whats a developer to do? Chris Richardson Author of POJOs in Action Founder of CloudFoundry.com ...
Overall presentation goal The joy and pain of building Java applications that use NoSQL and NewSQL 2
About Chris 3
(About Chris) 4
About Chris() 5
About Chris 6
About Chris http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/ 7
About Chris Developer Advocate for CloudFoundry.com Signup at CloudFoundry.com using promo code JFokus ...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richard...
Food to Goo  Take-out food delivery serviceo  “Launched” in 2006o  Used a relational database (naturally) ...
Success è Growth challengeso  Increasing traffico  Increasing data volumeo  Distribute across a few data centerso  In...
Limitations of relational databaseso  Scalingo  Distributiono  Updating schemao  O/R impedance mismatcho  Handling se...
Solution: Spend Money o  Buy SSD and RAM ...
Solution: Use NoSQL Benefits Higher Limited performance transaction...
MongoDBo  Document-oriented database n  JSON-style documents: Lists, Maps, primitives n  Schema-lesso  Transaction =...
Apache Cassandrao  Column-oriented database/Extensible row store n  Think Row ~= java.util.SortedMapo  Transaction ...
Other NoSQL databasesType ExamplesExtensible columns/Column- Hbaseoriented ...
Solution: Use NewSQLo  Relational databases with SQL and ACID transactions ANDo ...
Stonebraker’s motivations “…Current databases are designed for 1970s hardware ...
About VoltDBo  Open-sourceo  In-memory relational databaseo  Durability thru replication; snapshots and loggingo  T...
The future is polyglot persistence e.g. Netflix ...
Spring Data is here to help For NoSQL databaseshttp://www.springsource...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richar...
Food to Go – Place Order use case1.  Customer enters delivery address and delivery time2.  System displays available re...
Food to Go – Domain model (partial)class Restaurant { class TimeRange { long id; lo...
Database schemaID Name … RESTAU...
How to implement the repository? public interface AvailableRestaurantRepository { void add(Restaurant restau...
MongoDB: persisting restaurants is easy Server ...
Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRe...
Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.da...
Cassandra data model Column Column Row Name Value Key ...
Cassandra– inserting/updating data Column Family ...
Cassandra– retrieving data Column Family K1 N1 V1 TS...
Option #1: Use a column per attribute Column Name = path/expression to access property value ...
Option #2: Use a single column Column value = serialized object graph, e.g. JSON ...
Cassandra codepublic class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository...
Using VoltDBo  Use the original schemao  Standard SQL statements BUT YOU MUSTo  Write stored procedur...
About VoltDB stored procedureso  Key part of VoltDBo  Replication = executing stored procedure on replicao  Logging ...
About partitioning Partition column RESTAURANT table ID ...
Example cluster Partition 1a Partition 2a Partition 3a ...
Single partition procedure: FAST SELECT * FROM RESTAURANT WHERE ID = 1 High-performance lock free code ID Name … ...
Multi-partition procedure: SLOWER SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’ Communication/Coordi...
Chosen partitioning scheme<partitions> <partition table="restaurant" column="id"/> <partition table="service_area"...
Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRest...
VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl implements AvailableRes...
VoltDbTemplate wrapper classpublic class VoltDbTemplate { private Client client; VoltDB client API public Volt...
VoltDb server configuration<?xml version="1.0"?> <deployment><project> <info> ...
Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richard...
Finding available restaurantsAvailable restaurants = Serve the zip code of the delivery addressAND Are open at t...
Finding available restaurants on Monday,6.15pm for 94619 zipselect r.* Straightforwardfrom restaurant r t...
MongoDB = easy to query{ serviceArea:"94619", Find a openingHours: { $elemMatch : { ...
MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl imp...
BUT how to do this with Cassandra??!o  How can Cassandra support a query that has ? n  A 3-...
Simplification #1: DenormalizationRestaurant_id Day_of_week Open_time Close_time Zip_code1 Monday ...
Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Mond...
Simplification #3: Eliminate multiple =’s withconcatenation Restaurant_id Zip_dow Open_time Close_time 1 ...
Column family with composite column names as an index Restaurant_id Zip_dow Open_time C...
Querying with a slice Column Family: AvailableRestaurants ...
Needs a few pages of code private void insertAvailability(Restaurant restaurant) { for (String zipC...
What did I just do to query the data? 60
Mongo vs. Cassandra DC1 DC2 Shard A Ma...
VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }E...
VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extend...
VoltDB - attempt #3<partitions> ... <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo(...
Summary…o  Relational databases are great BUT there are limitationso  Each NoSQL database solves some problems BUT...
… Summaryo  Very carefully pick the NewSQL/ NoSQL DB for your applicationo  Consider a polyglot persistence archit...
Thank you! Signup at CloudFoundry.com using promo code JFokus My contact info: chris.richardson@spri...
of 67

SQL, NoSQL, NewSQL? What's a developer to do?

The database world is undergoing a major upheaval. NoSQL databases such as MongoDB and Cassandra are emerging as a compelling choice for many applications. They can simplify the persistence of complex data models and offering significantly better scalability and performance. But these databases have a very different and unfamiliar data model and APIs as well as a limited transaction model. Moreover, the relational world is fighting back with so-called NewSQL databases such as VoltDB, which by using a radically different architecture offers high scalability and performance as well as the familiar relational model and ACID transactions. Sounds great but unlike the traditional relational database you can't use JDBC and must partition your data.In this presentation you will learn about popular NoSQL databases - MongoDB, and Cassandra - as well at VoltDB. We will compare and contrast each database's data model and Java API using NoSQL and NewSQL versions of a use case from the book POJOs in Action. We will learn about the benefits and drawbacks of using NoSQL and NewSQL databases.
Published on: Mar 4, 2016
Published in: Technology      
Source: www.slideshare.net


Transcripts - SQL, NoSQL, NewSQL? What's a developer to do?

  • 1. SQL, NoSQL, NewSQL?Whats a developer to do? Chris Richardson Author of POJOs in Action Founder of CloudFoundry.com chris.richardson@springsource.com @crichardson
  • 2. Overall presentation goal The joy and pain of building Java applications that use NoSQL and NewSQL 2
  • 3. About Chris 3
  • 4. (About Chris) 4
  • 5. About Chris() 5
  • 6. About Chris 6
  • 7. About Chris http://www.theregister.co.uk/2009/08/19/springsource_cloud_foundry/ 7
  • 8. About Chris Developer Advocate for CloudFoundry.com Signup at CloudFoundry.com using promo code JFokus 8
  • 9. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 9
  • 10. Food to Goo  Take-out food delivery serviceo  “Launched” in 2006o  Used a relational database (naturally) 10
  • 11. Success è Growth challengeso  Increasing traffico  Increasing data volumeo  Distribute across a few data centerso  Increasing domain model complexity
  • 12. Limitations of relational databaseso  Scalingo  Distributiono  Updating schemao  O/R impedance mismatcho  Handling semi-structured data 12
  • 13. Solution: Spend Money o  Buy SSD and RAM o  Buy Oracle o  Buy high-end servers o  … ORhttp://upload.wikimedia.org/wikipedia/commons/e/e5/Rising_Sun_Yacht.JPGo  Hire more DevOpso  Use application-level shardingo  Build your own middlewareo  … http://www.trekbikes.com/us/en/bikes/road/race_performance/madone_5_series/madone_5_2/# 13
  • 14. Solution: Use NoSQL Benefits Higher Limited performance transactions Higher scalability Relaxed Richer data- consistency model Unconstrained Schema-less data Drawbacks 14
  • 15. MongoDBo  Document-oriented database n  JSON-style documents: Lists, Maps, primitives n  Schema-lesso  Transaction = update of a single documento  Rich query language for dynamic querieso  Tunable writes: speed ó reliabilityo  Highly scalable and availableo  Use cases n  High volume writes n  Complex data n  Semi-structured data 15
  • 16. Apache Cassandrao  Column-oriented database/Extensible row store n  Think Row ~= java.util.SortedMapo  Transaction = update of a rowo  Fast writes = append to a logo  Tunable reads/writes: consistency ó latency/ availabilityo  Extremely scalable n  Transparent and dynamic clustering n  Rack and datacenter aware data replicationo  CQL = “SQL”-like DDL and DMLo  Use cases n  Big data n  Multiple Data Center distributed database n  (Write intensive) Logging n  High-availability (writes) 16
  • 17. Other NoSQL databasesType ExamplesExtensible columns/Column- Hbaseoriented SimpleDB DynamoDBGraph Neo4jKey-value Redis MembaseDocument CouchDb http://nosql-database.org/ lists 122+ NoSQL databases 17
  • 18. Solution: Use NewSQLo  Relational databases with SQL and ACID transactions ANDo  New and improved architectureo  Radically better scalability and performanceo  NewSQL vendors: ScaleDB, NimbusDB, …, VoltDB 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 18
  • 19. Stonebraker’s motivations “…Current databases are designed for 1970s hardware …” Stonebraker: http://www.slideshare.net/VoltDB/sql-myths-webinar Significant overhead in “…logging, latching, locking, B-tree, and buffer management operations…” SIGMOD 08: Though the looking glass: http://dl.acm.org/citation.cfm?id=1376713 19
  • 20. About VoltDBo  Open-sourceo  In-memory relational databaseo  Durability thru replication; snapshots and loggingo  Transparent partitioningo  Fast and scalable …VoltDB is very scalable; it should scale to 120 partitions, 39 servers, and 1.6 million complex transactions per second at over 300 CPU cores… http://www.mysqlperformanceblog.com/2011/02/28/is-voltdb-really-as-scalable-as-they-claim/ 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 20
  • 21. The future is polyglot persistence e.g. Netflix •  RDBMS •  SimpleDB •  Cassandra •  Hadoop/Hbase IEEE Software Sept/October 2010 - Debasish Ghosh / Twitter @debasishg 21
  • 22. Spring Data is here to help For NoSQL databaseshttp://www.springsource.org/spring-data 22
  • 23. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 23
  • 24. Food to Go – Place Order use case1.  Customer enters delivery address and delivery time2.  System displays available restaurants3.  Customer picks restaurant4.  System displays menu5.  Customer selects menu items6.  Customer places order 24
  • 25. Food to Go – Domain model (partial)class Restaurant { class TimeRange { long id; long id; String name; int dayOfWeek; Set<String> serviceArea; int openingTime; Set<TimeRange> openingHours; int closingTime; List<MenuItem> menuItems; }} class MenuItem { String name; double price; } 25
  • 26. Database schemaID Name … RESTAURANT1 Ajanta table2 Montclair EggshopRestaurant_id zipcode RESTAURANT_ZIPCODE1 94707 table1 946192 946112 94619 RESTAURANT_TIME_RANGE tableRestaurant_id dayOfWeek openTime closeTime1 Monday 1130 14301 Monday 1730 21302 Tuesday 1130 … 26
  • 27. How to implement the repository? public interface AvailableRestaurantRepository { void add(Restaurant restaurant); Restaurant findDetailsById(int id); … } Restaurant ó ? TimeRange MenuItemRestaurant aggregate 27
  • 28. MongoDB: persisting restaurants is easy Server Database: Food To Go Collection: Restaurants { "_id" : ObjectId("4bddc2f49d1505567c6220a0") "name": "Ajanta", "serviceArea": ["94619", "99999"], BSON = "openingHours": [ { binary "dayOfWeek": 1, JSON "open": 1130, "close": 1430 }, { Sequence "dayOfWeek": 2, "open": 1130, of bytes on "close": 1430 disk è fast }, … ] i/o } 28
  • 29. Spring Data for Mongo code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository { public static String AVAILABLE_RESTAURANTS_COLLECTION = "availableRestaurants"; @Autowired private MongoTemplate mongoTemplate; @Override public void add(Restaurant restaurant) { mongoTemplate.insert(restaurant, AVAILABLE_RESTAURANTS_COLLECTION); } @Override public Restaurant findDetailsById(int id) { return mongoTemplate.findOne(new Query(where("_id").is(id)), Restaurant.class, AVAILABLE_RESTAURANTS_COLLECTION); }} 29
  • 30. Spring Configuration@Configurationpublic class MongoConfig extends AbstractDatabaseConfig { @Value("#{mongoDbProperties.databaseName}") private String mongoDbDatabase;@Beanpublic Mongo mongo() throws UnknownHostException, MongoException { return new Mongo(databaseHostName);} @Bean public MongoTemplate mongoTemplate(Mongo mongo) throws Exception { MongoTemplate mongoTemplate = new MongoTemplate(mongo, mongoDbDatabase); mongoTemplate.setWriteConcern(WriteConcern.SAFE); mongoTemplate.setWriteResultChecking(WriteResultChecking.EXCEPTION); return mongoTemplate; }} 30
  • 31. Cassandra data model Column Column Row Name Value Key Timestamp Keyspace Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 K2 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3Column name/value: number, string, Boolean, timestamp, and composite 31
  • 32. Cassandra– inserting/updating data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 …Idempotent= transaction CF.insert(key=K1, (N4, V4, TS4), …) Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 … 32
  • 33. Cassandra– retrieving data Column Family K1 N1 V1 TS1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4… CF.slice(key=K1, startColumn=N2, endColumn=N4) K1 N2 V2 TS2 N3 V3 TS3 N4 V4 TS4 Cassandra has secondary indexes but they aren’t helpful for these use cases 33
  • 34. Option #1: Use a column per attribute Column Name = path/expression to access property value Column Family: RestaurantDetails openingHours[0].dayOfWeek Monday name Ajanta serviceArea[0] 946191 openingHours[0].open 1130 type indian serviceArea[1] 94707 openingHours[0].close 1430 Egg openingHours[0].dayOfWeek Monday name serviceArea[0] 94611 shop2 Break openingHours[0].open 0830 type serviceArea[1] 94619 Fast openingHours[0].close 1430
  • 35. Option #2: Use a single column Column value = serialized object graph, e.g. JSON Column Family: RestaurantDetails 2 attributes: { name: “Montclair Eggshop”, … } 1 attributes { name: “Ajanta”, …} 2 attributes { name: “Eggshop”, …} ✔ 35
  • 36. Cassandra codepublic class AvailableRestaurantRepositoryCassandraKeyImpl implements AvailableRestaurantRepository {@Autowired Home grownprivate final CassandraTemplate cassandraTemplate; wrapper classpublic void add(Restaurant restaurant) { cassandraTemplate.insertEntity(keyspace, RESTAURANT_DETAILS_CF, restaurant);}public Restaurant findDetailsById(int id) { String key = Integer.toString(id); return cassandraTemplate.findEntity(Restaurant.class, keyspace, key, RESTAURANT_DETAILS_CF); …}… http://en.wikipedia.org/wiki/Hector 36
  • 37. Using VoltDBo  Use the original schemao  Standard SQL statements BUT YOU MUSTo  Write stored procedures and invoke them using proprietary interfaceo  Partition your data 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 37
  • 38. About VoltDB stored procedureso  Key part of VoltDBo  Replication = executing stored procedure on replicao  Logging = log stored procedure invocationo  Stored procedure invocation = transaction 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 38
  • 39. About partitioning Partition column RESTAURANT table ID Name … 1 Ajanta 2 Eggshop … 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 39
  • 40. Example cluster Partition 1a Partition 2a Partition 3a ID Name … ID Name … ID Name … 1 Ajanta 2 Eggshop … .. … … … Partition 3b Partition 1b Partition 2b ID Name … ID Name … ID Name … … .. 1 Ajanta 2 Eggshop … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 40
  • 41. Single partition procedure: FAST SELECT * FROM RESTAURANT WHERE ID = 1 High-performance lock free code ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 41
  • 42. Multi-partition procedure: SLOWER SELECT * FROM RESTAURANT WHERE NAME = ‘Ajanta’ Communication/Coordination overhead ID Name … ID Name … ID Name … 1 Ajanta 1 Eggshop … .. … … … … … …VoltDB Server 1 VoltDB Server 2 VoltDB Server 3 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 42
  • 43. Chosen partitioning scheme<partitions> <partition table="restaurant" column="id"/> <partition table="service_area" column="restaurant_id"/> <partition table="menu_item" column="restaurant_id"/> <partition table="time_range" column="restaurant_id"/> <partition table="available_time_range" column="restaurant_id"/></partitions> Performance is excellent: much faster than MySQL 43
  • 44. Stored procedure – AddRestaurant@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertRestaurant = new SQLStmt("INSERT INTO Restaurant VALUES (?,?);"); public final SQLStmt insertServiceArea = new SQLStmt("INSERT INTO service_area VALUES (?,?);"); public final SQLStmt insertOpeningTimes = new SQLStmt("INSERT INTO time_range VALUES (?,?,?,?);"); public final SQLStmt insertMenuItem = new SQLStmt("INSERT INTO menu_item VALUES (?,?,?);");public long run(int id, String name, String[] serviceArea, long[] daysOfWeek, long[] openingTimes, long[] closingTimes, String[] names, double[] prices) { voltQueueSQL(insertRestaurant, id, name); for (String zipCode : serviceArea) voltQueueSQL(insertServiceArea, id, zipCode); for (int i = 0; i < daysOfWeek.length ; i++) voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (int i = 0; i < names.length ; i++) voltQueueSQL(insertMenuItem, id, names[i], prices[i]); voltExecuteSQL(true); return 0; }} 44
  • 45. VoltDb repository – add()@Repositorypublic class AvailableRestaurantRepositoryVoltdbImpl implements AvailableRestaurantRepository { @Autowired private VoltDbTemplate voltDbTemplate; @Override public void add(Restaurant restaurant) { invokeRestaurantProcedure("AddRestaurant", restaurant); } private void invokeRestaurantProcedure(String procedureName, Restaurant restaurant) { Object[] serviceArea = restaurant.getServiceArea().toArray(); long[][] openingHours = toArray(restaurant.getOpeningHours()); Flatten Object[][] menuItems = toArray(restaurant.getMenuItems()); Restaurant voltDbTemplate.update(procedureName, restaurant.getId(), restaurant.getName(), serviceArea, openingHours[0], openingHours[1], openingHours[2], menuItems[0], menuItems[1]);} 45
  • 46. VoltDbTemplate wrapper classpublic class VoltDbTemplate { private Client client; VoltDB client API public VoltDbTemplate(Client client) { this.client = client; } public void update(String procedureName, Object... params) { try { ClientResponse x = client.callProcedure(procedureName, params); … } catch (Exception e) { throw new RuntimeException(e); } } 46
  • 47. VoltDb server configuration<?xml version="1.0"?> <deployment><project> <info> <cluster hostcount="1" <name>Food To Go</name> sitesperhost="5" kfactor="0" /> ... </info> </deployment> <database> <schemas> <schema path=schema.sql /> </schemas> <partitions> <partition table="restaurant" column="id"/> ... </partitions> <procedures> <procedure class=net.chrisrichardson.foodToGo.newsql.voltdb.procs.AddRestaurant /> ... </procedures> </database></project>voltcompiler target/classes src/main/resources/sql/voltdb-project.xml foodtogo.jarbin/voltdb leader localhost catalog foodtogo.jar deployment deployment.xml 47
  • 48. Agendao  Why NoSQL? NewSQL?o  Persisting entitieso  Implementing queries 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 48
  • 49. Finding available restaurantsAvailable restaurants = Serve the zip code of the delivery addressAND Are open at the delivery timepublic interface AvailableRestaurantRepository { List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime); …} 49
  • 50. Finding available restaurants on Monday,6.15pm for 94619 zipselect r.* Straightforwardfrom restaurant r three-way join inner join restaurant_time_range tr on r.id =tr.restaurant_id inner join restaurant_zipcode sa on r.id = sa.restaurant_idWhere ’94619’ = sa.zip_codeand tr.day_of_week=’monday’and tr.openingtime <= 1815and 1815 <= tr.closingtime 50
  • 51. MongoDB = easy to query{ serviceArea:"94619", Find a openingHours: { $elemMatch : { restaurant "dayOfWeek" : "Monday", "open": {$lte: 1815}, that serves } "close": {$gte: 1815} the 94619 zip} } code and is open atDBCursor cursor = collection.find(qbeObject);while (cursor.hasNext()) { 6.15pm on a DBObject o = cursor.next(); … Monday }db.availableRestaurants.ensureIndex({serviceArea: 1}) 51
  • 52. MongoTemplate-based code@Repositorypublic class AvailableRestaurantRepositoryMongoDbImpl implements AvailableRestaurantRepository {@Autowired private final MongoTemplate mongoTemplate;@Overridepublic List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); Query query = new Query(where("serviceArea").is(deliveryAddress.getZip()) .and("openingHours”).elemMatch(where("dayOfWeek").is(dayOfWeek) .and("openingTime").lte(timeOfDay) .and("closingTime").gte(timeOfDay))); return mongoTemplate.find(AVAILABLE_RESTAURANTS_COLLECTION, query, AvailableRestaurant.class);} mongoTemplate.ensureIndex(“availableRestaurants”, new Index().on("serviceArea", Order.ASCENDING)); 52
  • 53. BUT how to do this with Cassandra??!o  How can Cassandra support a query that has ? n  A 3-way join n  Multiple = n  > and <è We need to implement an index Queries instead of data model drives NoSQL database design 53
  • 54. Simplification #1: DenormalizationRestaurant_id Day_of_week Open_time Close_time Zip_code1 Monday 1130 1430 947071 Monday 1130 1430 946191 Monday 1730 2130 947071 Monday 1730 2130 946192 Monday 0700 1430 94619… SELECT restaurant_id FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Simpler query: AND zip_code = 94619 §  No joins §  Two = and two < AND 1815 < close_time AND open_time < 1815 54
  • 55. Simplification #2: Application filteringSELECT restaurant_id, open_time FROM time_range_zip_code WHERE day_of_week = ‘Monday’ Even simpler query AND zip_code = 94619 •  No joins AND 1815 < close_time •  Two = and one < AND open_time < 1815 55
  • 56. Simplification #3: Eliminate multiple =’s withconcatenation Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … SELECT restaurant_id, open_time FROM time_range_zip_code WHERE zip_code_day_of_week = ‘94619:Monday’ AND 1815 < close_time key range 56
  • 57. Column family with composite column names as an index Restaurant_id Zip_dow Open_time Close_time 1 94707:Monday 1130 1430 1 94619:Monday 1130 1430 1 94707:Monday 1730 2130 1 94619:Monday 1730 2130 2 94619:Monday 0700 1430 … Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1)94619:Monday EGG AJANTA JSON FOR (1430,1130,1) AJANTA
  • 58. Querying with a slice Column Family: AvailableRestaurants JSON FOR JSON FOR (1430,0700,2) (2130,1730,1) EGG AJANTA94619:Monday JSON FOR (1430,1130,1) AJANTA slice(key= 94619:Monday, sliceStart = (1815, *, *), sliceEnd = (2359, *, *)) JSON FOR (2130,1730,1)94619:Monday AJANTA 18:15 is after 17:30 è {Ajanta} 58
  • 59. Needs a few pages of code private void insertAvailability(Restaurant restaurant) { for (String zipCode : (Set<String>) restaurant.getServiceArea()) {@Override for (TimeRange tr : (Set<TimeRange>) restaurant.getOpeningHours()) { public List<AvailableRestaurant> findAvailableRestaurants(Address deliveryAddress, Date deliveryTime) { String dayOfWeek = format2(tr.getDayOfWeek()); int dayOfWeek = DateTimeUtil.dayOfWeek(deliveryTime); String openingTime = format4(tr.getOpeningTime()); int timeOfDay = DateTimeUtil.timeOfDay(deliveryTime); String closingTime = format4(tr.getClosingTime()); String zipCode = deliveryAddress.getZip(); String key = formatKey(zipCode, format2(dayOfWeek)); String restaurantId = format8(restaurant.getId()); HSlicePredicate<Composite> predicate = new HSlicePredicate<Composite>(new CompositeSerializer()); String key = formatKey(zipCode, dayOfWeek); Composite start = new Composite(); String columnValue = toJson(restaurant); Composite finish = new Composite(); start.addComponent(0, format4(timeOfDay), ComponentEquality.GREATER_THAN_EQUAL); finish.addComponent(0, format4(2359), ComponentEquality.GREATER_THAN_EQUAL); Composite columnName = new Composite(); predicate.setRange(start, finish, false, 100); columnName.add(0, closingTime); final List<AvailableRestaurantIndexEntry> closingAfter = new ArrayList<AvailableRestaurantIndexEntry>(); columnName.add(1, openingTime); columnName.add(2, restaurantId); ColumnFamilyRowMapper<String, Composite, Object> mapper = new ColumnFamilyRowMapper<String, Composite, Object>() { @Override ColumnFamilyUpdater<String, Composite> updater public Object mapRow(ColumnFamilyResult<String, Composite> results) { = compositeCloseTemplate.createUpdater(key); for (Composite columnName : results.getColumnNames()) { String openTime = columnName.get(1, new StringSerializer()); updater.setString(columnName, columnValue); String restaurantId = columnName.get(2, new StringSerializer()); closingAfter.add(new AvailableRestaurantIndexEntry(openTime, restaurantId, results.getString(columnName))); } return null; } }; compositeCloseTemplate.update(updater); } compositeCloseTemplate.queryColumns(key, predicate, mapper); } List<AvailableRestaurant> result = new LinkedList<AvailableRestaurant>(); } for (AvailableRestaurantIndexEntry trIdAndAvailableRestaurant : closingAfter) { if (trIdAndAvailableRestaurant.isOpenBefore(timeOfDay)) result.add(trIdAndAvailableRestaurant.getAvailableRestaurant()); } return result; } 59
  • 60. What did I just do to query the data? 60
  • 61. Mongo vs. Cassandra DC1 DC2 Shard A Master Shard B MasterMongoDB Remote DC1 Client DC2 Client DC1 DC2 Async Cassandra Or CassandraCassandra Sync DC1 Client DC2 Client 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 61
  • 62. VoltDB - attempt #1@ProcInfo( singlePartition = false)public class FindAvailableRestaurants extends VoltProcedure { ... }ERROR 10:12:03,251 [main] COMPILER: Failed to plan for statementtype(findAvailableRestaurants_with_join) select r.* from restaurantr,time_range tr, service_area sa Where ? = sa.zip_code and r.id=tr.restaurant_id and r.id = sa.restaurant_id and tr.day_of_week=?and tr.open_time <= ? and ? <= tr.close_time Error: "Unable to planfor statement. Likely statement is joining two partitioned tables in amulti-partition statement. This is not supported at this time."ERROR 10:12:03,251 [main] COMPILER: Catalog compilation failed. Bummer! 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 62
  • 63. VoltDB - attempt #2@ProcInfo( singlePartition = true, partitionInfo = "Restaurant.id: 0”)public class AddRestaurant extends VoltProcedure { public final SQLStmt insertAvailable= new SQLStmt("INSERT INTO available_time_range VALUES (?,?,?, ?, ?, ?);"); public long run(....) { ... for (int i = 0; i < daysOfWeek.length ; i++) { voltQueueSQL(insertOpeningTimes, id, daysOfWeek[i], openingTimes[i], closingTimes[i]); for (String zipCode : serviceArea) { voltQueueSQL(insertAvailable, id, daysOfWeek[i], openingTimes[i], closingTimes[i], zipCode, name); } } ... public final SQLStmt findAvailableRestaurants_denorm = new SQLStmt( voltExecuteSQL(true); "select restaurant_id, name from available_time_range tr " + return 0; "where ? = tr.zip_code " + } "and tr.day_of_week=? " +} "and tr.open_time <= ? " + " and ? <= tr.close_time "); Works but queries are only slightly faster than MySQL! 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 63
  • 64. VoltDB - attempt #3<partitions> ... <partition table="available_time_range" column="zip_code"/></partitions>@ProcInfo( singlePartition = false, ...)public class AddRestaurant extends VoltProcedure { ... }@ProcInfo( singlePartition = true, partitionInfo = "available_time_range.zip_code: 0")public class FindAvailableRestaurants extends VoltProcedure { ... }Queries are really fast but inserts are not LPartitioning scheme – optimal for some usecases but not others 64
  • 65. Summary…o  Relational databases are great BUT there are limitationso  Each NoSQL database solves some problems BUT n  Limited transactions: NoSQL = NoACID n  One day needing ACID è major rewrite n  Query-driven, denormalized database design n  …o  NewSQL databases such as VoltDB provides SQL, ACID transactions and incredible performance BUT n  Not all operations are fast n  Non-JDBC API 65
  • 66. … Summaryo  Very carefully pick the NewSQL/ NoSQL DB for your applicationo  Consider a polyglot persistence architectureo  Encapsulate your data access code so you can switcho  Startups = avoid NewSQL/NoSQL for shorter time to market? 2/14/12 Copyright (c) 2011 Chris Richardson. All rights reserved. Slide 66
  • 67. Thank you! Signup at CloudFoundry.com using promo code JFokus My contact info: chris.richardson@springsource.com @crichardson 67

Related Documents