Polyglot Parallelism: A Case Study in Using Erlang and Ruby at Rackspace

Two years ago Rackspace had a problem: how do we back up 20K network devices, in 8 datacenters, across 3 continents, with less than a 1% failure rate, every single day? Many solutions were tried and found wanting: a pure Perl solution, a vendor solution, and then one in Ruby. None worked well enough: they were not fast enough, not reliable enough, or not transparent enough when things went wrong. We all love Ruby, but good Rubyists know that it is not always the best tool for the job. After re-examining the problem we decided to rewrite the application in a mixture of Erlang and Ruby. By exploiting the strengths of both (Erlang's astonishing support for parallelism and Ruby's strengths in web development) we solved the problem. In this talk we'll get down and dirty with the details: the problems we faced and how we solved them. We'll cover the application architecture, how Ruby and Erlang work together, and the Erlang approach to asynchronous operations (hint: it does not involve callbacks). So come on by and find out how you can get these two great languages to work together.
Published on: Mar 4, 2016
Published in: Technology, Business
Source: www.slideshare.net


Transcripts - Polyglot parallelism

  • 1. Polyglot Parallelism: A Case Study in Using Erlang and Ruby at Rackspace
  • 2. The Problem Part 1
  • 3. 20,000 network devices
  • 4. 9 Datacenters, 3 Continents
  • 5. devices not designed for high-throughput management
  • 6. we need a high-throughput solution
  • 7. the time spent in I/O is the primary bottleneck
  • 8. if you want to speed things up you have to talk to more devices in parallel
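
Slides 7 and 8 argue that when I/O wait dominates, throughput comes from talking to many devices at once. A minimal Erlang sketch of that idea, assuming one process per device: `parallel_poll`, `poll_all/3`, and `fake_backup/1` are hypothetical names for illustration and are not taken from the talk. The fake device simply sleeps to stand in for slow I/O.

```erlang
-module(parallel_poll).
-export([poll_all/3, demo/0]).

%% Spawn one process per device, run Fun(Device) in it, and gather the
%% replies in input order. Timeout (ms) bounds the wait for each reply.
poll_all(Fun, Devices, Timeout) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Fun(Device)} end),
                {Device, Ref}
            end || Device <- Devices],
    [receive
         {Ref, Result} -> {Device, Result}
     after Timeout ->
         {Device, timeout}
     end || {Device, Ref} <- Refs].

%% Hypothetical stand-in for a slow device: sleeps to simulate I/O.
fake_backup(Device) ->
    timer:sleep(100),
    {ok, Device}.

%% Returns {ElapsedMilliseconds, Results} for five simulated devices.
demo() ->
    Devices = [fw1, fw2, fw3, fw4, fw5],
    {Micros, Results} =
        timer:tc(fun() -> poll_all(fun fake_backup/1, Devices, 1000) end),
    {Micros div 1000, Results}.
```

Because the five sleeps overlap, `parallel_poll:demo()` should take roughly as long as one simulated backup instead of five. Note that this sketch uses bare `spawn` for brevity, which the later "don't spawn, use OTP" slide advises against in production code.
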
  • 9. The Problem Part II
  • 10. huge blobs of data
  • 11. lots of backups equals big database
  • 12. ad-hoc searching is difficult but important
  • 13. customer SLA means need to restore from backup quickly
  • 14. an event must be generated for each device interaction
  • 15. migrations are problematic with that much data
  • 16. rigid schema made adapting to new devices difficult
  • 17. each device type has different properties
  • 18. “backup” means different things for each device type
  • 19. need to grow with the business
  • 20. Previous Solution
  • 21. multiple Ruby apps
  • 22. difficult to scale
  • 23. vendor device managers
  • 24. New Solution
  • 25. the simplest thing that could possibly work
  • 26. most db writes come from scheduled jobs
  • 27. [Architecture diagram: Other Clients, Rails, ReST API, Erlang, MongoDB, Network Devices]
  • 28. Joe Armstrong
  • 29. AXD301 ATM Switch
  • 30. 99.9999999%
  • 31. Functional
  • 32. Dynamically Typed
  • 33. Single Assignment
  • 34. A = 1. %=> 1
        A = 2. %=> badmatch
  • 35. [B, 2, C] = [1, 2, 3].
        B = 1. %=> 1
        C = 3. %=> 3
  • 36. Immutable Data Structures
  • 37. D = dict:new().
        D1 = dict:store(foo, 1, D).
        D2 = dict:store(bar, 2, D1).
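
The dict calls on slide 37 each return a new dictionary and never modify the one passed in. A small sketch that makes this observable, using only the standard `dict` module. The module name `immutable_demo` is made up for this example.

```erlang
-module(immutable_demo).
-export([run/0]).

run() ->
    D  = dict:new(),
    D1 = dict:store(foo, 1, D),
    D2 = dict:store(bar, 2, D1),
    %% Each store/3 returned a new dict; D and D1 are unchanged,
    %% so bar is visible in D2 but still missing from D1.
    {dict:size(D), dict:size(D1), dict:size(D2),
     dict:find(bar, D1), dict:find(bar, D2)}.
```

Calling `immutable_demo:run()` returns `{0, 1, 2, error, {ok, 2}}`: the original empty dict still has size 0 after both stores.
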
  • 38. Concurrency Oriented
  • 39. -module(fact).
        -export([fac/1]).
        fac(0) -> 1;
        fac(N) -> N * fac(N-1).
  • 40. -module(quicksort).
        -export([quicksort/1]).
        quicksort([]) -> [];
        quicksort([Pivot|Rest]) ->
            quicksort([Front || Front <- Rest, Front < Pivot])
            ++ [Pivot] ++
            quicksort([Back || Back <- Rest, Back >= Pivot]).
  • 41. Details
  • 42. jobs framework
  • 43. [Diagram: Runner, Workers, Callback Module]
  • 44. [Sequence diagram: Runner, Worker, Callback Module. Messages: start, ready, item, process, process, ready, . . ., stop]
  • 45. “behaviour” is interface
  • 46. behaviour_info(callbacks) ->
            [{init, 1},
             {process_item, 3},
             {worker_died, 5},
             {job_stopping, 1},
             {job_complete, 2}].
  • 47. running({worker_ready, WorkerPid, ok}, S) ->
            case queue:out(S#state.items) of
                {empty, I2} ->
                    stop_worker(WorkerPid, S),
                    {next_state, complete, S#state{items = I2}};
                {{value, Item}, I2} ->
                    job_worker:process(WorkerPid, Item, now(), S#state.job_state),
                    {next_state, running, S#state{items = I2}}
            end;
  • 48. handle_info({'DOWN', _, process, WorkerPid, Info}, StateName, S) ->
            {Item, StartTime} = clear_worker(WorkerPid, S),
            Callback = S#state.callback,
            spawn(Callback, worker_died,
                  [Item, WorkerPid, StartTime, Info, S#state.job_state]),
            %% Start a replacement worker
            start_workers(1, Callback),
            {next_state, StateName, S};
  • 49. handle_cast({process, Item, StartTime, JS}, S) ->
            Callback = S#state.callback,
            Continue = try
                           Callback:process_item(Item, StartTime, JS)
                       catch
                           throw:Error ->
                               error_logger:error_report(Error),
                               ok
                       end,
            job_runner:worker_ready(S#state.runner, self(), Continue),
            {noreply, S}.
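
To make the behaviour on slide 46 concrete, here is a sketch of what a callback module for the jobs framework might look like. Only the function names and arities come from the slides. The argument order of `process_item/3` and `worker_died/5` follows the calls on slides 48 and 49, and `ok` as the "continue" result follows the `worker_ready` clause on slide 47. The module name `echo_job`, the arguments of `init/1`, `job_stopping/1` and `job_complete/2`, and every other return value are assumptions.

```erlang
-module(echo_job).
%% Arities match behaviour_info(callbacks) on slide 46. No -behaviour
%% attribute is declared because the slides do not name the module
%% that defines the behaviour.
-export([init/1, process_item/3, worker_died/5,
         job_stopping/1, job_complete/2]).

%% Assumed: init receives job arguments and returns the job state.
init(Args) ->
    {ok, Args}.

%% Slide 47 matches on ok in worker_ready, so ok means "keep going".
process_item(Item, _StartTime, _JobState) ->
    io:format("backing up ~p~n", [Item]),
    ok.

%% Argument order as in the spawn/3 call on slide 48.
worker_died(Item, WorkerPid, _StartTime, Info, _JobState) ->
    error_logger:error_report([{item, Item},
                               {worker, WorkerPid},
                               {reason, Info}]),
    ok.

%% Assumed signatures and return values for the remaining callbacks.
job_stopping(_JobState) ->
    ok.

job_complete(_Status, _JobState) ->
    ok.
```

The point of the design is that the runner and workers on slides 47 to 49 stay generic: each kind of job only supplies these five functions.
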
  • 50. story time
  • 51. ReSTful API with Webmachine: The Convention Over Configuration Webserver
  • 52. http://webmachine.basho.com
        HTTP Request Lifecycle Diagram
  • 53. If you know HTTP, Webmachine Is Simple
        As Proven by the “Number of Types of Things” Measurement of Complexity
  • 54. The 3 Most Important Types of Things In Webmachine
        1. Dispatch Rules (pure data--barely a thing!)
        2. Resources (composed of simple functions!)
        3. Requests (simple get/set interface!)
  • 55. Dispatch Rules
        { ["devices", server], device_resource, [] }
        GET /devices/12345
        Webmachine inspects the device_resource module for defined callbacks, and sets the Request record’s “server” value to 12345.
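
The dispatch rule on slide 55 is plain data: string segments must match the path literally, while an atom such as `server` captures whatever appears in that position. The toy matcher below illustrates only that idea. It is not Webmachine's implementation, it ignores wildcards and everything else a real dispatcher handles, and `toy_dispatch` is a made-up module name. In Webmachine itself, the bound value is read from the request with `wrq:path_info(server, Request)`.

```erlang
-module(toy_dispatch).
-export([match/2]).

%% match(PathSpec, Tokens): string segments must be equal, atom
%% segments bind to whatever token is in that position.
match(Spec, Tokens) ->
    match(Spec, Tokens, []).

match([], [], Bindings) ->
    {ok, lists:reverse(Bindings)};
match([Name | Spec], [Token | Tokens], Bindings) when is_atom(Name) ->
    match(Spec, Tokens, [{Name, Token} | Bindings]);
match([Token | Spec], [Token | Tokens], Bindings) ->
    %% Same variable twice in the head: the literal must equal the token.
    match(Spec, Tokens, Bindings);
match(_, _, _) ->
    nomatch.
```

For example, `toy_dispatch:match(["devices", server], string:tokens("/devices/12345", "/"))` returns `{ok, [{server, "12345"}]}`, while a path such as `/users/12345` returns `nomatch`.
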
  • 56. Resources
        • POEM (Plain Old Erlang Module)
        • Composed of referentially transparent functions*
        • Functions are callbacks into the request lifecycle
        • Approximately 30 possible callback functions, e.g.:
            • resource_exists → 404 Not Found
            • is_authorized → 401 Unauthorized
        * mostly
  • 57. Resource Functions
        Perma-404
        resource_exists(Request, Context) ->
            {false, Request, Context}.
        Lucky Auth
        is_authorized(Request, Context) ->
            S = calendar:time_to_seconds(now()),
            case S rem 2 of
                0 -> {true, Request, Context};
                1 -> {"Basic realm=lucky", Request, Context}
            end.
  • 58. Requests
        • The first argument to each resource function
        • Set and read request & response data
        RemoteIP = wrq:peer(Request).
        wrq:set_resp_header("X-Answer", "42", Request).
  • 59. Retrieving a JSON Firewall Representation
        content_types_provided(Request, Context) ->
            Types = [{"application/json", to_json}],
            {Types, Request, Context}.
        to_json(Request, Context) ->
            Device = proplists:get_value(device, Context),
            UserId = get_user_id(Request),
            case fe_api_firewall:get_config(Device, UserId) of
                {ok, Config} ->
                    success_response(Config, Request, Context);
                {error, Reason} ->
                    error_response(502, Reason, Request, Context)
            end.
  • 60. Gotchas
  • 61. primitive obsession
  • 62. string-ish
        "hi how are you"
        <<"hello there">>
        [<<"easy as ">>, [$a, $b, $c], " ☺\n"].
  • 63. hashes vs records
  • 64. to loop is human, to recur divine
  • 65. Erlang conditionals always return a value
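
Three of the gotchas above (slides 62, 64 and 65) are easy to show in a few lines. The sketch below uses only built-in functions. The module and function names are hypothetical.

```erlang
-module(gotchas).
-export([stringish/0, sum/1, describe/1]).

%% A double-quoted string is just a list of integers; a binary is a
%% distinct type; an iolist mixes both and is flattened on output.
stringish() ->
    true = ("abc" =:= [$a, $b, $c]),
    true = is_binary(<<"hello there">>),
    iolist_to_binary([<<"easy as ">>, [$a, $b, $c], "\n"]).

%% There is no loop construct: iterate with a tail-recursive helper
%% that carries the running total in an accumulator.
sum(List) ->
    sum(List, 0).

sum([], Acc) -> Acc;
sum([H | T], Acc) -> sum(T, Acc + H).

%% case is an expression, so its value can be bound directly.
describe(N) ->
    Label = case N rem 2 of
                0 -> even;
                _ -> odd
            end,
    {N, Label}.
```

`gotchas:stringish()` returns the binary `<<"easy as abc\n">>`, `gotchas:sum([1, 2, 3])` returns `6`, and `gotchas:describe(3)` returns `{3, odd}`.
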
  • 66. design for testability
  • 67. don’t spawn, use OTP
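
One reading of the "use OTP" advice: instead of a hand-rolled `spawn` plus `receive` loop, put state behind a standard behaviour such as `gen_server`, which brings a uniform call/cast protocol and works with supervisors. A minimal sketch, assuming a trivial counter as the state. `counter_server` is a made-up example and not code from the talk.

```erlang
-module(counter_server).
-behaviour(gen_server).

%% Client API
-export([start_link/0, increment/1, value/1]).
%% gen_server callbacks
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, 0, []).

%% Asynchronous: returns ok immediately.
increment(Pid) -> gen_server:cast(Pid, increment).

%% Synchronous: waits for the server's reply.
value(Pid) -> gen_server:call(Pid, value).

init(Count) -> {ok, Count}.

handle_call(value, _From, Count) -> {reply, Count, Count}.

handle_cast(increment, Count) -> {noreply, Count + 1}.

handle_info(_Msg, Count) -> {noreply, Count}.

terminate(_Reason, _Count) -> ok.

code_change(_OldVsn, Count, _Extra) -> {ok, Count}.
```

After `{ok, Pid} = counter_server:start_link()` and two calls to `counter_server:increment(Pid)`, `counter_server:value(Pid)` returns `2`, since messages from one process to another are delivered in order.
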
  • 68. Downsides
  • 69. Erlang changes very slowly
  • 70. 3rd party libraries
  • 71. standard library can be inconsistent
  • 72. package management
  • 73. Questions
  • 74. http://spkr8.com/t/7806
        Phil: @philtoland http://github.com/toland http://philtoland.com
        Mike: @lifeinzembla http://github.com/msassak
