ruote tmp/log_2012-04-10.html

2012-04-10 17:26:01 utc myron I'm interested in trying out ruote for the first time. I noticed the newest release is over a year old (2.2 from 2/28/2011) but github has lots of commits since then
2012-04-10 17:26:16 utc myron should I stick with the release from last february? or try the latest on github?
2012-04-10 18:46:01 utc myron anyone here?
2012-04-10 18:55:54 utc Mugatu myron: I've been using the latest from github
2012-04-10 18:56:00 utc Mugatu Has worked fine for me so far
2012-04-10 18:56:04 utc myron good to hear
2012-04-10 18:56:14 utc myron has it been tested on 1.9.3?
2012-04-10 18:56:16 utc Mugatu It seems that much of the documentation is already switching to use the newer conventions
2012-04-10 18:56:26 utc myron newer conventions?
2012-04-10 18:56:48 utc Mugatu There appears to be some API transition occurring between the last release and what is in github
2012-04-10 18:56:56 utc Mugatu the older APIs still work fine though
2012-04-10 18:57:01 utc Mugatu participant methods for instance are changing
2012-04-10 18:57:15 utc Mugatu It's referenced in the participant documentation
2012-04-10 18:57:35 utc myron ok
2012-04-10 18:57:36 utc Mugatu As far as 1.9.3 goes, the best I can tell you is that I've been using it on 1.9.3, no issues
2012-04-10 18:57:42 utc myron I'm a total noob here but I'll take a look
2012-04-10 18:57:46 utc Mugatu (I'm just a casual user, not expert by any means)
2012-04-10 18:59:05 utc myron I'm trying to figure out if ruote will be a good fit for my project
2012-04-10 19:06:58 utc myron @Mugatu -- do you have some recommended resources to help me get started with ruote?
2012-04-10 19:07:14 utc myron It's a lot to take in initially and I'm not sure of the best way to start
2012-04-10 22:09:30 utc myron jmettraux -- you around?
2012-04-10 22:09:48 utc jmettraux myron: hello Myron, welcome to #ruote
2012-04-10 22:09:53 utc myron thanks
2012-04-10 22:10:16 utc jmettraux you're the author of VCR among other gems?
2012-04-10 22:10:19 utc myron yep
2012-04-10 22:10:29 utc jmettraux excellent, thanks for that
2012-04-10 22:10:42 utc myron cool, I'm always glad to hear others find it useful :)
2012-04-10 22:11:01 utc myron anyhow, I'm trying to evaluate ruote for a project I'm working on
2012-04-10 22:11:05 utc myron got a second to answer a few questions?
2012-04-10 22:11:19 utc jmettraux Mugatu: thanks for helping Myron
2012-04-10 22:11:24 utc jmettraux yes
2012-04-10 22:11:29 utc myron cool
2012-04-10 22:11:38 utc myron let me say a bit about what we're trying to build....
2012-04-10 22:12:24 utc myron we collect lots of data for our users on a weekly schedule. the data is collected by backend services that use some large scalable datastore underneath (e.g. riak or cassandra)
2012-04-10 22:12:51 utc myron now we're trying to build a middle-tier aggregating service that builds a weekly index on combined views of the data so we can serve it to our users in interesting ways
2012-04-10 22:14:03 utc myron we're using MySQL for the middle tier and sharding the data on a per-user basis (since it's updated weekly and is just a cache of the canonical data), and I'm working on building a processing pipeline that will build a new shard anytime a backend has new data for the user
2012-04-10 22:14:05 utc myron does that make sense?
2012-04-10 22:14:16 utc jmettraux yes
2012-04-10 22:14:55 utc myron so...I want to be able to define a bunch of independent steps, each of which has zero or more dependencies on backends or other previous steps
2012-04-10 22:15:31 utc myron e.g. some steps may use data written to the DB in a previous step, and combine it with data from another previous step
2012-04-10 22:15:57 utc myron I was going to start working on a little gem to help us define these steps when I found ruote, and it seems to support most of the sorts of things I was thinking of building
2012-04-10 22:16:03 utc myron does this sound like a good use-case for ruote?
2012-04-10 22:18:40 utc jmettraux the "dependency on previous step" thing makes me think you need something more like a rule system
2012-04-10 22:19:00 utc myron right, it was one thing I wasn't quite sure how to achieve with ruote
2012-04-10 22:19:09 utc jmettraux but the orchestration of the steps part sure is ruotesque
2012-04-10 22:20:21 utc jmettraux I have the impression I have seen libraries on github that address some of the aspects of your use case, but I cannot remember their names
2012-04-10 22:20:49 utc myron are they libraries that hook into ruote? Or standalone libs?
2012-04-10 22:21:52 utc jmettraux more like frameworks
2012-04-10 22:22:01 utc jmettraux nothing for ruote
2012-04-10 22:22:06 utc myron gotcha
2012-04-10 22:22:13 utc jmettraux (I'd have remembered)
2012-04-10 22:22:54 utc myron so let's say I have these processing steps...
2012-04-10 22:22:58 utc myron 1) fetch_social_data
2012-04-10 22:23:01 utc myron 2) fetch_traffic_data
2012-04-10 22:23:08 utc myron 3) aggregate_traffic_and_social
2012-04-10 22:23:29 utc myron fetch_social_data can be run as soon as the social-data backend has new data
2012-04-10 22:23:41 utc myron fetch_traffic_data can be run as soon as the traffic data backend has new data
2012-04-10 22:24:06 utc myron aggregate_traffic_and_social should be run as soon as #1 and #2 are both done.
2012-04-10 22:24:15 utc myron but the backends may update at different times of day, hours apart
2012-04-10 22:24:23 utc myron is this doable with ruote?
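
The three steps above form a small dependency graph. As a rough sketch of that idea (plain Ruby with the standard library's TSort, not ruote), the steps can be declared with their prerequisites and queried for what is runnable at any moment. The `StepGraph` class, its `runnable` helper, and the `STEPS` constant are hypothetical names made up for illustration:

```ruby
require 'tsort'

# Hypothetical step graph: each step maps to the list of steps it depends on.
class StepGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  # TSort hook: yield every step in the graph.
  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  # TSort hook: yield each dependency of the given step.
  def tsort_each_child(step, &block)
    @deps.fetch(step).each(&block)
  end

  # Steps that are not done yet and whose dependencies are all done.
  def runnable(done)
    @deps.select { |step, needs| !done.include?(step) && (needs - done).empty? }.keys
  end
end

# Step names taken from the conversation; the graph itself is an assumption.
STEPS = StepGraph.new(
  :fetch_social_data => [],
  :fetch_traffic_data => [],
  :aggregate_traffic_and_social => [:fetch_social_data, :fetch_traffic_data]
)

# A valid execution order: dependencies always come before their dependents.
puts STEPS.tsort.inspect

# Only the traffic fetch remains runnable once the social fetch is done.
puts STEPS.runnable([:fetch_social_data]).inspect
```

Because `runnable` is recomputed from whatever has finished so far, it does not matter that the two backends update hours apart: the aggregate step only becomes runnable once both fetches are recorded as done.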
2012-04-10 22:25:06 utc jmettraux yes, but I'm not sure it'd behave as you'd wish it to
2012-04-10 22:25:29 utc jmettraux Ruote.define { concurrence { fetch_social; fetch_traffic }; aggregate }
2012-04-10 22:25:29 utc myron how so?
2012-04-10 22:25:56 utc myron right, I've been playing with the concurrence stuff a bit
2012-04-10 22:26:02 utc jmettraux this process would fetch social and traffic in parallel, when both are done, it would aggregate
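
The "both in parallel, then aggregate" behaviour described here is a fork-join. A minimal sketch of those semantics using plain Ruby threads (this is not ruote code, and it has none of ruote's persistence) might look like the following. The method bodies and return values are hypothetical stand-ins for the real steps:

```ruby
# Hypothetical stand-in for the social-data fetch step.
def fetch_social
  { :social => 3 }
end

# Hypothetical stand-in for the traffic-data fetch step.
def fetch_traffic
  { :traffic => 5 }
end

# Runs only after both fetches have returned.
def aggregate(social, traffic)
  social.merge(traffic)
end

def run_once
  # "concurrence": start both branches in parallel ...
  branches = [Thread.new { fetch_social }, Thread.new { fetch_traffic }]
  # ... and wait for both. Thread#value joins the thread and returns its result.
  social, traffic = branches.map(&:value)
  # The sequence then continues with the next step.
  aggregate(social, traffic)
end

puts run_once.inspect
```

Note that the join blocks until the slower branch finishes, which is the behaviour to keep in mind when the two backends deliver data hours apart.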
2012-04-10 22:26:19 utc jmettraux but maybe you want the fetch_social and fetch_traffic to run all the time
2012-04-10 22:26:33 utc jmettraux and they'd emit to a queue of work for aggregate
2012-04-10 22:26:55 utc jmettraux and then aggregate would decide on its own if it has enough data for an aggregation
2012-04-10 22:27:25 utc jmettraux not sure if it's your use case, but it decouples well like this
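
The decoupled alternative (fetchers emit to a queue, the aggregator decides on its own when it has enough) can be sketched with Ruby's in-process `Queue`. This is only an illustration under assumptions: the `Aggregator` class, the `[user, kind, data]` message shape, the `:stop` sentinel, and the "sum the values" aggregation are all hypothetical, and a real setup would presumably use a durable queue rather than an in-memory one:

```ruby
require 'thread' # Queue (needed on older Rubies, harmless on newer ones)

# Hypothetical aggregator: collects results per user and aggregates
# a user as soon as every required kind of data has arrived.
class Aggregator
  REQUIRED = [:social, :traffic]

  attr_reader :aggregated

  def initialize(queue)
    @queue = queue
    @pending = Hash.new { |hash, user| hash[user] = {} }
    @aggregated = {}
  end

  # Consume messages until the :stop sentinel arrives.
  def run
    while (msg = @queue.pop) != :stop
      user, kind, data = msg
      @pending[user][kind] = data
      aggregate(user) if REQUIRED.all? { |k| @pending[user].key?(k) }
    end
  end

  private

  # Placeholder aggregation: sum the collected values for the user.
  def aggregate(user)
    @aggregated[user] = @pending.delete(user).values.inject(:+)
  end
end

def demo
  queue = Queue.new
  aggregator = Aggregator.new(queue)
  consumer = Thread.new { aggregator.run }

  # Fetchers would push these whenever a backend has new data.
  queue << ["alice", :social, 3]
  queue << ["bob", :traffic, 10]
  queue << ["alice", :traffic, 5]
  queue << :stop

  consumer.join
  aggregator.aggregated
end

puts demo.inspect
```

In this run only "alice" is aggregated; "bob" stays pending until his social data shows up, so a slow or broken backend delays one user without blocking the others.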
2012-04-10 22:27:39 utc jmettraux ruote is more like do this, this and then that
2012-04-10 22:27:45 utc myron hmmm...I'll have to play with it, I think
2012-04-10 22:28:02 utc myron so maybe not a good fit if my processing pipeline is primarily a dependency graph?
2012-04-10 22:28:34 utc jmettraux if it's a "pipeline" then maybe setting up queues and consumers would serve you better
2012-04-10 22:28:58 utc jmettraux if it's more punctual, like "fetch the data for today, then aggregate", ruote is OK
2012-04-10 22:29:53 utc myron it kinda wants to be the latter...but it also needs to be tolerant of problems with one of the backends
2012-04-10 22:29:56 utc myron tricky to find the right balance :(
2012-04-10 22:30:06 utc jmettraux +1
2012-04-10 22:30:25 utc myron thanks for the advice, though, it's helpful
2012-04-10 22:30:35 utc jmettraux you're welcome
2012-04-10 22:31:08 utc myron does ruote provide any guarantees of not dropping messages or workitems on the floor?
2012-04-10 22:31:37 utc jmettraux it tries hard not to, but in the end, it depends on the storage implementation you're using
2012-04-10 22:31:56 utc myron right, we're likely to use redis and that's of course primarily an in-memory datastore
2012-04-10 22:33:04 utc myron if a ruote worker is killed (or our datacenter has a power-outage...yes it's happened...), will it pick up exactly where it left off when we restart as long as the storage implementation didn't drop anything?
2012-04-10 22:33:08 utc jmettraux I've seen it used by people on the mailing list. we had an issue with dropped messages, but it was my fault; it seems to work like a charm
2012-04-10 22:33:18 utc myron good to know
2012-04-10 22:33:19 utc jmettraux it tries hard to
2012-04-10 22:33:26 utc myron right, these are hard problems
2012-04-10 22:33:36 utc jmettraux let me find a message about that
2012-04-10 22:33:56 utc jmettraux
2012-04-10 22:34:25 utc jmettraux processes can get "stalled" in those cases
2012-04-10 22:34:41 utc jmettraux the email thread explains one way of recovering
2012-04-10 22:34:46 utc myron right, reading now :)
2012-04-10 22:36:19 utc myron the last gem release was over a year ago. is the recommendation now just to use what's on github?
2012-04-10 22:38:35 utc jmettraux please use what's on github, I have to release soon, but day job is interfering (though I use ruote there)
2012-04-10 22:38:42 utc myron cool, will do