ruote tmp/log_2012-11-08.html

2012-11-08 02:53:41 utc

mburnett

i should be able to use process variables in situations like: iterator :times => '${v:blah}' do ... end, right?

2012-11-08 03:06:48 utc

jmettraux

mburnett: hello

2012-11-08 03:07:08 utc

jmettraux

iirc, yes

2012-11-08 03:07:24 utc

mburnett

hi jmettraux :)

2012-11-08 03:08:46 utc

jmettraux

if it doesn't work, please file an issue report: https://github.com/jmettraux/ruote/issues

2012-11-08 03:11:31 utc

mburnett

ok, i'm trying to make sure that i'm just not crazy...trying with echo etc

2012-11-08 04:02:02 utc

jmettraux

this works for me: https://gist.github.com/4036670

2012-11-08 04:21:24 utc

mburnett

ah, i have been trying to pass them into launch:

2012-11-08 04:21:42 utc

mburnett

dashboard.launch(pdef, {}, {:somevar => 6})

2012-11-08 04:22:10 utc

mburnett

i think i must be doing that wrong

2012-11-08 04:35:36 utc

jmettraux

this works: https://gist.github.com/4036784

2012-11-08 04:35:52 utc

jmettraux

but { :blah => 3 } doesn't work

2012-11-08 04:37:16 utc

mburnett

aha

2012-11-08 04:37:18 utc

mburnett

thank you :)

2012-11-08 04:38:31 utc

jmettraux

you're welcome, https://github.com/jmettraux/ruote/issues/67

2012-11-08 04:39:37 utc

mburnett

it looks like i need to read up on the difference between 'blah' => 3 and :blah => 3
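
A minimal plain-Ruby sketch of the distinction mentioned above, for readers in the same position. It does not use ruote at all; `stringify_keys` is a hypothetical helper introduced here for illustration, on the assumption that the receiving code looks values up by string key.

```ruby
# 'blah' is a String and :blah is a Symbol; as Hash keys they are distinct.
by_string = { 'blah' => 3 }
by_symbol = { :blah => 3 }

string_lookup = by_string['blah']  # finds the value stored under the String key
symbol_miss   = by_symbol['blah']  # nil: the hash only has the Symbol key :blah

# Hypothetical helper (not part of ruote): convert every key to a String
# before handing the hash to code that looks keys up by String.
def stringify_keys(hash)
  hash.each_with_object({}) { |(key, value), out| out[key.to_s] = value }
end

normalized = stringify_keys(by_symbol)
puts normalized.inspect
```

Normalizing keys at the call site is one way to sidestep the mismatch; writing `'blah' => 3` directly, as in the gist above, is the other.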

2012-11-08 04:39:49 utc

mburnett

i am not a rubyist really

2012-11-08 04:40:18 utc

jmettraux

really? Your code looks neat

2012-11-08 04:40:40 utc

jmettraux

are you using Python?

2012-11-08 04:40:50 utc

mburnett

well, i like python a lot

2012-11-08 04:40:56 utc

mburnett

but work is mostly perl....

2012-11-08 04:40:58 utc

jmettraux

:-)

2012-11-08 04:41:02 utc

mburnett

and thanks :D

2012-11-08 04:41:03 utc

jmettraux

very cool

2012-11-08 04:41:24 utc

jmettraux

your gist from the other day was a pleasure to read

2012-11-08 04:41:32 utc

mburnett

really? thanks :)

2012-11-08 04:42:13 utc

mburnett

we're going to release whatever we come up with (i think ruote is by far the best contender right now)

2012-11-08 04:42:34 utc

jmettraux

oh cool

2012-11-08 04:42:51 utc

mburnett

i'm also going to be doing some benchmarking to see how the complexity of serial vs concurrent workflows grows with number of operations

2012-11-08 04:43:00 utc

mburnett

for noops

2012-11-08 04:43:22 utc

mburnett

we would like to be able to do (eventually) 10^6 concurrent things

2012-11-08 04:43:36 utc

jmettraux

understood, though ruote was really not meant for such wide flows

2012-11-08 04:43:37 utc

mburnett

our current requirements are basically 5-10k concurrent things

2012-11-08 04:43:40 utc

mburnett

sure

2012-11-08 04:44:20 utc

jmettraux

I was a bit surprised when you mentioned the number the other day, I was thinking that such width was for the grid system ultimately

2012-11-08 04:44:21 utc

mburnett

but our current system is really incapable even at 3k

2012-11-08 04:44:26 utc

jmettraux

ouch

2012-11-08 04:44:54 utc

mburnett

also, the current system doesn't let us run say 10-100 engines to help that performance (also something we want to measure)

2012-11-08 04:45:05 utc

jmettraux

woah

2012-11-08 04:45:21 utc

mburnett

well, we probably won't run 100

2012-11-08 04:45:22 utc

mburnett

lol

2012-11-08 04:45:26 utc

jmettraux

you build it, if I remember correclty?

2012-11-08 04:46:01 utc

jmettraux

correctly

2012-11-08 04:46:45 utc

mburnett

well, i'm working with kindjal to evaluate whether ruote is a good fit to replace our current system

2012-11-08 04:46:51 utc

mburnett

other teams are evaluating other systems

2012-11-08 04:46:58 utc

mburnett

our team is about 20 people

2012-11-08 04:47:11 utc

jmettraux

ah really, you guys work together?

2012-11-08 04:47:24 utc

mburnett

yeah, but we didn't know it the first day in irc

2012-11-08 04:47:34 utc

mburnett

he works in systems, and i'm a programmer

2012-11-08 04:47:36 utc

jmettraux

your org must be big

2012-11-08 04:47:39 utc

jmettraux

aaah

2012-11-08 04:48:19 utc

mburnett

well, we do genome sequencing

2012-11-08 04:48:24 utc

mburnett

it's computationally fairly intense

2012-11-08 04:50:36 utc

jmettraux

very cool

2012-11-08 04:52:02 utc

mburnett

our immediate use case for ~5k concurrence is to iterate over a list of ids that locate data to operate on

2012-11-08 04:52:21 utc

mburnett

so it's not a show stopper if the workflow calculations themselves are not super fast

2012-11-08 04:52:28 utc

mburnett

just fyi

2012-11-08 04:52:35 utc

jmettraux

ok

2012-11-08 04:52:57 utc

mburnett

another programmer and i were trying to understand theoretically why these systems seem to give n^2

2012-11-08 04:53:18 utc

mburnett

intuitively, it seems like it should be O(num ops + num connections/dependencies)

2012-11-08 04:53:28 utc

mburnett

we did not dig into the ruote code though

2012-11-08 04:54:09 utc

mburnett

our only hypothesis was that all the operations were being read/written to storage for every operation, but that seems unlikely

2012-11-08 04:54:20 utc

jmettraux

let me write a quick piece of code

2012-11-08 04:54:27 utc

jmettraux

yes, it's roughly what happens

2012-11-08 04:54:38 utc

mburnett

oh, well that explains it
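
A toy cost model of the hypothesis just confirmed, offered as a sketch rather than as a description of ruote internals. It assumes that each of the n operations either touches only its own record, or re-reads and re-writes a record that holds all n entries; both method names are hypothetical.

```ruby
# Toy model, not ruote code: count storage "touches" for n operations.

# Each operation touches only its own record: n touches in total.
def touches_independent(n)
  (1..n).sum { 1 }
end

# Each operation touches a shared record holding all n entries:
# n touches per operation, n * n in total.
def touches_shared_state(n)
  (1..n).sum { n }
end

[1_000, 2_000, 4_000].each do |n|
  puts "n=#{n} independent=#{touches_independent(n)} shared=#{touches_shared_state(n)}"
end
```

Under these assumptions, doubling n doubles the first count and quadruples the second, which is the kind of quadratic growth described above.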

2012-11-08 04:55:18 utc

mburnett

i have some numbers for sequential iterator that i generated tonight with hash storage and a single worker if you want them

2012-11-08 04:55:45 utc

mburnett

the concurrent iterator numbers are running now

2012-11-08 04:55:48 utc

jmettraux

ok

2012-11-08 04:56:02 utc

mburnett

should i paste them here? it's 1k-10k every 1k

2012-11-08 04:56:15 utc

mburnett

i can gist the code too

2012-11-08 04:56:39 utc

mburnett

maybe i'll put them as comments in the gist

2012-11-08 04:56:43 utc

jmettraux

gists are most welcome

2012-11-08 05:00:03 utc

mburnett

https://gist.github.com/4036852

2012-11-08 05:00:28 utc

mburnett

we actually did this for much smaller numbers a week or so ago

2012-11-08 05:00:39 utc

mburnett

sequential looked linear for small numbers < 1k

2012-11-08 05:00:47 utc

mburnett

but concurrent looked quadratic even then

2012-11-08 05:02:11 utc

mburnett

forgive the sloppy snippet, i had given up on using a process variable at that point

2012-11-08 05:02:35 utc

jmettraux

no worries

2012-11-08 05:04:25 utc

jmettraux

running a small bench that counts ops

2012-11-08 05:04:35 utc

jmettraux

well, that counts msgs processed

2012-11-08 05:05:25 utc

mburnett

cool

2012-11-08 05:05:56 utc

jmettraux

fighting the wait_for timeout

2012-11-08 05:08:21 utc

jmettraux

wow, c-iterator time is awful

2012-11-08 05:09:11 utc

jmettraux

1000 iterations is < 1 min, but 1000 citerations are ~3 min

2012-11-08 05:09:44 utc

jmettraux

https://gist.github.com/4036883

2012-11-08 05:10:11 utc

jmettraux

and c-iterations require half the msgs count

2012-11-08 05:10:36 utc

jmettraux

I wonder if there could be something to optimize in there

2012-11-08 05:10:49 utc

jmettraux

compiling an issue...

2012-11-08 05:10:53 utc

mburnett

interesting

2012-11-08 05:12:29 utc

mburnett

let me know if you find an improvement, i present our proof-of-concept monday the 19th

2012-11-08 05:12:44 utc

jmettraux

ok

2012-11-08 05:12:50 utc

mburnett

so even being able to say that you've identified an improvement but haven't fixed it yet would be good

2012-11-08 05:13:11 utc

jmettraux

ok

2012-11-08 05:13:34 utc

mburnett

anyway, i'm off to bed

2012-11-08 05:13:38 utc

mburnett

thanks for your help again :)

2012-11-08 05:13:49 utc

jmettraux

hey, you're welcome

2012-11-08 05:14:07 utc

jmettraux

https://github.com/jmettraux/ruote/issues/68

2012-11-08 05:16:44 utc

jmettraux

I have to go now, have a good night

2012-11-08 20:00:28 utc

mburnett

is there a way to have multiple dashboards waiting for the same event? i'm trying to see what happens to wallclock time as number of engines increases, and i'd like to have all the engine processes exit when the single test process is complete

2012-11-08 20:24:15 utc

mburnett

nevermind

2012-11-08 20:24:25 utc

mburnett

i'm just using process observers with exit

2012-11-08 21:37:13 utc

jmettraux

mburnett: hello, when you say "the number of engines increases", do you mean "the number of workers increases"?

2012-11-08 21:38:01 utc

mburnett

hi jmettraux :)

2012-11-08 21:38:04 utc

mburnett

yeah, sorry

2012-11-08 21:38:15 utc

mburnett

i've actually just run a few quick tests

2012-11-08 21:38:20 utc

jmettraux

ah, that puts yesterday's conversation in a different light

2012-11-08 21:38:32 utc

mburnett

i didn't realize there was some locking going on

2012-11-08 21:38:57 utc

jmettraux

I was really thinking you wanted to use multiple engines (ie multiple sets of workers)

2012-11-08 21:39:01 utc

jmettraux

ok

2012-11-08 21:39:04 utc

mburnett

oohh

2012-11-08 21:39:08 utc

mburnett

sorry, my confusion

2012-11-08 21:39:20 utc

jmettraux

it's an interesting axis anyway

2012-11-08 21:40:09 utc

mburnett

may i post 3 lines of timings vs # workers?

2012-11-08 21:40:19 utc

mburnett

for citerator :times => 1000

2012-11-08 21:40:25 utc

jmettraux

a main engine with a serious storage, and sub engines with transient storages

2012-11-08 21:40:59 utc

jmettraux

ok, please post here, I'll append the link to this log to the issue

2012-11-08 21:41:08 utc

mburnett

ok

2012-11-08 21:42:02 utc

mburnett

oops, it's 4 lines, with format (# operations, # workers, wallclock seconds, max resident kb of worker processes)

2012-11-08 21:42:20 utc

jmettraux

gist maybe

2012-11-08 21:42:31 utc

mburnett

ok

2012-11-08 21:45:23 utc

mburnett

https://gist.github.com/4041818

2012-11-08 21:46:07 utc

mburnett

the way i killed all the processes is obviously very hacky

2012-11-08 21:46:58 utc

jmettraux

thanks a lot!

2012-11-08 21:47:14 utc

mburnett

my pleasure

2012-11-08 21:47:48 utc

jmettraux

looking forward to finding some time to work on this citerator

2012-11-08 21:48:03 utc

mburnett

cool :)

2012-11-08 22:00:54 utc

mburnett

it's starting to look like if i have one of these slow processes that it blocks other processes from proceeding quickly even with multiple workers, is that right?

2012-11-08 22:01:48 utc

jmettraux

there is no priority mechanism

2012-11-08 22:02:04 utc

jmettraux

first msg wins

2012-11-08 22:02:20 utc

mburnett

and only one worker can handle a message at a time?

2012-11-08 22:03:05 utc

mburnett

or they just populate the queue at once

2012-11-08 22:03:12 utc

jmettraux

workers pull batches of messages, then iterate on them, for each msg they try to reserve it, if the reserve is successful, the worker processes the msg

2012-11-08 22:03:32 utc

mburnett

ah, so the workers may be pulling the same batches of messages?

2012-11-08 22:03:39 utc

mburnett

hence lots of conflicts/failed reserves?

2012-11-08 22:03:47 utc

jmettraux

yes

2012-11-08 22:03:50 utc

mburnett

interesting

2012-11-08 22:04:32 utc

jmettraux

the current routine tries for a while, after a certain number of "already reserved", it discards the head of the batch and tries with the tail
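
The pull-then-reserve behaviour described above can be sketched with a toy model. This is not ruote's implementation: `ToyStorage`, `peek_batch` and `reserve` are hypothetical names, the two "workers" run one after the other instead of concurrently, and the retry/discard-head logic is left out.

```ruby
# Toy model, not ruote code: a storage where reserving a message
# succeeds only for the first worker that asks for it.
class ToyStorage
  def initialize(msg_ids)
    @pending  = msg_ids.dup
    @reserved = {}
  end

  # Every worker sees the same head of the queue, so batches overlap.
  def peek_batch(size)
    @pending.first(size)
  end

  # Returns true if this call won the message, false if already reserved.
  def reserve(msg_id)
    return false if @reserved[msg_id]

    @reserved[msg_id] = true
    @pending.delete(msg_id)
    true
  end
end

storage = ToyStorage.new((1..6).to_a)
batch_a = storage.peek_batch(4) # worker A pulls the first four messages
batch_b = storage.peek_batch(4) # worker B pulls the same four messages

won_by_a = batch_a.select { |id| storage.reserve(id) }
won_by_b = batch_b.select { |id| storage.reserve(id) }
failed_b = batch_b.size - won_by_b.size

puts "A processed #{won_by_a.size}, B processed #{won_by_b.size}, B failed #{failed_b} reserves"
```

In this sketch worker B spends its whole batch on failed reserves, which is the overlap cost discussed above in its most extreme form.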

2012-11-08 22:05:13 utc

jmettraux

that's the vanilla thing, now if you use the Redis storage, it's different

2012-11-08 22:05:23 utc

mburnett

oh, it is using redis

2012-11-08 22:05:34 utc

mburnett

i was going to suggest that i could write a hybrid amq storage for messages

2012-11-08 22:05:45 utc

mburnett

how is redis different?

2012-11-08 22:06:10 utc

jmettraux

each worker pops a batch

2012-11-08 22:06:29 utc

jmettraux

there is no overlap among the batches for two different workers

2012-11-08 22:06:33 utc

mburnett

ah

2012-11-08 22:06:35 utc

jmettraux

reserve is always successful

2012-11-08 22:06:46 utc

mburnett

right

2012-11-08 22:06:48 utc

jmettraux

may not be the best solution though

2012-11-08 22:07:03 utc

mburnett

sounds like amq wouldn't buy us anything on that front

2012-11-08 22:07:39 utc

mburnett

if the worker dies, does the popped batch disappear?

2012-11-08 22:07:42 utc

jmettraux

it could quite possibly slow down things

2012-11-08 22:08:01 utc

jmettraux

yes, the popped batch is gone, it can result in stalled workflows
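
A toy model of the pop-a-batch approach and its failure mode, as a sketch only. It is not the ruote-redis code and does not talk to Redis: a plain Ruby array stands in for the shared queue, and the "dying" worker is simulated by dropping its batch.

```ruby
# Toy model, not the ruote-redis code: popping removes messages from the
# shared queue, so batches never overlap, but a popped batch exists only
# in the memory of the worker that holds it.
all_ids = (1..6).to_a
queue   = all_ids.dup

batch_a = queue.shift(3) # worker A now holds the first three messages
batch_b = queue.shift(3) # worker B now holds the next three

overlap = batch_a & batch_b # empty: no reserve conflict is possible

processed = []
batch_b.each { |id| processed << id } # worker B finishes its batch
batch_a = nil                         # worker A dies before processing anything

# Messages that are neither processed nor still in the queue are lost.
lost = all_ids - processed - queue

puts "overlap=#{overlap.size} processed=#{processed.size} lost=#{lost.size}"
```

The trade-off in this sketch matches the one described above: no failed reserves, but nothing left in the queue to recover the dead worker's batch from.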

2012-11-08 22:08:07 utc

mburnett

ok, that's good to know

2012-11-08 22:08:53 utc

jmettraux

the implementation is minimal: https://github.com/jmettraux/ruote-redis/blob/master/lib/ruote/redis/storage.rb

2012-11-08 22:09:08 utc

jmettraux

I know some company derived their own with optimizations

2012-11-08 22:17:19 utc

mburnett

well, i have to get going. hopefully, i'll be able to work on this some more this weekend, but i'm going to be out of town, so i'm not sure.

2012-11-08 22:17:23 utc

mburnett

take care

2012-11-08 22:19:30 utc

jmettraux

ok, enjoy!