| 2013-03-15 00:15:05 utc | phaeron | jmettraux: launching this workflow 50 times in a row causes very high cpu usage and a memory usage increase of about 10% https://gist.github.com/iamer/5166396 |
| 2013-03-15 00:15:25 utc | phaeron | valgrind reports it is all on the heap (nothing reported as leaked) |
| 2013-03-15 00:16:13 utc | phaeron | 10% here is about 50 MB |
| 2013-03-15 00:17:10 utc | phaeron | taking out dumper and trying again |
| 2013-03-15 00:17:41 utc | phaeron | it is of course faster as there is no message passing overhead |
| 2013-03-15 00:18:31 utc | phaeron | but still high cpu usage, 90 to 97% |
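(Editor's note: the repro being described is roughly the following. A minimal sketch, not the linked gist's actual workflow; the trivial 'alpha' participant and the in-memory storage are stand-ins.)

```ruby
# Minimal sketch of the repro described above (the real workflow is in
# the linked gist; 'alpha' and the in-memory storage are stand-ins).
require 'ruote'

dashboard = Ruote::Dashboard.new(
  Ruote::Worker.new(
    Ruote::HashStorage.new))

dashboard.register_participant 'alpha' do |workitem|
  workitem.fields['seen'] = true
end

pdef = Ruote.define do
  participant 'alpha'
end

50.times do
  wfid = dashboard.launch(pdef)
  dashboard.wait_for(wfid)  # block until this run terminates
end

# eyeball resident set size after the 50 runs (linux/mac ps)
puts `ps -o rss= -p #{Process.pid}`
```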
| 2013-03-15 00:27:39 utc | jmettraux | phaeron: what is dumper? |
| 2013-03-15 00:28:47 utc | jmettraux | when there is work, the worker tries to do it as quickly as possible, polling the storage for more work |
| 2013-03-15 00:28:49 utc | phaeron | a simple amqp script that just prints the workitem |
| 2013-03-15 00:29:20 utc | phaeron | jmettraux: yeah, I am not complaining about the cpu, just reporting. the problem is the heap growing without being reclaimed. |
| 2013-03-15 00:29:41 utc | jmettraux | without the dumper the problem persists? |
| 2013-03-15 00:29:52 utc | phaeron | without the dumper, the process memory grows slower, about 2% per 50 invocations |
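(Editor's note: a "dumper" along the lines described, a simple amqp script that prints workitems, might look like this; the queue name 'ruote_workitems' is an assumption, not stated above.)

```ruby
# Hypothetical "dumper": subscribe to the workitem queue and print
# whatever arrives. The queue name 'ruote_workitems' is an assumption.
require 'amqp'

AMQP.start('amqp://localhost') do |connection|
  channel = AMQP::Channel.new(connection)
  queue = channel.queue('ruote_workitems', :durable => true)

  queue.subscribe do |payload|
    puts payload  # the serialized workitem (JSON)
  end
end
```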
| 2013-03-15 00:30:12 utc | lbt | ACTION wonders about the amqp side then |
| 2013-03-15 00:30:27 utc | jmettraux | I should re-read about linux and process mem management |
| 2013-03-15 00:30:28 utc | lbt | we're pretty heavy amqp users |
| 2013-03-15 00:30:39 utc | phaeron | maybe we should upgrade ruote-amqp gem |
| 2013-03-15 00:30:46 utc | phaeron | I bet our version is out of date |
| 2013-03-15 00:31:22 utc | jmettraux | ruby 1.8.7 is going unmaintained in july iirc |
| 2013-03-15 00:31:50 utc | jmettraux | ok, I have to leave |
| 2013-03-15 00:31:58 utc | phaeron | yeah I am getting sleepy too |
| 2013-03-15 00:32:05 utc | phaeron | jmettraux: thanks for all the help |
| 2013-03-15 00:32:32 utc | jmettraux | sorry, didn't do much, just a sherlock partner |
| 2013-03-15 00:32:42 utc | jmettraux | talk to you later! |
| 2013-03-15 00:32:50 utc | phaeron | bye :) |
| 2013-03-15 00:32:57 utc | jmettraux | bye :) |
| 2013-03-15 08:29:12 utc | phaeron | jmettraux: just a quick update, I have updated the bundled ruote and ruote-amqp gems and their dependencies. I can see a lot of improvement in the behavior in general (both cpu and memory). |
| 2013-03-15 08:29:24 utc | phaeron | will provide more details later |
| 2013-03-15 08:29:27 utc | jmettraux | oh cool :-) |
| 2013-03-15 08:29:34 utc | phaeron | ah you are awake |
| 2013-03-15 08:29:45 utc | jmettraux | did the amqp stuff get upgraded as well? |
| 2013-03-15 08:29:49 utc | phaeron | yes |
| 2013-03-15 08:30:06 utc | jmettraux | the improvement could be in ruby-amqp, maybe |
| 2013-03-15 08:30:08 utc | phaeron | I let the new ruote-amqp pull in the specific amqp version it likes |
| 2013-03-15 08:30:27 utc | phaeron | well I did read some relevant points in the amqp gem changelog |
| 2013-03-15 08:30:31 utc | jmettraux | ruby-amqp is very active |
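(Editor's note: a Gemfile matching "let the new ruote-amqp pull in the specific amqp version it likes" would simply leave the amqp gem unpinned. A sketch, assuming Bundler.)

```ruby
# Gemfile sketch: pin only the top-level gems, and let ruote-amqp's own
# dependency constraints select the 'amqp' gem version.
source 'https://rubygems.org'

gem 'ruote'
gem 'ruote-amqp'  # its gemspec picks the 'amqp' version it likes
```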
| 2013-03-15 08:32:00 utc | phaeron | anyway I gtg catch a bus |
| 2013-03-15 08:32:07 utc | phaeron | will push the new bundle later |
| 2013-03-15 08:33:08 utc | jmettraux | ok, have a good trip! |
| 2013-03-15 08:33:10 utc | phaeron | at least I can launch 100 consecutive invocations of the process without the memory increasing beyond 20% |
| 2013-03-15 08:33:24 utc | jmettraux | excellent! |
| 2013-03-15 08:33:35 utc | phaeron | oh and upgraded to ruby 1.9 too |
| 2013-03-15 09:28:37 utc | phaeron | back |
| 2013-03-15 09:43:56 utc | phaeron | jmettraux: interesting test case: just raising an error in the process causes a memory increase |
| 2013-03-15 09:51:31 utc | jmettraux | phaeron: copying the stacktrace probably |
| 2013-03-15 09:59:24 utc | phaeron | it's not that big a deal |
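(Editor's note: the "just raise an error" test case would look something like this, a sketch reusing the dashboard from the earlier snippet; each launch leaves a process in error, with the stacktrace stored alongside it.)

```ruby
# Sketch of the error-raising test case: a participant that always
# fails. wait_for(wfid) returns once the error is recorded.
dashboard.register_participant 'failing' do |workitem|
  raise 'boom'
end

pdef = Ruote.define do
  participant 'failing'
end

wfid = dashboard.launch(pdef)
dashboard.wait_for(wfid)

p dashboard.errors(wfid).size  # => 1
```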
| 2013-03-15 09:59:53 utc | phaeron | now I have to fix ruote-kit, package this bundle, deploy it, and hope I'll be happy |
| 2013-03-15 10:00:17 utc | jmettraux | fix ruote-kit? |
| 2013-03-15 10:25:26 utc | phaeron | jmettraux: the sinatra-based ruote-kit wrapper we use; it stopped working after upgrading so much stuff |
| 2013-03-15 10:44:56 utc | jmettraux | ah, understood |
| 2013-03-15 21:49:03 utc | jmettraux | speaking of memory leaks: http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/ |
| 2013-03-15 21:53:08 utc | ypz | jmettraux good morning |
| 2013-03-15 21:54:05 utc | ypz | I am reading the doc http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html on resume, you have a note: "Note: this is supposed to be called on paused expressions / instances, this is NOT meant to be called to unstuck / unhang a process." |
| 2013-03-15 21:54:58 utc | ypz | could you elaborate on the differences between "paused" processes and "stuck / hung" processes? |
| 2013-03-15 21:58:18 utc | jmettraux | ypz: hello, good afternoon |
| 2013-03-15 21:58:59 utc | jmettraux | the execution of processes depends on "msgs" (messages), they are like those order sheets in restaurants |
| 2013-03-15 21:59:22 utc | jmettraux | depending on the storage implementation, one of those msgs could get lost |
| 2013-03-15 22:00:11 utc | jmettraux | a typical cause of loss would be a worker going down, taking a yet unprocessed msg with it |
| 2013-03-15 22:00:56 utc | ypz | what state would a process be in when one of its participants has an error? |
| 2013-03-15 22:01:09 utc | jmettraux | the customer is like "I ordered a steak, but it's not coming", in fact the order sheet remained in the waiter's pocket as he went back home, new waiter doesn't know anything, kitchen doesn't know anything |
| 2013-03-15 22:02:09 utc | jmettraux | if the process has no concurrent branches and has an error, the whole process state could be considered "in error" |
| 2013-03-15 22:02:38 utc | jmettraux | so my waiter story describes processes (or branches of processes) that are "stuck" |
| 2013-03-15 22:04:27 utc | ypz | once the error condition is eliminated, could I resume the "in error" process from the participant in error by redoing that participant? |
| 2013-03-15 22:04:48 utc | jmettraux | yes |
| 2013-03-15 22:07:25 utc | ypz | I tried dashboard.replay_at_error(err); the participant which had the error gets rerun and succeeds, but the process does not continue to the next participant. my pdef is just a simple sequence, what did I miss? |
| 2013-03-15 22:08:03 utc | jmettraux | I don't know, could you package that in a way that I can take a look? |
| 2013-03-15 22:08:53 utc | ypz | hm, not easily, I am afraid |
| 2013-03-15 22:09:08 utc | jmettraux | ok, let me write the gist |
| 2013-03-15 22:16:04 utc | jmettraux | ypz: here is a basic case: https://gist.github.com/anonymous/5173425 feel free to download it and play with it |
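(Editor's note: the shape of such a basic case, sketched under assumptions, not the gist itself: a sequence whose first participant fails until a flag is flipped, after which replay_at_error re-runs the failed expression and the sequence should continue.)

```ruby
# Fix-then-replay sketch: 'flaky' fails until $fixed is flipped.
$fixed = false

dashboard.register_participant 'flaky' do |workitem|
  raise 'not fixed yet' unless $fixed
end

dashboard.register_participant 'next_step' do |workitem|
  puts "reached next_step for #{workitem.wfid}"
end

pdef = Ruote.define do
  sequence do
    participant 'flaky'
    participant 'next_step'
  end
end

wfid = dashboard.launch(pdef)
dashboard.wait_for(wfid)         # returns once the error is recorded

$fixed = true                    # "eliminate the error condition"
err = dashboard.errors(wfid).first
dashboard.replay_at_error(err)   # re-run the failed expression

dashboard.wait_for(wfid)         # the sequence now runs to the end
```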
| 2013-03-15 22:16:07 utc | ypz | I 'll try to get a gist as well |
| 2013-03-15 22:16:47 utc | jmettraux | does your process terminate or simply get stuck? |
| 2013-03-15 22:16:55 utc | jmettraux | (sorry I should have asked immediately) |
| 2013-03-15 22:20:06 utc | ypz | it was stuck with error = 1 |
| 2013-03-15 22:20:25 utc | jmettraux | same error as before? |
| 2013-03-15 22:20:26 utc | ypz | once I ran replay_at_error, error = 0 |
| 2013-03-15 22:20:32 utc | jmettraux | ah |
| 2013-03-15 22:20:53 utc | jmettraux | did the process vanish or did it stay, but "stuck"? |
| 2013-03-15 22:21:15 utc | ypz | hm, I think I threw a ruote.pause(wfid) at that process somewhere |
| 2013-03-15 22:21:57 utc | ypz | tried ruote.resume(wfid), seems to get it going |
| 2013-03-15 22:22:35 utc | jmettraux | ok |
| 2013-03-15 22:23:26 utc | ypz | so for an "in error" process => fix the error condition and replay_at_error(err) to get it going; for a "paused" process => use resume to get it going; am I correct on these? |
| 2013-03-15 22:23:38 utc | jmettraux | yes |
| 2013-03-15 22:23:54 utc | jmettraux | one rule of thumb would be: "never use pause/resume" |
| 2013-03-15 22:24:11 utc | jmettraux | unless you really really have a use case for it |
| 2013-03-15 22:24:39 utc | ypz | when would one use "pause" ? |
| 2013-03-15 22:25:16 utc | jmettraux | I guess to pause a resource intensive workflow |
| 2013-03-15 22:25:33 utc | jmettraux | but you're never sure of when the pause msg will reach the leaves of the workflow |
| 2013-03-15 22:25:55 utc | jmettraux | depending on the use cases, it might be easier to implement the pausing/resuming via the participants |
| 2013-03-15 22:26:14 utc | phaeron | jmettraux: that's pretty hardcore debugging |
| 2013-03-15 22:27:07 utc | jmettraux | a global system switch telling the participant to pause, the engine would just sit waiting for participants, forced pause... |
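(Editor's note: participant-level pausing as just described could be sketched like this; the $paused global switch and the 30s re-dispatch delay are illustrative assumptions.)

```ruby
# Sketch: the participant consults a global switch; while "paused" it
# re-dispatches itself later instead of doing the work, so the engine
# just sits waiting on it.
class PausableParticipant
  include Ruote::LocalParticipant

  def on_workitem
    if $paused
      re_dispatch(workitem, :in => '30s')  # try again later
    else
      # ... do the real work ...
      reply  # hand the workitem back to the engine
    end
  end
end

dashboard.register_participant 'pausable', PausableParticipant
```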
| 2013-03-15 22:27:31 utc | jmettraux | phaeron: yeah, ruby, eventmachine, ruby debugger, gc, ... |
| 2013-03-15 22:30:07 utc | ypz | in your earlier restaurant example, that "stuck" process has to redo the entire process from the beginning, right? the customer had to re-order his steak! |
| 2013-03-15 22:31:35 utc | jmettraux | fortunately, most of the time, the order can be retrieved, the technique is packaged in the #respark method: https://github.com/jmettraux/ruote/blob/master/lib/ruote/dashboard.rb#L411-L431 |
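(Editor's note: usage is straightforward once a process is diagnosed as stuck rather than in error; a sketch.)

```ruby
# Respark a stuck process: no errors recorded, but nothing moving.
ps = dashboard.process(wfid)

dashboard.respark(wfid) if ps.errors.empty?
```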
| 2013-03-15 22:35:50 utc | ypz | hey, it's undocumented :D |
| 2013-03-15 22:36:39 utc | jmettraux | really, lines 411 to 423 are documentation, right? |
| 2013-03-15 22:37:03 utc | ypz | but nothing about it on http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html page |
| 2013-03-15 22:37:11 utc | jmettraux | aah |
| 2013-03-15 22:37:39 utc | jmettraux | I have to find the time to re-generate that part of the doc |
| 2013-03-15 22:38:12 utc | ypz | I need to get into the habit of reading the source code more often ! |
| 2013-03-15 22:39:02 utc | jmettraux | sorry about that, old documentation may be misleading |
| 2013-03-15 22:40:58 utc | ypz | yea, source code is the ultimate authoritative source |
| 2013-03-15 22:54:29 utc | jmettraux | http://rdoc.info/github/jmettraux/ruote/ is super old too |
| 2013-03-15 22:55:38 utc | jmettraux | that rdoc.info site seems stuck in the past |