ruote tmp/log_2013-03-15.html

2013-03-15 00:15:05 utc phaeron jmettraux: launching this workflow 50 times in a row causes very high cpu usage, and an increase of about 10% in memory usage
2013-03-15 00:15:25 utc phaeron valgrind reports it is all on the heap (nothing reported as leaked)
2013-03-15 00:16:13 utc phaeron 10% here is about 50Mb
2013-03-15 00:17:10 utc phaeron taking out dumper and trying again
2013-03-15 00:17:41 utc phaeron it is of course faster as there is no message passing overhead
2013-03-15 00:18:31 utc phaeron but still high cpu usage 90 to 07%
2013-03-15 00:18:34 utc phaeron 97
2013-03-15 00:27:39 utc jmettraux phaeron: what is dumper?
2013-03-15 00:28:47 utc jmettraux when there is work, the worker tries to do it as quickly as possible, polling the storage for more work
2013-03-15 00:28:49 utc phaeron simple amqp script that just prints the workitem
2013-03-15 00:29:20 utc phaeron jmettraux: yeah I am not complaining about the cpu , just reporting. problem is this heap growing without reclaiming.
2013-03-15 00:29:41 utc jmettraux without the dumper the problem persists?
2013-03-15 00:29:52 utc phaeron without the dumper in the process, memory grows more slowly, about 2% per 50 invocations
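One hedged way to put numbers on "memory grows per 50 invocations" from inside Ruby is to sample the live object count after each batch. This is only a sketch: `live_objects`, `sample_batches` and `RETAINED` are made-up names, the block stands in for one workflow launch, and live Ruby objects are not the same thing as the process memory that top or valgrind report.

```ruby
# Toy measurement harness: counts live Ruby objects between batches of runs.
# It does not launch ruote workflows; the block passed in stands for one invocation.

# Live object slots after a full GC pass.
def live_objects
  GC.start
  counts = ObjectSpace.count_objects
  counts[:TOTAL] - counts[:FREE]
end

# Runs the block `per_batch` times, `batches` times over, sampling after each batch.
def sample_batches(batches, per_batch)
  samples = [live_objects]
  batches.times do
    per_batch.times { yield }
    samples << live_objects
  end
  samples
end

RETAINED = [] # stands in for whatever keeps references alive

samples = sample_batches(3, 50) { RETAINED << ("x" * 1024) }
samples.each_cons(2) do |before, after|
  puts "batch delta: #{after - before} live objects"
end
```

A delta that stays positive batch after batch would suggest something is holding references; a delta that hovers around zero while process memory still grows would point below the Ruby object level.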
2013-03-15 00:30:12 utc lbt ACTION wonders about the amqp side then
2013-03-15 00:30:27 utc jmettraux I should re-read about linux and process mem management
2013-03-15 00:30:28 utc lbt we're pretty heavy amqp users
2013-03-15 00:30:39 utc phaeron maybe we should upgrade ruote-amqp gem
2013-03-15 00:30:46 utc phaeron I bet our version is out of date
2013-03-15 00:31:22 utc jmettraux ruby 1.8.7 is going unmaintained in july iirc
2013-03-15 00:31:50 utc jmettraux ok, I have to leave
2013-03-15 00:31:58 utc phaeron yeah I am getting sleepy too
2013-03-15 00:32:05 utc phaeron jmettraux: thanks for all the help
2013-03-15 00:32:32 utc jmettraux sorry, didn't do much, just a sherlock partner
2013-03-15 00:32:42 utc jmettraux talk to you later!
2013-03-15 00:32:50 utc phaeron bye :)
2013-03-15 00:32:57 utc jmettraux bye :)
2013-03-15 08:29:12 utc phaeron jmettraux: just a quick update, I have updated the bundled ruote and ruote-amqp gems and their dependencies. I can see there's a lot of improvement in the behavior in general (both cpu and memory).
2013-03-15 08:29:24 utc phaeron will provide more details later
2013-03-15 08:29:27 utc jmettraux oh cool :-)
2013-03-15 08:29:34 utc phaeron ah you are awake
2013-03-15 08:29:45 utc jmettraux did the amqp stuff get upgraded as well?
2013-03-15 08:29:49 utc phaeron yes
2013-03-15 08:30:06 utc jmettraux could be in the ruby-amqp maybe
2013-03-15 08:30:08 utc phaeron I let the new ruote-amqp pull in the specific amqp version it likes
2013-03-15 08:30:27 utc phaeron well I did read some relevant points in amqp gem changelog
2013-03-15 08:30:31 utc jmettraux ruby-amqp is very active
2013-03-15 08:32:00 utc phaeron anyway I gtg catch a bus
2013-03-15 08:32:07 utc phaeron will push the new bundle later
2013-03-15 08:33:08 utc jmettraux ok, have a good trip!
2013-03-15 08:33:10 utc phaeron at least I can launch 100 consecutive invocations of the process without the memory increasing beyond 20%
2013-03-15 08:33:24 utc jmettraux excellent!
2013-03-15 08:33:35 utc phaeron oh and upgraded to ruby 1.9 too
2013-03-15 09:28:37 utc phaeron back
2013-03-15 09:43:56 utc phaeron jmettraux: interesting test case. just raising an error in the process causes a memory increase
2013-03-15 09:51:31 utc jmettraux phaeron: copying the stacktrace probably
2013-03-15 09:59:24 utc phaeron it's not that big a deal
2013-03-15 09:59:53 utc phaeron now I have to fix ruote-kit and package this bundle and deploy it and hope I'll be happy
2013-03-15 10:00:17 utc jmettraux fix ruote-kit?
2013-03-15 10:25:26 utc phaeron jmettraux: the sinatra ruote-kit wrapper we use , currently it stopped working after upgrading so much stuff
2013-03-15 10:44:56 utc jmettraux ah, understood
2013-03-15 21:49:03 utc jmettraux speaking of memory leaks:
2013-03-15 21:53:08 utc ypz jmettraux good morning
2013-03-15 21:54:05 utc ypz I am reading the doc on resume, you have a note: Note : this is supposed to be called on paused expressions / instances, this is NOT meant to be called to unstuck / unhang a process.
2013-03-15 21:54:58 utc ypz could you elaborate on the differences between "paused" process and "stuck / hang " processes ?
2013-03-15 21:58:18 utc jmettraux ypz: hello, good afternoon
2013-03-15 21:58:59 utc jmettraux the execution of processes depends on "msgs" (messages), they are like those order sheets in restaurants
2013-03-15 21:59:22 utc jmettraux depending on the storage implementation, one of those msgs could get lost
2013-03-15 22:00:11 utc jmettraux a typical cause of loss would be a worker going down, taking a yet unprocessed msg with him
2013-03-15 22:00:56 utc ypz what state would a process be in when one of its participant has an error ?
2013-03-15 22:01:09 utc jmettraux the customer is like "I ordered a steak, but it's not coming", in fact the order sheet remained in the waiter's pocket as he went back home, new waiter doesn't know anything, kitchen doesn't know anything
2013-03-15 22:02:09 utc jmettraux if the process has no concurrent branches and has an error, the whole process state could be considered "in error"
2013-03-15 22:02:38 utc jmettraux so my waiter story describes processes (or branches of processes) that are "stuck"
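The waiter story can be played out in a few lines of plain Ruby. This is a toy model under stated assumptions, not how ruote stores or dispatches msgs: `ToyStorage`, `ToyWorker` and `stuck?` are hypothetical names invented for illustration.

```ruby
# Toy model of a lost msg; ToyStorage and ToyWorker are hypothetical, not ruote classes.
class ToyStorage
  attr_reader :msgs, :expressions

  def initialize
    @msgs = []         # pending "order sheets"
    @expressions = {}  # wfid => step the process is waiting on
  end

  def launch(wfid, step)
    @expressions[wfid] = step
    @msgs << { wfid: wfid, step: step }
  end

  # Waiting on a step with no pending msg left to drive it.
  # Only meaningful here once no live worker still holds the msg.
  def stuck?(wfid)
    @expressions.key?(wfid) && @msgs.none? { |m| m[:wfid] == wfid }
  end
end

class ToyWorker
  def initialize(storage)
    @storage = storage
    @pocket = nil
  end

  # The msg leaves the storage and now exists only in this worker.
  def take
    @pocket = @storage.msgs.shift
  end

  # Completes the step; returns false when there was nothing to do.
  def process
    return false unless @pocket
    @storage.expressions.delete(@pocket[:wfid])
    @pocket = nil
    true
  end

  # The worker goes down and the unprocessed msg goes with it.
  def crash
    @pocket = nil
  end
end

storage = ToyStorage.new
storage.launch("wf-1", "steak")

first_waiter = ToyWorker.new(storage)
first_waiter.take
first_waiter.crash

second_waiter = ToyWorker.new(storage)
second_waiter.take
did_work = second_waiter.process

puts storage.stuck?("wf-1")
```

Nothing in this model raised an error, which is the point: the process is not "in error", it simply has nothing left that would move it forward.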
2013-03-15 22:04:27 utc ypz once the error condition is eliminated, could I resume the "in error" process from the participant in error by redoing that participant ?
2013-03-15 22:04:48 utc jmettraux yes
2013-03-15 22:07:25 utc ypz I tried dashboard.replay_at_error(err), the participant which had the error got rerun and succeeded, but the process does not continue to the next participant. my pdef is just a simple sequence, what did I miss ?
2013-03-15 22:08:03 utc jmettraux I don't know, could you package that in a way that I can take a look?
2013-03-15 22:08:53 utc ypz hm, not easily, I am afraid
2013-03-15 22:09:08 utc jmettraux ok, let me write the gist
2013-03-15 22:16:04 utc jmettraux ypz: here is a basic case: feel free to download it and play with it
2013-03-15 22:16:07 utc ypz I 'll try to get a gist as well
2013-03-15 22:16:47 utc jmettraux does your process terminate or simply get stuck?
2013-03-15 22:16:55 utc jmettraux (sorry I should have asked immediately)
2013-03-15 22:20:06 utc ypz it was stuck with error = 1
2013-03-15 22:20:25 utc jmettraux same error as before?
2013-03-15 22:20:26 utc ypz once I ran replay_at_error, error = 0
2013-03-15 22:20:32 utc jmettraux ah
2013-03-15 22:20:53 utc jmettraux did the process vanish or did it stay, but "stuck"?
2013-03-15 22:21:15 utc ypz hm, I think I threw a ruote.pause(wfid) at that process somewhere
2013-03-15 22:21:57 utc ypz tried ruote@resume(wfid), seems to get it going
2013-03-15 22:22:35 utc jmettraux ok
2013-03-15 22:23:26 utc ypz so for "in error" process, => fix error condition and replay_at_error(err) to get it going; for "paused" process => use resume to get it going; am I correct on these ?
2013-03-15 22:23:38 utc jmettraux yes
2013-03-15 22:23:54 utc jmettraux one rule of thumb would be: "never use pause/resume"
2013-03-15 22:24:11 utc jmettraux unless you really really have a use case for it
2013-03-15 22:24:39 utc ypz when would one use "pause" ?
2013-03-15 22:25:16 utc jmettraux I guess to pause a resource intensive workflow
2013-03-15 22:25:33 utc jmettraux but you're never sure of when the pause msg will reach the leaves of the workflow
2013-03-15 22:25:55 utc jmettraux depending on the use cases, it might be easier to implement the pausing/resuming via the participants
2013-03-15 22:26:14 utc phaeron jmettraux: that's pretty hardcore debugging
2013-03-15 22:27:07 utc jmettraux a global system switch telling the participant to pause, the engine would just sit waiting for participants, forced pause...
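A minimal sketch of that participant-side switch, in plain Ruby. Everything here is an assumption made for illustration: `SystemSwitch`, `GatedParticipant`, `release_held` and the `reply` callable are made-up names, and the class is a stand-in rather than a real ruote participant.

```ruby
# Hypothetical system-wide switch; a real one might live in a database or a file.
module SystemSwitch
  @paused = false

  class << self
    def pause!
      @paused = true
    end

    def resume!
      @paused = false
    end

    def paused?
      @paused
    end
  end
end

# Stand-in for a participant; `reply` is a callable that hands the
# workitem back to the engine.
class GatedParticipant
  def initialize(reply)
    @reply = reply
    @held = []
  end

  # While the switch is on, workitems are held instead of being worked on.
  def on_workitem(workitem)
    if SystemSwitch.paused?
      @held << workitem
    else
      handle(workitem)
    end
  end

  # Works through everything that was held, oldest first.
  def release_held
    handle(@held.shift) until @held.empty?
  end

  private

  def handle(workitem)
    workitem["done"] = true
    @reply.call(workitem)
  end
end

replies = []
participant = GatedParticipant.new(->(wi) { replies << wi })

SystemSwitch.pause!
participant.on_workitem({ "task" => "resize" })
held_count = replies.size # no reply yet, the engine side just waits

SystemSwitch.resume!
participant.release_held
```

The engine never receives a pause msg in this design; from its point of view the participant is simply slow to reply, so there is no question of when the pause reaches the leaves of the workflow.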
2013-03-15 22:27:31 utc jmettraux phaeron: yeah, ruby, eventmachine, ruby debugger, gc, ...
2013-03-15 22:30:07 utc ypz in your earlier restaurant example, that "stuck" process has to redo the entire process from beginning, right ? customer had to re-order his steak !
2013-03-15 22:31:35 utc jmettraux fortunately, most of the time, the order can be retrieved, the technique is packaged in the #respark method:
2013-03-15 22:35:50 utc ypz hey, it's undocumented :D
2013-03-15 22:36:39 utc jmettraux really, lines 411 to 423 are documentation, right?
2013-03-15 22:37:03 utc ypz but nothing about it on page
2013-03-15 22:37:11 utc jmettraux aah
2013-03-15 22:37:39 utc jmettraux I have to find the time to re-generate that part of the doc
2013-03-15 22:38:12 utc ypz I need to get into the habit of reading the source code more often !
2013-03-15 22:39:02 utc jmettraux sorry about that, old documentation may be misleading
2013-03-15 22:40:58 utc ypz yea, source code is the ultimate authoritative source
2013-03-15 22:54:29 utc jmettraux is super old too
2013-03-15 22:55:38 utc jmettraux that site seems stuck in the past