| 2013-03-15 00:15:05 utc | phaeron | jmettraux: launching this workflow 50 times in a row causes very high cpu usage, and an increase of about 10% in memory usage https://gist.github.com/iamer/5166396 | 
| 2013-03-15 00:15:25 utc | phaeron | valgrind reports it is all on the heap (nothing reported as leaked) | 
| 2013-03-15 00:16:13 utc | phaeron | 10% here is about 50MB | 
| 2013-03-15 00:17:10 utc | phaeron | taking out dumper and trying again | 
| 2013-03-15 00:17:41 utc | phaeron | it is of course faster as there is no message passing overhead | 
| 2013-03-15 00:18:31 utc | phaeron | but still high cpu usage 90 to 07% | 
| 2013-03-15 00:18:34 utc | phaeron | 97 | 
| 2013-03-15 00:27:39 utc | jmettraux | phaeron: what is dumper? | 
| 2013-03-15 00:28:47 utc | jmettraux | when there is work, the worker tries to do it as quickly as possible, polling the storage for more work | 
| 2013-03-15 00:28:49 utc | phaeron | simple amqp script that just prints the workitem | 
| 2013-03-15 00:29:20 utc | phaeron | jmettraux: yeah I am not complaining about the cpu , just reporting. problem is this heap growing without reclaiming. | 
| 2013-03-15 00:29:41 utc | jmettraux | without the dumper the problem persists? | 
| 2013-03-15 00:29:52 utc | phaeron | without the dumper in the process, memory grows slower, about 2% per 50 invocations | 
| 2013-03-15 00:30:12 utc | lbt | ACTION wonders about the amqp side then | 
| 2013-03-15 00:30:27 utc | jmettraux | I should re-read about linux and process mem management | 
| 2013-03-15 00:30:28 utc | lbt | we're pretty heavy amqp users | 
| 2013-03-15 00:30:39 utc | phaeron | maybe we should upgrade ruote-amqp gem | 
| 2013-03-15 00:30:46 utc | phaeron | I bet our version is out of date | 
| 2013-03-15 00:31:22 utc | jmettraux | ruby 1.8.7 is going unmaintained in july iirc | 
| 2013-03-15 00:31:50 utc | jmettraux | ok, I have to leave | 
| 2013-03-15 00:31:58 utc | phaeron | yeah I am getting sleepy too | 
| 2013-03-15 00:32:05 utc | phaeron | jmettraux: thanks for all the help | 
| 2013-03-15 00:32:32 utc | jmettraux | sorry, didn't do much, just a sherlock partner | 
| 2013-03-15 00:32:42 utc | jmettraux | talk to you later! | 
| 2013-03-15 00:32:50 utc | phaeron | bye :) | 
| 2013-03-15 00:32:57 utc | jmettraux | bye :) | 
| 2013-03-15 08:29:12 utc | phaeron | jmettraux: just a quick update, I have updated the bundle ruote, and ruote-amqp gems and their dependencies. I can see there's a lot of improvement in the behavior in general (both cpu and memory). | 
| 2013-03-15 08:29:24 utc | phaeron | will provide more details later | 
| 2013-03-15 08:29:27 utc | jmettraux | oh cool :-) | 
| 2013-03-15 08:29:34 utc | phaeron | ah you are awake | 
| 2013-03-15 08:29:45 utc | jmettraux | did the amqp stuff get upgraded as well? | 
| 2013-03-15 08:29:49 utc | phaeron | yes | 
| 2013-03-15 08:30:06 utc | jmettraux | could be in the ruby-amqp maybe | 
| 2013-03-15 08:30:08 utc | phaeron | I let the new ruote-amqp pull in the specific amqp version it likes | 
| 2013-03-15 08:30:27 utc | phaeron | well I did read some relevant points in amqp gem changelog | 
| 2013-03-15 08:30:31 utc | jmettraux | ruby-amqp is very active | 
| 2013-03-15 08:32:00 utc | phaeron | anyway I gtg catch a bus | 
| 2013-03-15 08:32:07 utc | phaeron | will push the new bundle later | 
| 2013-03-15 08:33:08 utc | jmettraux | ok, have a good trip! | 
| 2013-03-15 08:33:10 utc | phaeron | at least I can launch 100 consecutive invocations of the process without the memory increasing beyond 20% | 
| 2013-03-15 08:33:24 utc | jmettraux | excellent! | 
| 2013-03-15 08:33:35 utc | phaeron | oh and upgraded to ruby 1.9 too | 
| 2013-03-15 09:28:37 utc | phaeron | back | 
| 2013-03-15 09:43:56 utc | phaeron | jmettraux: interesting test case. just raising an error in the process causes a memory increase | 
| 2013-03-15 09:51:31 utc | jmettraux | phaeron: copying the stacktrace probably | 
| 2013-03-15 09:59:24 utc | phaeron | it's not that big a deal | 
| 2013-03-15 09:59:53 utc | phaeron | now I have to fix ruote-kit and package this bundle and deploy it and hope I be happy | 
| 2013-03-15 10:00:17 utc | jmettraux | fix ruote-kit? | 
| 2013-03-15 10:25:26 utc | phaeron | jmettraux: the sinatra ruote-kit wrapper we use, it stopped working after upgrading so much stuff | 
| 2013-03-15 10:44:56 utc | jmettraux | ah, understood | 
| 2013-03-15 21:49:03 utc | jmettraux | speaking of memory leaks: http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/ | 
| 2013-03-15 21:53:08 utc | ypz | jmettraux good morning | 
| 2013-03-15 21:54:05 utc | ypz | I am reading the doc http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html on resume, you have a note: Note : this is supposed to be called on paused expressions / instances, this is NOT meant to be called to unstuck / unhang a process. | 
| 2013-03-15 21:54:58 utc | ypz | could you elaborate on the differences between "paused" process and "stuck / hang " processes ? | 
| 2013-03-15 21:58:18 utc | jmettraux | ypz: hello, good afternoon | 
| 2013-03-15 21:58:59 utc | jmettraux | the execution of processes depends on "msgs" (messages), they are like those order sheets in restaurants | 
| 2013-03-15 21:59:22 utc | jmettraux | depending on the storage implementation, one of those msgs could get lost | 
| 2013-03-15 22:00:11 utc | jmettraux | a typical cause of loss would be a worker going down, taking a yet unprocessed msg with him | 
| 2013-03-15 22:00:56 utc | ypz | what state would a process be in when one of its participants has an error ? | 
| 2013-03-15 22:01:09 utc | jmettraux | the customer is like "I ordered a steak, but it's not coming", in fact the order sheet remained in the waiter's pocket as he went back home, new waiter doesn't know anything, kitchen doesn't know anything | 
| 2013-03-15 22:02:09 utc | jmettraux | if the process has no concurrent branches and has an error, the whole process state could be considered "in error" | 
| 2013-03-15 22:02:38 utc | jmettraux | so my waiter story describes processes (or branches of processes) that are "stuck" | 
| 2013-03-15 22:04:27 utc | ypz | once the error condition is eliminated, could I resume the "in error" process from the participant in error by redoing that participant ? | 
| 2013-03-15 22:04:48 utc | jmettraux | yes | 
| 2013-03-15 22:07:25 utc | ypz | I tried dashboard.replay_at_error(err), the participant which had the error got rerun and succeeded, but the process does not continue to the next participant. my pdef is just a simple sequence, what did I miss ? | 
| 2013-03-15 22:08:03 utc | jmettraux | I don't know, could you package that in a way that I can take a look? | 
| 2013-03-15 22:08:53 utc | ypz | hm, not easily, I am afraid | 
| 2013-03-15 22:09:08 utc | jmettraux | ok, let me write the gist | 
| 2013-03-15 22:16:04 utc | jmettraux | ypz: here is a basic case: https://gist.github.com/anonymous/5173425 feel free to download it and play with it | 
| 2013-03-15 22:16:07 utc | ypz | I 'll try to get a gist as well | 
| 2013-03-15 22:16:47 utc | jmettraux | does your process terminate or simply get stuck? | 
| 2013-03-15 22:16:55 utc | jmettraux | (sorry I should have asked immediately) | 
| 2013-03-15 22:20:06 utc | ypz | it was stuck with error = 1 | 
| 2013-03-15 22:20:25 utc | jmettraux | same error as before? | 
| 2013-03-15 22:20:26 utc | ypz | once I ran replay_at_error, error = 0 | 
| 2013-03-15 22:20:32 utc | jmettraux | ah | 
| 2013-03-15 22:20:53 utc | jmettraux | did the process vanish or did it stay, but "stuck"? | 
| 2013-03-15 22:21:15 utc | ypz | hm, I think I threw a ruote.pause(wfid) at that process somewhere | 
| 2013-03-15 22:21:57 utc | ypz | tried ruote.resume(wfid), seems to get it going | 
| 2013-03-15 22:22:35 utc | jmettraux | ok | 
| 2013-03-15 22:23:26 utc | ypz | so for "in error" process, => fix error condition and replay_at_error(err) to get it going; for "paused" process => use resume to get it going; am I correct on these ? | 
| 2013-03-15 22:23:38 utc | jmettraux | yes | 
| 2013-03-15 22:23:54 utc | jmettraux | one rule of thumb would be: "never use pause/resume" | 
| 2013-03-15 22:24:11 utc | jmettraux | unless you really really have a use case for it | 
| 2013-03-15 22:24:39 utc | ypz | when would one use "pause" ? | 
| 2013-03-15 22:25:16 utc | jmettraux | I guess to pause a resource intensive workflow | 
| 2013-03-15 22:25:33 utc | jmettraux | but you're never sure of when the pause msg will reach the leaves of the workflow | 
| 2013-03-15 22:25:55 utc | jmettraux | depending on the use cases, it might be easier to implement the pausing/resuming via the participants | 
| 2013-03-15 22:26:14 utc | phaeron | jmettraux: that's pretty hardcore debugging | 
| 2013-03-15 22:27:07 utc | jmettraux | a global system switch telling the participant to pause, the engine would just sit waiting for participants, forced pause... | 
| 2013-03-15 22:27:31 utc | jmettraux | phaeron: yeah, ruby, eventmachine, ruby debugger, gc, ... | 
| 2013-03-15 22:30:07 utc | ypz | in your earlier restaurant example, that "stuck" process has to redo the entire process from beginning, right ? customer had to re-order his steak ! | 
| 2013-03-15 22:31:35 utc | jmettraux | fortunately, most of the time, the order can be retrieved, the technique is packaged in the #respark method: https://github.com/jmettraux/ruote/blob/master/lib/ruote/dashboard.rb#L411-L431 | 
| 2013-03-15 22:35:50 utc | ypz | hey, it's undocumented :D | 
| 2013-03-15 22:36:39 utc | jmettraux | really, lines 411 to 423 are documentation, right? | 
| 2013-03-15 22:37:03 utc | ypz | but nothing about it on http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html page | 
| 2013-03-15 22:37:11 utc | jmettraux | aah | 
| 2013-03-15 22:37:39 utc | jmettraux | I have to find the time to re-generate that part of the doc | 
| 2013-03-15 22:38:12 utc | ypz | I need to get into the habit of reading the source code more often ! | 
| 2013-03-15 22:39:02 utc | jmettraux | sorry about that, old documentation may be misleading | 
| 2013-03-15 22:40:58 utc | ypz | yea, source code is the ultimate authoritative source | 
| 2013-03-15 22:54:29 utc | jmettraux | http://rdoc.info/github/jmettraux/ruote/ is super old too | 
| 2013-03-15 22:55:38 utc | jmettraux | that rdoc.info site seems stuck in the past |
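
Note on the participant-level pause that jmettraux suggests above ("a global system switch telling the participant to pause, the engine would just sit waiting for participants"): the following is only a minimal sketch of that idea in plain Ruby, with no ruote dependency. `PauseSwitch`, `SwitchAwareParticipant`, `wait_until_resumed` and `demo_pause_switch` are hypothetical names invented for illustration; none of them are part of ruote, and a real ruote participant would have to reply to the engine rather than append to a queue.

```ruby
require 'thread'

# Hypothetical global switch (not a ruote class). Participants consult it
# before doing work, so pausing never depends on a pause msg reaching the
# leaves of the workflow.
class PauseSwitch
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @paused = false
  end

  def pause!
    @mutex.synchronize { @paused = true }
  end

  def resume!
    @mutex.synchronize do
      @paused = false
      @cond.broadcast # wake every participant blocked on the switch
    end
  end

  # Blocks the calling thread for as long as the switch is on.
  def wait_until_resumed
    @mutex.synchronize do
      @cond.wait(@mutex) while @paused
    end
  end
end

# Stand-in for a participant (assumption: real participants would hand the
# workitem back to the engine instead of pushing it onto a log queue).
class SwitchAwareParticipant
  def initialize(switch, log)
    @switch = switch
    @log = log
  end

  def consume(workitem)
    @switch.wait_until_resumed # the "engine" just waits on us meanwhile
    @log << workitem
  end
end

# Small demonstration: work is held while paused, released on resume.
def demo_pause_switch
  switch = PauseSwitch.new
  log = Queue.new
  participant = SwitchAwareParticipant.new(switch, log)

  switch.pause!
  worker = Thread.new { participant.consume('steak') }
  sleep 0.1
  held = log.empty? # nothing was processed while the switch was on

  switch.resume!
  worker.join
  [held, log.pop]
end
```

The condition variable is a design choice for the sketch: it avoids the busy polling that a `sleep` loop would add, which seems relevant given the cpu usage discussed earlier in the log. Whether blocking a participant thread like this suits a given ruote deployment is an open question, not something the conversation settles.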