| 2013-03-15 00:15:05 utc | phaeron | jmettraux: launching this workflow 50 times in a row causes very high cpu usage and a memory usage increase of about 10% https://gist.github.com/iamer/5166396 |
| 2013-03-15 00:15:25 utc | phaeron | valgrind reports it is all on the heap (nothing reported as leaked) |
| 2013-03-15 00:16:13 utc | phaeron | 10% here is about 50 MB |
| 2013-03-15 00:17:10 utc | phaeron | taking out dumper and trying again |
| 2013-03-15 00:17:41 utc | phaeron | it is of course faster as there is no message passing overhead |
| 2013-03-15 00:18:31 utc | phaeron | but still high cpu usage, 90 to 97% |
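(Editor's note: the repro being described is roughly the following. A minimal sketch, not the linked gist's actual workflow; the trivial 'alpha' participant and the in-memory storage are stand-ins.)

```ruby
# Minimal sketch of the repro described above (the real workflow is in
# the linked gist; 'alpha' and the in-memory storage are stand-ins).
require 'ruote'

dashboard = Ruote::Dashboard.new(
  Ruote::Worker.new(
    Ruote::HashStorage.new))

dashboard.register_participant 'alpha' do |workitem|
  workitem.fields['seen'] = true
end

pdef = Ruote.define do
  participant 'alpha'
end

50.times do
  wfid = dashboard.launch(pdef)
  dashboard.wait_for(wfid)  # block until this run terminates
end

# eyeball resident set size after the 50 runs (linux/mac ps)
puts `ps -o rss= -p #{Process.pid}`
```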
| 2013-03-15 00:27:39 utc | jmettraux | phaeron: what is dumper? |
| 2013-03-15 00:28:47 utc | jmettraux | when there is work, the worker tries to do it as quickly as possible, polling the storage for more work |
| 2013-03-15 00:28:49 utc | phaeron | a simple amqp script that just prints the workitem |
| 2013-03-15 00:29:20 utc | phaeron | jmettraux: yeah, I am not complaining about the cpu, just reporting. the problem is the heap growing without being reclaimed. |
| 2013-03-15 00:29:41 utc | jmettraux | without the dumper the problem persists? |
| 2013-03-15 00:29:52 utc | phaeron | without the dumper, the process memory grows slower, about 2% per 50 invocations |
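(Editor's note: a "dumper" along the lines described, a simple amqp script that prints workitems, might look like this; the queue name 'ruote_workitems' is an assumption, not stated above.)

```ruby
# Hypothetical "dumper": subscribe to the workitem queue and print
# whatever arrives. The queue name 'ruote_workitems' is an assumption.
require 'amqp'

AMQP.start('amqp://localhost') do |connection|
  channel = AMQP::Channel.new(connection)
  queue = channel.queue('ruote_workitems', :durable => true)

  queue.subscribe do |payload|
    puts payload  # the serialized workitem (JSON)
  end
end
```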
| 2013-03-15 00:30:12 utc | lbt | ACTION wonders about the amqp side then |
| 2013-03-15 00:30:27 utc | jmettraux | I should re-read about linux and process mem management |
| 2013-03-15 00:30:28 utc | lbt | we're pretty heavy amqp users |
| 2013-03-15 00:30:39 utc | phaeron | maybe we should upgrade ruote-amqp gem |
| 2013-03-15 00:30:46 utc | phaeron | I bet our version is out of date |
| 2013-03-15 00:31:22 utc | jmettraux | ruby 1.8.7 is going unmaintained in july iirc |
| 2013-03-15 00:31:50 utc | jmettraux | ok, I have to leave |
| 2013-03-15 00:31:58 utc | phaeron | yeah I am getting sleepy too |
| 2013-03-15 00:32:05 utc | phaeron | jmettraux: thanks for all the help |
| 2013-03-15 00:32:32 utc | jmettraux | sorry, didn't do much, just a sherlock partner |
| 2013-03-15 00:32:42 utc | jmettraux | talk to you later! |
| 2013-03-15 00:32:50 utc | phaeron | bye :) |
| 2013-03-15 00:32:57 utc | jmettraux | bye :) |
| 2013-03-15 08:29:12 utc | phaeron | jmettraux: just a quick update, I have updated the bundled ruote and ruote-amqp gems and their dependencies. I can see a lot of improvement in the behavior in general (both cpu and memory). |
| 2013-03-15 08:29:24 utc | phaeron | will provide more details later |
| 2013-03-15 08:29:27 utc | jmettraux | oh cool :-) |
| 2013-03-15 08:29:34 utc | phaeron | ah you are awake |
| 2013-03-15 08:29:45 utc | jmettraux | did the amqp stuff get upgraded as well? |
| 2013-03-15 08:29:49 utc | phaeron | yes |
| 2013-03-15 08:30:06 utc | jmettraux | the improvement could be in ruby-amqp, maybe |
| 2013-03-15 08:30:08 utc | phaeron | I let the new ruote-amqp pull in the specific amqp version it likes |
| 2013-03-15 08:30:27 utc | phaeron | well I did read some relevant points in the amqp gem changelog |
| 2013-03-15 08:30:31 utc | jmettraux | ruby-amqp is very active |
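(Editor's note: a Gemfile matching "let the new ruote-amqp pull in the specific amqp version it likes" would simply leave the amqp gem unpinned. A sketch, assuming Bundler.)

```ruby
# Gemfile sketch: pin only the top-level gems, and let ruote-amqp's own
# dependency constraints select the 'amqp' gem version.
source 'https://rubygems.org'

gem 'ruote'
gem 'ruote-amqp'  # its gemspec picks the 'amqp' version it likes
```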
| 2013-03-15 08:32:00 utc | phaeron | anyway I gtg catch a bus |
| 2013-03-15 08:32:07 utc | phaeron | will push the new bundle later |
| 2013-03-15 08:33:08 utc | jmettraux | ok, have a good trip! |
| 2013-03-15 08:33:10 utc | phaeron | at least I can launch 100 consecutive invocations of the process without the memory increasing beyond 20% |
| 2013-03-15 08:33:24 utc | jmettraux | excellent! |
| 2013-03-15 08:33:35 utc | phaeron | oh and upgraded to ruby 1.9 too |
| 2013-03-15 09:28:37 utc | phaeron | back |
| 2013-03-15 09:43:56 utc | phaeron | jmettraux: interesting test case: just raising an error in the process causes a memory increase |
| 2013-03-15 09:51:31 utc | jmettraux | phaeron: copying the stacktrace probably |
| 2013-03-15 09:59:24 utc | phaeron | it's not that big a deal |
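(Editor's note: the "just raise an error" test case would look something like this, a sketch reusing the dashboard from the earlier snippet; each launch leaves a process in error, with the stacktrace stored alongside it.)

```ruby
# Sketch of the error-raising test case: a participant that always
# fails. wait_for(wfid) returns once the error is recorded.
dashboard.register_participant 'failing' do |workitem|
  raise 'boom'
end

pdef = Ruote.define do
  participant 'failing'
end

wfid = dashboard.launch(pdef)
dashboard.wait_for(wfid)

p dashboard.errors(wfid).size  # => 1
```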
| 2013-03-15 09:59:53 utc | phaeron | now I have to fix ruote-kit, package this bundle, deploy it, and hope I'll be happy |
| 2013-03-15 10:00:17 utc | jmettraux | fix ruote-kit? |
| 2013-03-15 10:25:26 utc | phaeron | jmettraux: the sinatra-based ruote-kit wrapper we use; it stopped working after upgrading so much stuff |
| 2013-03-15 10:44:56 utc | jmettraux | ah, understood |
| 2013-03-15 21:49:03 utc | jmettraux | speaking of memory leaks: http://blog.nelhage.com/2013/03/tracking-an-eventmachine-leak/ |
| 2013-03-15 21:53:08 utc | ypz | jmettraux good morning |
| 2013-03-15 21:54:05 utc | ypz | I am reading the doc http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html on resume, you have a note: "Note: this is supposed to be called on paused expressions / instances, this is NOT meant to be called to unstuck / unhang a process." |
| 2013-03-15 21:54:58 utc | ypz | could you elaborate on the differences between "paused" processes and "stuck / hung" processes? |
| 2013-03-15 21:58:18 utc | jmettraux | ypz: hello, good afternoon |
| 2013-03-15 21:58:59 utc | jmettraux | the execution of processes depends on "msgs" (messages), they are like those order sheets in restaurants |
| 2013-03-15 21:59:22 utc | jmettraux | depending on the storage implementation, one of those msgs could get lost |
| 2013-03-15 22:00:11 utc | jmettraux | a typical cause of loss would be a worker going down, taking a yet unprocessed msg with it |
| 2013-03-15 22:00:56 utc | ypz | what state would a process be in when one of its participants has an error? |
| 2013-03-15 22:01:09 utc | jmettraux | the customer is like "I ordered a steak, but it's not coming", in fact the order sheet remained in the waiter's pocket as he went back home, new waiter doesn't know anything, kitchen doesn't know anything |
| 2013-03-15 22:02:09 utc | jmettraux | if the process has no concurrent branches and has an error, the whole process state could be considered "in error" |
| 2013-03-15 22:02:38 utc | jmettraux | so my waiter story describes processes (or branches of processes) that are "stuck" |
| 2013-03-15 22:04:27 utc | ypz | once the error condition is eliminated, could I resume the "in error" process from the participant in error by redoing that participant? |
| 2013-03-15 22:04:48 utc | jmettraux | yes |
| 2013-03-15 22:07:25 utc | ypz | I tried dashboard.replay_at_error(err); the participant which had the error gets rerun and succeeds, but the process does not continue to the next participant. my pdef is just a simple sequence, what did I miss? |
| 2013-03-15 22:08:03 utc | jmettraux | I don't know, could you package that in a way that I can take a look? |
| 2013-03-15 22:08:53 utc | ypz | hm, not easily, I am afraid |
| 2013-03-15 22:09:08 utc | jmettraux | ok, let me write the gist |
| 2013-03-15 22:16:04 utc | jmettraux | ypz: here is a basic case: https://gist.github.com/anonymous/5173425 feel free to download it and play with it |
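(Editor's note: the shape of such a basic case, sketched under assumptions, not the gist itself: a sequence whose first participant fails until a flag is flipped, after which replay_at_error re-runs the failed expression and the sequence should continue.)

```ruby
# Fix-then-replay sketch: 'flaky' fails until $fixed is flipped.
$fixed = false

dashboard.register_participant 'flaky' do |workitem|
  raise 'not fixed yet' unless $fixed
end

dashboard.register_participant 'next_step' do |workitem|
  puts "reached next_step for #{workitem.wfid}"
end

pdef = Ruote.define do
  sequence do
    participant 'flaky'
    participant 'next_step'
  end
end

wfid = dashboard.launch(pdef)
dashboard.wait_for(wfid)         # returns once the error is recorded

$fixed = true                    # "eliminate the error condition"
err = dashboard.errors(wfid).first
dashboard.replay_at_error(err)   # re-run the failed expression

dashboard.wait_for(wfid)         # the sequence now runs to the end
```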
| 2013-03-15 22:16:07 utc | ypz | I 'll try to get a gist as well |
| 2013-03-15 22:16:47 utc | jmettraux | does your process terminate or simply get stuck? |
| 2013-03-15 22:16:55 utc | jmettraux | (sorry I should have asked immediately) |
| 2013-03-15 22:20:06 utc | ypz | it was stuck with error = 1 |
| 2013-03-15 22:20:25 utc | jmettraux | same error as before? |
| 2013-03-15 22:20:26 utc | ypz | once I ran replay_at_error, error = 0 |
| 2013-03-15 22:20:32 utc | jmettraux | ah |
| 2013-03-15 22:20:53 utc | jmettraux | did the process vanish or did it stay, but "stuck"? |
| 2013-03-15 22:21:15 utc | ypz | hm, I think I threw a ruote.pause(wfid) at that process somewhere |
| 2013-03-15 22:21:57 utc | ypz | tried ruote.resume(wfid), seems to get it going |
| 2013-03-15 22:22:35 utc | jmettraux | ok |
| 2013-03-15 22:23:26 utc | ypz | so for an "in error" process => fix the error condition and replay_at_error(err) to get it going; for a "paused" process => use resume to get it going; am I correct on these? |
| 2013-03-15 22:23:38 utc | jmettraux | yes |
| 2013-03-15 22:23:54 utc | jmettraux | one rule of thumb would be: "never use pause/resume" |
| 2013-03-15 22:24:11 utc | jmettraux | unless you really really have a use case for it |
| 2013-03-15 22:24:39 utc | ypz | when would one use "pause" ? |
| 2013-03-15 22:25:16 utc | jmettraux | I guess to pause a resource intensive workflow |
| 2013-03-15 22:25:33 utc | jmettraux | but you're never sure of when the pause msg will reach the leaves of the workflow |
| 2013-03-15 22:25:55 utc | jmettraux | depending on the use cases, it might be easier to implement the pausing/resuming via the participants |
| 2013-03-15 22:26:14 utc | phaeron | jmettraux: that's pretty hardcore debugging |
| 2013-03-15 22:27:07 utc | jmettraux | a global system switch telling the participant to pause, the engine would just sit waiting for participants, forced pause... |
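(Editor's note: participant-level pausing as just described could be sketched like this; the $paused global switch and the 30s re-dispatch delay are illustrative assumptions.)

```ruby
# Sketch: the participant consults a global switch; while "paused" it
# re-dispatches itself later instead of doing the work, so the engine
# just sits waiting on it.
class PausableParticipant
  include Ruote::LocalParticipant

  def on_workitem
    if $paused
      re_dispatch(workitem, :in => '30s')  # try again later
    else
      # ... do the real work ...
      reply  # hand the workitem back to the engine
    end
  end
end

dashboard.register_participant 'pausable', PausableParticipant
```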
| 2013-03-15 22:27:31 utc | jmettraux | phaeron: yeah, ruby, eventmachine, ruby debugger, gc, ... |
| 2013-03-15 22:30:07 utc | ypz | in your earlier restaurant example, that "stuck" process has to redo the entire process from the beginning, right? the customer had to re-order his steak! |
| 2013-03-15 22:31:35 utc | jmettraux | fortunately, most of the time, the order can be retrieved, the technique is packaged in the #respark method: https://github.com/jmettraux/ruote/blob/master/lib/ruote/dashboard.rb#L411-L431 |
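(Editor's note: usage is straightforward once a process is diagnosed as stuck rather than in error; a sketch.)

```ruby
# Respark a stuck process: no errors recorded, but nothing moving.
ps = dashboard.process(wfid)

dashboard.respark(wfid) if ps.errors.empty?
```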
| 2013-03-15 22:35:50 utc | ypz | hey, it's undocumented :D |
| 2013-03-15 22:36:39 utc | jmettraux | really, lines 411 to 423 are documentation, right? |
| 2013-03-15 22:37:03 utc | ypz | but nothing about it on http://ruote.rubyforge.org/rdoc/Ruote/Dashboard.html page |
| 2013-03-15 22:37:11 utc | jmettraux | aah |
| 2013-03-15 22:37:39 utc | jmettraux | I have to find the time to re-generate that part of the doc |
| 2013-03-15 22:38:12 utc | ypz | I need to get into the habit of reading the source code more often ! |
| 2013-03-15 22:39:02 utc | jmettraux | sorry about that, old documentation may be misleading |
| 2013-03-15 22:40:58 utc | ypz | yea, source code is the ultimate authoritative source |
| 2013-03-15 22:54:29 utc | jmettraux | http://rdoc.info/github/jmettraux/ruote/ is super old too |
| 2013-03-15 22:55:38 utc | jmettraux | that rdoc.info site seems stuck in the past |