| Commit message (Collapse) | Author |
|
|
|
|
|
|
|
|
|
Instead of the weird `struct msg` I had, I switched to the JSON-RPC
format. It's basically the same, but has a well-defined semantics in
case of errors.
|
|
|
|
I don't know what I was thinking, but contrary to my intention, the
worker stayed connected to the server all the time.
|
|
This is a major change, obviously; brought to me by Valgrind, which
noticed that we don't actually clean up after cimple-client threads.
For a more thorough explanation, please see the added comment in
tcp_server.c.
|
|
Thanks, Valgrind! As a note: if I think that Valgrind reports a false
positive, chances are, it's not.
|
|
|
|
|
|
Also, use cmake's configure_file to build string constants in.
|
|
|
|
|
|
|
|
|
|
Is this an overkill? I don't know.
The thing is, correctly intercepting SIGTERM (also SIGINT, etc.) is
incredibly tricky. For example, before this commit, my I/O loops in
server.c and worker.c were inherently racy.
This was immediately obvious if you tried to run the tests. The tests
(especially the Valgrind flavour) would run a worker, wait until it
prints a "Waiting for a new command" line, and try to kill it using
SIGTERM. The problem is, the global_stop_flag check could have already
been executed by the worker, and it would hang forever in recv().
The solution seems to be to use signalfd and select()/poll(). I've never
used either before, but it seems to work well enough - at least the very
same tests pass and don't hang now.
|
|
|
|
|
|
|
|
|
|
|
|
OK, this is a major rework.
* tcp_server: connection threads are not detached anymore, the caller has
to clean them up. This was done so that the server can clean up the
threads cleanly.
* run_queue: simple refactoring, run_queue_entry is called just run now.
* server: worker threads are now killed when a run is assigned to a
worker.
* worker: the connection to server is no longer persistent. A worker
sends "new-worker", waits for a task, closes the connection, and when
it's done, sends the "complete" message and waits for a new task.
This is supposed to improve resilience, since the worker-server
connections don't have to be maintained while the worker is doing a CI
run.
|
|
|
|
|
|
Also, some minor refactoring.
|
|
|
|
* I don't really need to declare all variables at the top of the
function anymore.
* Default-initialize variables more.
* Don't set the output parameter until the object is completely
constructed.
|
|
|
|
|
|
|
|
Explicit is better than implicit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Found when running in Docker.
|
|
Previously, the client had no way to distinguish errors from succesful
calls.
|
|
|
|
Previously, I had a stupid system where I would create a thread after
every accept(), and put worker descriptors in a queue. A special
"scheduler" thread would then pick them out, and give out jobs to
complete.
The problem was, of course, I couldn't conveniently poll job status from
workers. I thought about using poll(), but that turned out to be a
horribly complicated API. How do I deal with partial reads, for example?
I don't honestly know.
Then it hit me that I could just use the threads that handle accept()ed
connections as "worker threads", which would synchronously schedule jobs
and wait for them to complete. This solves every problem and removes the
need for a lot of inter-thread synchronization magic. It even works now,
holy crap! You can launch and terminate workers at will, and they will
pick up new jobs automatically.
As a side not, msg_recv_and_handle turned out to be too limiting and
complicated for me, so I got rid of that, and do normal
msg_recv/msg_send calls.
|
|
|
|
pthread functions return positive error codes.
|
|
Well, maybe "graceful" is a strong word, but now you _can_ do
./server &
./worker &
./client ci_run URL REV && kill "$( pidof worker )"
and the worker will wait for the CI run to complete.
|
|
|
|
This adds a basic "worker" program.
You can now do something like
./server &
./worker &
./client ci_run URL REV
and the server should pass a message to worker, after which it should
clone the repository at URL, checkout REV, and try to run the CI script.
It's extremely unfinished: I need to sort out the graceful shutdown, how
the server manages workers, etc.
|