Running multiple agents on a single machine #803

Closed
wants to merge 28 commits into from

@dupetr dupetr commented May 12, 2023

Why this was created

We needed a way to collect JMX metrics from Apache Spark jobs running on AWS EMR.

Spark is told to expose these metrics through the executor's Java options:

spark-submit ...
--conf "spark.executor.extraJavaOptions=-javaagent:./jmx_prometheus_javaagent-0.17.2.jar=<1 specific port>:<config file>"

This works fine as long as each machine runs 1 JVM with 1 agent, because the agent opens its webserver on that one specific port.
When 2 executors run on 1 machine, the second agent cannot start (2 webservers cannot listen on 1 port) and the job crashes.
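
The collision described above can be reproduced without Spark or the agent. The following is a minimal sketch that uses a plain `ServerSocket` as a stand-in for the agent's webserver (an assumption made for brevity; the class and method names are hypothetical and not part of this PR):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

class PortCollisionDemo {
    /** Returns true if a second listener fails to bind to a port that is already in use. */
    static boolean secondBindFails() throws IOException {
        // Port 0 asks the OS for any free port, so the demo does not depend on a fixed number.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // not expected: both listeners got the same port
            } catch (BindException e) {
                return true;  // the second listener is rejected, like the second executor's agent
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```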

The enhancement

  1. You can define a range of ports; in our case we use 39100-39115. Some of the ports are left unused as headroom. The range must be one contiguous block of ports.
  2. The agent no longer fails when it cannot start a webserver on a port. It tries the next port in the range instead. It still fails when the range is too small, for example 5 ports for 10 exporters on one machine.
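
To make the range arithmetic concrete, here is a small sketch of an inclusive port range. The PR description does not show how the range is passed to the agent, so the `first-last` text form and the `PortRange` class are purely illustrative assumptions:

```java
class PortRange {
    final int first;
    final int last;

    PortRange(int first, int last) {
        if (first < 1 || last > 65535 || first > last) {
            throw new IllegalArgumentException("invalid port range: " + first + "-" + last);
        }
        this.first = first;
        this.last = last;
    }

    /** Parses a hypothetical "first-last" form such as "39100-39115". */
    static PortRange parse(String text) {
        String[] parts = text.trim().split("-", 2);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected first-last, got: " + text);
        }
        return new PortRange(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
    }

    /** Number of ports in the range; both ends are inclusive. */
    int size() {
        return last - first + 1;
    }

    /** True if every exporter on the machine can get its own port. */
    boolean canHost(int exporters) {
        return exporters <= size();
    }
}
```

With these assumptions, `PortRange.parse("39100-39115").size()` is 16, and a 5-port range reports `canHost(10)` as false, which is the failure case from item 2.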

The exporters act independently and compete for the same resource (a port), so startup uses backoff times and retries.
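
The backoff-and-retry idea can be sketched roughly as follows. This is not the code from this PR: `PortRangeBinder`, its parameters, and the use of a plain `ServerSocket` instead of the agent's webserver are assumptions chosen to keep the example self-contained.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.ThreadLocalRandom;

class PortRangeBinder {
    /**
     * Tries to bind a listener to the first free port in [firstPort, lastPort].
     * Sleeps a random time before each pass so that independent processes
     * starting at the same moment are less likely to race for the same port.
     */
    static ServerSocket bindInRange(int firstPort, int lastPort, int maxRetries, long maxBackoffMillis)
            throws IOException, InterruptedException {
        if (firstPort > lastPort) {
            throw new IllegalArgumentException("firstPort must not exceed lastPort");
        }
        IOException lastFailure = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (maxBackoffMillis > 0) {
                // Random backoff in [0, maxBackoffMillis].
                Thread.sleep(ThreadLocalRandom.current().nextLong(maxBackoffMillis + 1));
            }
            for (int port = firstPort; port <= lastPort; port++) {
                try {
                    return new ServerSocket(port); // success: this process now owns the port
                } catch (IOException e) {
                    lastFailure = e; // port is taken (or unusable), move on to the next one
                }
            }
        }
        throw new IOException("no free port in range " + firstPort + "-" + lastPort, lastFailure);
    }
}
```

The sketch binds directly instead of first checking whether a port is free, because another process can take the port between a check and the bind. When a bind loses such a race, the loop moves on to the next port, and a later pass retries the whole range.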

On driver/executor stdout it looks like this:

Backing off at server start for 1218ms
Looking up free port. Checking: 39100, remaining ports in range: 16
Port 39100 is used. Trying next one.
Looking up free port. Checking: 39101, remaining ports in range: 15
Found free port 39101
Trying to start JMX agent on 39101
Started JMX agent on 39101. (retries left: 16)

fixes #627

cspetrdusak and others added 26 commits January 27, 2023 17:45
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: doxsch <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: doxsch <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
…tp server. Added tests to validate the correct name and version is not unknown

Signed-off-by: Doug Hoard <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
- Since 77a5b8f, "name" is "Prometheus
  JMX Exporter - Http Server", which break debian package generation
- use artifactId to get a correct name

Signed-off-by: Romain Bouvier <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Maintain a lookup index for sample keys (name, labels)
and use that to check for duplicate sample during scraping
instead of O(n) list of samples

Also guard non-trivial computation behind Logger level check

Signed-off-by: Adi Muraru <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Adi Muraru <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Fabian Stäber <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: Geoffrey Muselli <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
Signed-off-by: William Morgan <[email protected]>
Co-authored-by: William Morgan <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
* Refactored integration test suite

Signed-off-by: dhoard <[email protected]>
* Changed build to use Java 11

Signed-off-by: dhoard <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
…en running in CircleCI (prometheus#798)

Signed-off-by: dhoard <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
…re the container was fully started (prometheus#799)

Signed-off-by: dhoard <[email protected]>
Signed-off-by: Petr Dušák <[email protected]>
dupetr and others added 2 commits May 12, 2023 16:37

dupetr commented May 15, 2023

Closing in favour of #805. Git got the better of me.

@dupetr dupetr closed this May 15, 2023
@dupetr dupetr deleted the multiport-agent branch May 15, 2023 08:42
Development

Successfully merging this pull request may close these issues.

Multiple JVMs per machine with mutliple Ports
9 participants