Track external accumulators in tracer instead of using SparkInfo values #10553
Draft
charlesmyu wants to merge 1 commit into master
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 61 metrics, 10 unstable metrics.
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.067 s) : 0, 1066951
Total [baseline] (8.762 s) : 0, 8762474
Agent [candidate] (1.072 s) : 0, 1071647
Total [candidate] (8.741 s) : 0, 8740901
section iast
Agent [baseline] (1.23 s) : 0, 1230163
Total [baseline] (9.343 s) : 0, 9342596
Agent [candidate] (1.228 s) : 0, 1228426
Total [candidate] (9.409 s) : 0, 9408732
gantt
title insecure-bank - break down per module: candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.205 ms) : 0, 1205
crashtracking [candidate] (1.228 ms) : 0, 1228
BytebuddyAgent [baseline] (629.258 ms) : 0, 629258
BytebuddyAgent [candidate] (631.409 ms) : 0, 631409
AgentMeter [baseline] (29.181 ms) : 0, 29181
AgentMeter [candidate] (29.29 ms) : 0, 29290
GlobalTracer [baseline] (258.085 ms) : 0, 258085
GlobalTracer [candidate] (258.941 ms) : 0, 258941
AppSec [baseline] (33.045 ms) : 0, 33045
AppSec [candidate] (33.313 ms) : 0, 33313
Debugger [baseline] (63.247 ms) : 0, 63247
Debugger [candidate] (63.927 ms) : 0, 63927
Remote Config [baseline] (632.814 µs) : 0, 633
Remote Config [candidate] (623.007 µs) : 0, 623
Telemetry [baseline] (10.81 ms) : 0, 10810
Telemetry [candidate] (11.42 ms) : 0, 11420
Flare Poller [baseline] (5.291 ms) : 0, 5291
Flare Poller [candidate] (5.301 ms) : 0, 5301
section iast
crashtracking [baseline] (1.204 ms) : 0, 1204
crashtracking [candidate] (1.197 ms) : 0, 1197
BytebuddyAgent [baseline] (795.582 ms) : 0, 795582
BytebuddyAgent [candidate] (793.975 ms) : 0, 793975
AgentMeter [baseline] (11.305 ms) : 0, 11305
AgentMeter [candidate] (11.295 ms) : 0, 11295
GlobalTracer [baseline] (248.049 ms) : 0, 248049
GlobalTracer [candidate] (247.322 ms) : 0, 247322
IAST [baseline] (26.96 ms) : 0, 26960
IAST [candidate] (27.093 ms) : 0, 27093
AppSec [baseline] (32.947 ms) : 0, 32947
AppSec [candidate] (32.192 ms) : 0, 32192
Debugger [baseline] (65.705 ms) : 0, 65705
Debugger [candidate] (67.008 ms) : 0, 67008
Remote Config [baseline] (539.985 µs) : 0, 540
Remote Config [candidate] (535.624 µs) : 0, 536
Telemetry [baseline] (8.535 ms) : 0, 8535
Telemetry [candidate] (8.517 ms) : 0, 8517
Flare Poller [baseline] (3.451 ms) : 0, 3451
Flare Poller [candidate] (3.426 ms) : 0, 3426
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.073 s) : 0, 1072763
Total [baseline] (10.857 s) : 0, 10857190
Agent [candidate] (1.071 s) : 0, 1070515
Total [candidate] (10.843 s) : 0, 10842836
section appsec
Agent [baseline] (1.24 s) : 0, 1240019
Total [baseline] (11.001 s) : 0, 11001153
Agent [candidate] (1.247 s) : 0, 1247063
Total [candidate] (11.093 s) : 0, 11092576
section iast
Agent [baseline] (1.233 s) : 0, 1232679
Total [baseline] (11.264 s) : 0, 11264395
Agent [candidate] (1.234 s) : 0, 1234485
Total [candidate] (11.213 s) : 0, 11212586
section profiling
Agent [baseline] (1.193 s) : 0, 1192820
Total [baseline] (10.939 s) : 0, 10939474
Agent [candidate] (1.206 s) : 0, 1205788
Total [candidate] (11.102 s) : 0, 11101985
gantt
title petclinic - break down per module: candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section tracing
crashtracking [baseline] (1.204 ms) : 0, 1204
crashtracking [candidate] (1.199 ms) : 0, 1199
BytebuddyAgent [baseline] (632.47 ms) : 0, 632470
BytebuddyAgent [candidate] (632.185 ms) : 0, 632185
AgentMeter [baseline] (29.321 ms) : 0, 29321
AgentMeter [candidate] (29.205 ms) : 0, 29205
GlobalTracer [baseline] (259.002 ms) : 0, 259002
GlobalTracer [candidate] (258.783 ms) : 0, 258783
AppSec [baseline] (33.449 ms) : 0, 33449
AppSec [candidate] (33.379 ms) : 0, 33379
Debugger [baseline] (64.287 ms) : 0, 64287
Debugger [candidate] (65.305 ms) : 0, 65305
Remote Config [baseline] (632.504 µs) : 0, 633
Remote Config [candidate] (633.563 µs) : 0, 634
Telemetry [baseline] (10.092 ms) : 0, 10092
Telemetry [candidate] (9.126 ms) : 0, 9126
Flare Poller [baseline] (6.179 ms) : 0, 6179
Flare Poller [candidate] (4.535 ms) : 0, 4535
section appsec
crashtracking [baseline] (1.216 ms) : 0, 1216
crashtracking [candidate] (1.204 ms) : 0, 1204
BytebuddyAgent [baseline] (659.137 ms) : 0, 659137
BytebuddyAgent [candidate] (663.932 ms) : 0, 663932
AgentMeter [baseline] (11.955 ms) : 0, 11955
AgentMeter [candidate] (12.092 ms) : 0, 12092
GlobalTracer [baseline] (258.275 ms) : 0, 258275
GlobalTracer [candidate] (259.725 ms) : 0, 259725
IAST [baseline] (25.477 ms) : 0, 25477
IAST [candidate] (25.381 ms) : 0, 25381
AppSec [baseline] (167.705 ms) : 0, 167705
AppSec [candidate] (167.966 ms) : 0, 167966
Debugger [baseline] (66.509 ms) : 0, 66509
Debugger [candidate] (66.834 ms) : 0, 66834
Remote Config [baseline] (654.903 µs) : 0, 655
Remote Config [candidate] (649.586 µs) : 0, 650
Telemetry [baseline] (9.514 ms) : 0, 9514
Telemetry [candidate] (9.474 ms) : 0, 9474
Flare Poller [baseline] (3.656 ms) : 0, 3656
Flare Poller [candidate] (3.692 ms) : 0, 3692
section iast
crashtracking [baseline] (1.192 ms) : 0, 1192
crashtracking [candidate] (1.189 ms) : 0, 1189
BytebuddyAgent [baseline] (795.68 ms) : 0, 795680
BytebuddyAgent [candidate] (798.147 ms) : 0, 798147
AgentMeter [baseline] (11.281 ms) : 0, 11281
AgentMeter [candidate] (11.31 ms) : 0, 11310
GlobalTracer [baseline] (247.884 ms) : 0, 247884
GlobalTracer [candidate] (248.226 ms) : 0, 248226
IAST [baseline] (27.112 ms) : 0, 27112
IAST [candidate] (27.184 ms) : 0, 27184
AppSec [baseline] (33.202 ms) : 0, 33202
AppSec [candidate] (35.211 ms) : 0, 35211
Debugger [baseline] (67.654 ms) : 0, 67654
Debugger [candidate] (64.62 ms) : 0, 64620
Remote Config [baseline] (566.158 µs) : 0, 566
Remote Config [candidate] (534.236 µs) : 0, 534
Telemetry [baseline] (8.678 ms) : 0, 8678
Telemetry [candidate] (8.615 ms) : 0, 8615
Flare Poller [baseline] (3.483 ms) : 0, 3483
Flare Poller [candidate] (3.498 ms) : 0, 3498
section profiling
crashtracking [baseline] (1.202 ms) : 0, 1202
crashtracking [candidate] (1.197 ms) : 0, 1197
BytebuddyAgent [baseline] (682.637 ms) : 0, 682637
BytebuddyAgent [candidate] (692.428 ms) : 0, 692428
AgentMeter [baseline] (8.554 ms) : 0, 8554
AgentMeter [candidate] (8.627 ms) : 0, 8627
GlobalTracer [baseline] (216.313 ms) : 0, 216313
GlobalTracer [candidate] (218.58 ms) : 0, 218580
AppSec [baseline] (32.645 ms) : 0, 32645
AppSec [candidate] (33.308 ms) : 0, 33308
Debugger [baseline] (67.337 ms) : 0, 67337
Debugger [candidate] (67.997 ms) : 0, 67997
Remote Config [baseline] (631.883 µs) : 0, 632
Remote Config [candidate] (630.575 µs) : 0, 631
Telemetry [baseline] (8.977 ms) : 0, 8977
Telemetry [candidate] (9.002 ms) : 0, 9002
Flare Poller [baseline] (3.739 ms) : 0, 3739
Flare Poller [candidate] (3.738 ms) : 0, 3738
ProfilingAgent [baseline] (100.117 ms) : 0, 100117
ProfilingAgent [candidate] (98.862 ms) : 0, 98862
Profiling [baseline] (100.694 ms) : 0, 100694
Profiling [candidate] (99.436 ms) : 0, 99436
Load
Summary
Found 3 performance improvements and 1 performance regressions! Performance is the same for 16 metrics, 16 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (18.309 ms) : 18120, 18498
. : milestone, 18309,
appsec (18.697 ms) : 18507, 18887
. : milestone, 18697,
code_origins (17.74 ms) : 17566, 17913
. : milestone, 17740,
iast (17.572 ms) : 17397, 17746
. : milestone, 17572,
profiling (18.868 ms) : 18676, 19061
. : milestone, 18868,
tracing (17.735 ms) : 17560, 17910
. : milestone, 17735,
section candidate
no_agent (19.149 ms) : 18959, 19339
. : milestone, 19149,
appsec (18.765 ms) : 18575, 18956
. : milestone, 18765,
code_origins (17.884 ms) : 17707, 18061
. : milestone, 17884,
iast (17.693 ms) : 17520, 17865
. : milestone, 17693,
profiling (18.581 ms) : 18394, 18769
. : milestone, 18581,
tracing (18.584 ms) : 18400, 18768
. : milestone, 18584,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (1.178 ms) : 1167, 1190
. : milestone, 1178,
iast (3.247 ms) : 3201, 3294
. : milestone, 3247,
iast_FULL (5.788 ms) : 5730, 5846
. : milestone, 5788,
iast_GLOBAL (3.641 ms) : 3584, 3698
. : milestone, 3641,
profiling (2.125 ms) : 2107, 2144
. : milestone, 2125,
tracing (1.779 ms) : 1762, 1795
. : milestone, 1779,
section candidate
no_agent (1.168 ms) : 1157, 1179
. : milestone, 1168,
iast (2.998 ms) : 2962, 3033
. : milestone, 2998,
iast_FULL (5.736 ms) : 5680, 5793
. : milestone, 5736,
iast_GLOBAL (3.463 ms) : 3418, 3508
. : milestone, 3463,
profiling (2.046 ms) : 2025, 2067
. : milestone, 2046,
tracing (1.761 ms) : 1747, 1776
. : milestone, 1761,
Dacapo
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 10 metrics, 2 unstable metrics.
Execution time for tomcat
gantt
title tomcat - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (1.473 ms) : 1462, 1485
. : milestone, 1473,
appsec (3.773 ms) : 3552, 3993
. : milestone, 3773,
iast (2.239 ms) : 2170, 2308
. : milestone, 2239,
iast_GLOBAL (2.288 ms) : 2219, 2358
. : milestone, 2288,
profiling (2.088 ms) : 2032, 2144
. : milestone, 2088,
tracing (2.054 ms) : 2000, 2107
. : milestone, 2054,
section candidate
no_agent (1.47 ms) : 1459, 1482
. : milestone, 1470,
appsec (3.717 ms) : 3501, 3934
. : milestone, 3717,
iast (2.241 ms) : 2172, 2310
. : milestone, 2241,
iast_GLOBAL (2.298 ms) : 2228, 2368
. : milestone, 2298,
profiling (2.492 ms) : 2330, 2655
. : milestone, 2492,
tracing (2.059 ms) : 2005, 2112
. : milestone, 2059,
Execution time for biojava
gantt
title biojava - execution time [CI 0.99] : candidate=1.60.0-SNAPSHOT~89df516e40, baseline=1.60.0-SNAPSHOT~af8b84438c
dateFormat X
axisFormat %s
section baseline
no_agent (14.707 s) : 14707000, 14707000
. : milestone, 14707000,
appsec (15.128 s) : 15128000, 15128000
. : milestone, 15128000,
iast (18.34 s) : 18340000, 18340000
. : milestone, 18340000,
iast_GLOBAL (17.907 s) : 17907000, 17907000
. : milestone, 17907000,
profiling (14.774 s) : 14774000, 14774000
. : milestone, 14774000,
tracing (14.675 s) : 14675000, 14675000
. : milestone, 14675000,
section candidate
no_agent (15.355 s) : 15355000, 15355000
. : milestone, 15355000,
appsec (14.995 s) : 14995000, 14995000
. : milestone, 14995000,
iast (18.166 s) : 18166000, 18166000
. : milestone, 18166000,
iast_GLOBAL (17.83 s) : 17830000, 17830000
. : milestone, 17830000,
profiling (15.319 s) : 15319000, 15319000
. : milestone, 15319000,
tracing (14.58 s) : 14580000, 14580000
. : milestone, 14580000,
What Does This Do
Updates the metrics in the _dd.spark.sql_plan meta field to use distributions calculated from individual task metrics, rather than the naively summed metrics provided by the StageInfo objects from Spark. StageInfo sums every accumulator, even where a sum does not make sense for a given Spark SQL metric (e.g. avg hash probes per key for aggregation operations). Instead, we accumulate those values ourselves into distribution metrics and emit them accordingly.
Currently in the UI, this is only used in one place (in the Spark SQL metrics in the DJM product), so we're not too worried about changing the format here. UI update to follow.
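To illustrate the difference, here is a minimal sketch in Python rather than the tracer's own code. The per-task values and the summarize helper are hypothetical, and the fields it emits are an assumption, not the format this PR uses. It only shows why summing a per-task average gives a misleading number, while a distribution over the task values keeps the task-level picture.

```python
from statistics import median


def summarize(task_values):
    """Summarize per-task metric values as a small distribution.

    Hypothetical helper for illustration only; the real field names
    and statistics emitted by the tracer may differ.
    """
    if not task_values:
        raise ValueError("need at least one task value")
    return {
        "count": len(task_values),
        "min": min(task_values),
        "median": median(task_values),
        "max": max(task_values),
    }


# Hypothetical per-task readings of an "average"-style metric,
# e.g. avg hash probes per key reported by four tasks.
avg_hash_probes_per_task = [1.0, 1.5, 1.0, 2.5]

# Naive approach: add everything up, as a plain accumulator sum would.
naive_sum = sum(avg_hash_probes_per_task)

# Distribution approach: keep the shape of the per-task values.
distribution = summarize(avg_hash_probes_per_task)

print(naive_sum)     # 6.0, which no single task ever observed
print(distribution)  # min 1.0, median 1.25, max 2.5 over 4 tasks
```

The summed value grows with the number of tasks, so it says little about any one task, whereas min, median and max stay comparable across stages of different sizes.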
Motivation
We'd like accurate metrics for Spark SQL operations that can reflect task-level characteristics as a distribution. This brings us more in line with what is shown in the Spark UI.

Additional Notes
We can't get rid of the original map that tracks accumulators to stages, as we still use it to associate Spark SQL operations with stages. However, we no longer need to store the entire accumulator and can instead store a simple map of accumulator ID to stage ID. This will be done in a follow-up PR: #10645
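The slimmer mapping described above could look roughly like the following Python sketch. The class and method names are hypothetical and are not taken from this PR or from #10645; the point is only that the lookup needs two integers per entry rather than a reference to the whole accumulator.

```python
from typing import Dict, Optional


class AccumulatorStageIndex:
    """Hypothetical index from accumulator ID to stage ID."""

    def __init__(self) -> None:
        self._stage_by_accumulator: Dict[int, int] = {}

    def record(self, accumulator_id: int, stage_id: int) -> None:
        # Store only the IDs, not the accumulator object itself.
        self._stage_by_accumulator[accumulator_id] = stage_id

    def stage_for(self, accumulator_id: int) -> Optional[int]:
        # Returns None when the accumulator was never seen.
        return self._stage_by_accumulator.get(accumulator_id)


index = AccumulatorStageIndex()
index.record(accumulator_id=101, stage_id=3)

known_stage = index.stage_for(101)
unknown_stage = index.stage_for(999)
```

Here known_stage resolves to the recorded stage and unknown_stage is None, which is enough to associate a Spark SQL operation's metric with its stage.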
Contributor Checklist
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue. Use solves instead, and assign the PR milestone to the issue
Jira ticket: [PROJ-IDENT]
Note: Once your PR is ready to merge, add it to the merge queue by commenting /merge. /merge -c cancels the queue request. /merge -f --reason "reason" skips all merge queue checks; please use this judiciously, as some checks do not run at the PR-level. For more information, see this doc.