Explicit write control for iRules subtables
Note to the reader: apparently what is old is new again. There are some threads here on DevCentral that have already solved for this, albeit in different ways. The few brought to my attention by MVP Kai_Wilke are included in the list below for your benefit to read through. That said, the journey of discovery here in this article is worth your time to understand the nuances of how data is passed in a multi-TMM system.

- Dealing with iRule $variables for HTTP2 workload while HTTP MRF Router is enabled | DevCentral
- https://github.com/KaiWilke/F5-iRule-RADIUS-Server-Stack
- SPDY/HTTP2 Profile Impact on Variable Use | DevCentral

The TL;DR

- TMM subtables on BIG-IP are partitioned across TMMs by hashing the subtable name.
- Writing to a subtable from a non-owner TMM is roughly 1,000x to 10,000x slower than writing from the owner: single-digit clock clicks vs. thousands to tens of thousands.
- If you want fast per-TMM local storage, you cannot pick the subtable name yourself; you have to *discover* a locally-owned name by timing trial writes.
- Deterministic naming schemes do not work, even when they look obviously correct.

The Problem

A colleague had an iRule that maintained per-connection state across many CLIENT_DATA events. The natural data structure was a TMM session subtable. His quick experiments showed the writes were slow enough to push up system CPU under modest load, and he needed to understand why before scaling further.

There's an example proc library from Nat_Thirasuttakorn, "LOCALDB", that uses a clever timing trick: it generates a random subtable name, times a probe write, and only keeps the name if the write completes under some threshold (50 clock clicks in the original). The implication was that most random names produce slow writes and only a few are fast.

I read the code, figured I understood it, and rewrote it "cleanly" using deterministic per-TMM names: `localdb_tmm_0`, `localdb_tmm_1`, `localdb_tmm_2`, ... one per TMM, no probing required. Each TMM would write only to its own name.
Done, right? Wrong.

The diagram above is the mental model the rest of this post leans on. Two independent hashes are happening: the DAG hashes the inbound 4-tuple to choose which TMM accepts the connection, and TMOS separately hashes the subtable name to choose which TMM *owns* the storage for that name. A write is fast only when both hashes agree, that is, when the TMM that received the connection is also the owner of the subtable being written to. When they disagree, the write still succeeds, but it costs thousands of times more (roughly 5,000-7,000x in the measurements below).

The Investigation

The deterministic version "worked": writes succeeded, distribution looked plausible, throughput was decent. Then I added timing instrumentation per TMM and looked at the per-TMM numbers:

    TMM  samples  min  avg      max
    0    74       121  64855.6  229089
    1    34       136  71536.3  236204
    2    38       121  88516.9  293259
    3    62       3    13.3     25

TMM 3 was writing in 3-25 clicks. Every other TMM was averaging tens of thousands, which is a 5,000-7,000x gap! Something was very wrong.

The diagnosis came from a `/probe` endpoint I'd added for unrelated reasons: hit the same subtable name from many connections, time each write, count which TMM responds fast. Probing each of the four "deterministic" names produced:

    localdb_tmm_0 → owner is TMM 2
    localdb_tmm_1 → owner is TMM 2
    localdb_tmm_2 → owner is TMM 3
    localdb_tmm_3 → owner is TMM 3

Visualizing the result for one of those probes makes the signal unambiguous:

Two of the four names hashed to TMM 2, the other two hashed to TMM 3. TMMs 0 and 1 didn't own any of the subtables I'd "assigned" to them.

This is the key insight: **the subtable name `localdb_tmm_3` doesn't get owned by TMM 3 just because its name ends in 3.** TMOS hashes the whole name string and assigns ownership based on that hash. The hash is opaque, and it's stable, but it has no relationship to the content of the name.
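To make that concrete, here is a minimal Python sketch of the idea. I don't know what hash TMOS actually uses, so SHA-256 is purely a stand-in (an assumption), and `owner_of` is a hypothetical helper, not a BIG-IP API. The only property it is meant to show is that ownership is computed from the whole name string, so a numeric suffix carries no meaning.

```python
import hashlib

TMM_COUNT = 4  # the 4-TMM lab box used in this article


def owner_of(name: str, tmm_count: int = TMM_COUNT) -> int:
    """Hypothetical stand-in for the opaque TMOS name hash (NOT the real one)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Reduce the first 8 bytes of the digest to a TMM index.
    return int.from_bytes(digest[:8], "big") % tmm_count


if __name__ == "__main__":
    for n in range(TMM_COUNT):
        name = f"localdb_tmm_{n}"
        print(f"{name} -> owner TMM {owner_of(name)} (I wanted TMM {n})")
```

Whatever the stand-in prints, nothing in it ties name N to TMM N, and the same is true of the real hash.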
My deterministic scheme was generating four unique names, which guaranteed no key collisions across TMMs, but it didn't guarantee, and couldn't guarantee, that name N landed on TMM N.

Why The Original Trick Was Right

Going back to the LOCALDB proc library pattern from DevCentral:

    # Excerpt: try, maxtry, and maxdiff are set before this loop runs.
    while { $try < $maxtry } {
        # Pick a random candidate subtable name.
        set name [expr {rand()}]
        # Time one probe write (5-second timeout on the entry).
        set before [clock clicks]
        table set -subtable $name test_$name $name 5
        set after [clock clicks]
        set diff [expr {$after - $before}]
        # A fast write means this TMM owns the name, so keep it.
        if { $diff < $maxdiff } { break }
        incr try
    }

Generate a random name. Probe it. If it's fast, keep it; if not, throw it away and try another. Each TMM independently does this, and on average needs ~N tries on an N-TMM system to find a name it owns. The probe is the *only* reliable way to know.

The randomness is load-bearing. The timing measurement is load-bearing. Neither is decorative. My "elegant" rewrite removed both and produced a system that looked fine but was burning 99% of its potential throughput shipping writes between TMMs.

How to Verify

A timing histogram per TMM is the diagnostic. The test workflow:

1. Add a `/probe?name=X` endpoint that times a single `table set` against an arbitrary subtable name and reports clicks + the responding TMM
2. Hit it many times from a multi-threaded client
3. Aggregate per-TMM: hits, OWNER count (writes under threshold), NON_OWNER count, min/avg/max clicks
4. The owner of name X will show up as ~all-OWNER with consistently low clicks; everyone else shows ~all-NON_OWNER with high clicks

A handful of stray "OWNER" tags on non-owners is just noisy variance in `clock clicks` measurement. The real signal is overwhelming: 50+ OWNER tags vs 0-3 OWNER tags, and average clicks differing by 1000-10000x.

Lessons About TMM Subtables

A few things worth internalizing if you work with these:

Names are global; storage is partitioned

Two TMMs writing the same name reach the same logical subtable, but only the owner stores it locally. Non-owners pay an inter-TMM coordination tax on every operation.
This is fundamentally a sharding scheme where the shard key is the subtable name and the shard map is hidden from you.

Construction can't replace discovery

Anywhere a system uses an opaque hash to assign ownership of named resources, you cannot construct a locally-owned name, you can only find one by trying. This pattern shows up well beyond TMOS: Cassandra token ranges, Redis Cluster slots, Kafka partition assignments, consistent-hashing rings in general. Discovery beats construction whenever the mapping function is hidden.

O(n) reads in hot paths kill throughput

I had a `count` proc that called `table keys -subtable X` and ran `llength` on the result. With per-TMM subtables of ~25k entries, that's 25k strings to enumerate per request. Throughput decayed from 3300/s to 600/s over a 40k-record run, a perfect 1/n curve. Maintaining the count incrementally in a `static::` variable made it O(1) and throughput stayed flat. The fix is obvious in hindsight; the bug is invisible without per-second throughput measurement.

Static variables are per-TMM

This is great when you want it (per-TMM owned-subtable name, per-TMM counters) and confusing when you don't (you can't share state across TMMs through statics alone). The variables are also persistent across rule reloads in some versions, which means a rule update that adds a new static can leave you with TMMs running the new code but missing the new state. Defensive existence checks at the top of every proc are worthwhile.

Sampling debug logs is mandatory at scale

Logging every write to `/var/log/ltm` for a million-record load is 1M log lines, hundreds of MB, and enough log I/O to tank throughput on its own. Sample 1-in-N (where N grows with load size), and gate calling-rule logs on the same sample point so the log narrative stays coherent. A `should_log` helper proc shared between the library and its callers keeps this clean.

Test harnesses should reset, not reload

I initially "reset" between runs by reloading the iRule. `RULE_INIT` re-ran and statics reset, but the *subtable contents* persisted in TMM session memory because they're indexed by name, not by rule. Each rule reload picked a new random name and orphaned the old subtable's entries. Over many runs, memory accumulated. A `/reset` endpoint that walks `table keys` and deletes them is the right abstraction.

What "Done" Looked Like

After the fix, a 100k-record run on a 4-TMM system:

    TMM  samples  min  avg   max
    0    98       3    17.4  71
    1    101      4    18.9  88
    2    99       3    16.8  77
    3    102      4    19.1  91

Throughput stayed flat at ~3000/s for the entire run. Every TMM in the same low-clicks range. No `SLOW` tags in the sampled logs.

The before-and-after chart (log scale) makes the impact unmistakable:

TMM 3 is interesting on its own. Under the broken design it was already fast (averaging 13.3 clicks) because its own name, `localdb_tmm_3`, happened to hash to it, while the other TMMs were ferrying their writes over to TMM 2 or TMM 3. Under the fix, TMM 3 stops being one of only two storage owners and instead does roughly the same work as everyone else, on its own subtable. The fact that TMM 3's "broken" bar isn't dramatically taller is what makes this kind of bug survive a smoke test: writes were succeeding, throughput looked plausible, *one* TMM was even fast. The per-TMM timing breakdown is what gave it away.

The Validated Test Session

Here is the actual end-to-end verification run, command by command, on a 4-TMM lab BIG-IP. This is the workflow that I ended up codifying in the project's `USAGE.md`. It both validates that the fix works and demonstrates each tool's role.
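Before walking through the session, it may help to see the discovery loop as a toy model. The sketch below simulates timing-probe discovery in Python under loudly stated assumptions: `owner_of` is a hypothetical SHA-256 stand-in for the opaque TMOS hash, and the timed `table set` is replaced by asking that stand-in directly who owns a name. It is not the LOCALDB code; it only illustrates why a TMM needs about N tries on an N-TMM system.

```python
import hashlib
import random


def owner_of(name: str, tmm_count: int) -> int:
    """Hypothetical stand-in for the opaque ownership hash (not TMOS's real one)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % tmm_count


def discover_local_name(my_tmm: int, tmm_count: int, rng: random.Random,
                        max_tries: int = 200):
    """Try random names until one is owned by my_tmm.

    On a real BIG-IP the ownership test is a timed probe write; here it is
    simulated by calling owner_of(). Returns (name, tries), or
    (None, max_tries) if discovery exhausts its tries.
    """
    for tries in range(1, max_tries + 1):
        name = f"localdb_tmm_{my_tmm}_{rng.randrange(1_000_000)}"
        if owner_of(name, tmm_count) == my_tmm:
            return name, tries
    return None, max_tries


def average_tries(tmm_count: int, runs: int = 2000, seed: int = 1) -> float:
    """Average number of probes needed, over many simulated discoveries."""
    rng = random.Random(seed)
    total = 0
    for i in range(runs):
        _, tries = discover_local_name(i % tmm_count, tmm_count, rng)
        total += tries
    return total / runs


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n} TMMs: average tries ~ {average_tries(n):.1f}")
```

Each probe succeeds with probability 1/N, so the number of tries follows a geometric distribution with mean N. That is the cost of not being able to construct the name directly.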
Step 1: Verify Every TMM Picked a Unique Subtable

After deploying the LOCALDB rule and the calling rule, hit `/whoami` enough times that fresh TCP connections fan out across all TMMs:

    $ for i in $(seq 1 30); do curl -s http://10.0.2.49/whoami; done | sort -u
    tmm 0 subtable localdb_tmm_0_865802 total_tmms 4 writes 0 entries 0
    tmm 1 subtable localdb_tmm_1_922743 total_tmms 4 writes 0 entries 0
    tmm 2 subtable localdb_tmm_2_5946 total_tmms 4 writes 0 entries 0
    tmm 3 subtable localdb_tmm_3_441563 total_tmms 4 writes 0 entries 0

Four things to read out of this:

- Four unique TMMs (0, 1, 2, 3) responded, meaning full coverage. With `Connection: close` from curl, each request gets a fresh ephemeral source port and the BIG-IP's DAG re-hashes; 30 requests against 4 TMMs is essentially guaranteed to hit all of them.
- Four unique subtable names, each with the responding TMM number as a prefix and a random suffix. The TMM-number prefix is just a label for human readability. The random suffix is what `init_table` actually iterates on during timing-probe discovery, throwing away names that hash to other TMMs and keeping the first one whose write completes under the threshold.
- `total_tmms=4` is consistent on every row. `TMM::cmp_count` is reporting the cluster size correctly.
- `writes=0 entries=0` everywhere. Clean baseline before any load.

Step 2: Reset to a Clean Baseline

    $ python tbl-loader.py reset --host 10.0.2.49 --port 80
    Discovering TMM count from 10.0.2.49:80/info ...
    BIG-IP reports 4 TMMs.
    Sending 200 /reset requests with 32 workers...
    Reset summary:
    TMM   hits   first_deleted   total_deleted
    ------------------------------------------
    0     50     0               0
    1     47     0               0
    2     55     0               0
    3     48     0               0
    All 4 TMMs cleared. Total entries removed (first-hit): 0

200 reset requests, 50 / 47 / 55 / 48 distribution across the four TMMs. That's essentially perfectly uniform. Expected mean is 50, observed range is 47-55, which is well within the natural variance of a fair hash.
Worth confirming because the same DAG is what'll spread the load run; uneven reset distribution would predict uneven load distribution, which complicates the analysis.

`first_deleted=0` everywhere because the previous step's `whoami` had already shown empty subtables. After a load run, this column tells you exactly how many entries each TMM was holding.

Step 3: Run the Load

    $ python tbl-loader.py load --host 10.0.2.49 --port 80 --count 100000 --workers 64
    ...
    completed=100,000/100,000 (100.0%) rate=4376/s coverage=4/4 missing=[] errors=0
    Done. completed=100,000 errors=0 elapsed=22.9s rate=4375/s
    Final distribution:
    tmm 0: 25,198 writes (25.20%)
    tmm 1: 24,782 writes (24.78%)
    tmm 2: 24,914 writes (24.91%)
    tmm 3: 25,106 writes (25.11%)

Three numbers worth lingering on:

Sustained 4,375/s throughput, completely flat

Earlier in the project, before the O(1) `count` fix, the equivalent run started at 3,300/s and decayed to 600/s by the 40k-record mark, a perfect 1/n curve from the hidden `table keys` + `llength` cost in the calling rule. With `static::LOCALDB_entries` maintained incrementally, the per-write work is genuinely constant and throughput stays where it starts.

Distribution within ±0.25 percentage points of perfect uniform

25.20% / 24.78% / 24.91% / 25.11% is what fair hashing produces over 100k samples. The DAG is doing its job; nothing is being funneled through one TMM the way the broken-locality version was.

Zero errors over 100k fresh TCP connections

No TIME_WAIT exhaustion on the client (the ephemeral port range is wide enough), no rate limiting on the BIG-IP, no socket timeouts. Suggests the workload is well within both ends' capacity.

The 22.9 second elapsed time works out to ~229 microseconds per write, amortized across the 64 workers and including the full TCP setup/teardown for each request. The actual `table set` is in the single digits to low tens of clock clicks (microseconds), so HTTP and TCP overhead dominate, which is the right answer when the iRule work itself is fast and local.
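The uniformity and elapsed-time figures from Steps 2 and 3 are easy to sanity-check by hand. The sketch below is my own arithmetic, not part of `tbl-loader.py`: it treats each TMM's count as a binomial draw with p = 1/4 (an assumption that the DAG behaves like a fair hash) and converts the load run's elapsed time into an amortized per-write cost. The per-request latency estimate assumes all 64 workers stay busy for the whole run.

```python
import math


def binomial_sigma(n: int, p: float) -> float:
    """Standard deviation of a binomial count: sqrt(n * p * (1 - p))."""
    return math.sqrt(n * p * (1.0 - p))


def z_scores(counts, p):
    """How many standard deviations each count sits from the expected mean."""
    n = sum(counts)
    mean = n * p
    sigma = binomial_sigma(n, p)
    return [(c - mean) / sigma for c in counts]


if __name__ == "__main__":
    reset_hits = [50, 47, 55, 48]                # Step 2 hits per TMM
    load_writes = [25198, 24782, 24914, 25106]   # Step 3 writes per TMM
    for label, counts in (("reset", reset_hits), ("load", load_writes)):
        print(label, [f"{z:+.2f}" for z in z_scores(counts, 0.25)])

    elapsed_s, completed, workers = 22.9, 100_000, 64
    amortized_us = elapsed_s / completed * 1e6
    print(f"amortized cost: {amortized_us:.0f} us per write")
    print(f"approx per-request latency with {workers} busy workers: "
          f"{amortized_us * workers / 1000:.1f} ms")
```

For 200 resets the standard deviation is about 6.1 hits, and for 100,000 writes it is about 137, so every per-TMM count in both runs sits within roughly 1.6 standard deviations of the mean. That is exactly what a fair hash should look like.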
Step 4: Verify Per-TMM Locality from the Logs

The throughput and distribution numbers tell us writes are happening evenly, but they don't directly prove each write is *local*. For that, pull the sampled timing lines from the BIG-IP's log and run them through the analyzer. Filter to the test window so earlier (broken) runs don't pollute the stats:

    $ ssh [email protected] "grep '^May 6 16' /var/log/ltm | grep 'sampled'" \
        | python3 timing_stats.py
    Sample rate: 1/1000
    Locality threshold: 100 clicks

    TMM  n   FAST  SLOW  min  p50  avg  p95  p99  max
    ------------------------------------------------------------------------------
    0    25  25    0     3    5    5.5  10   11   11
    1    24  24    0     3    5    6.1  11   18   18
    2    24  24    0     2    6    6.1  10   11   11
    3    25  25    0     2    6    6.5  12   13   13
    ------------------------------------------------------------------------------
    Total: 98 samples across 4 TMMs
    FAST_LOCAL=98 SLOW=0
    OK: all TMMs have average write timing below 100 clicks. Per-TMM locality is working.

This is the centerpiece of the validation. Reading it line by line:

Sample counts

25 / 24 / 24 / 25 samples per TMM matches the 25.20% / 24.78% / 24.91% / 25.11% write distribution from the load output, which is what you'd expect if the BIG-IP is logging 1-in-1000 of all writes uniformly.

Timing

Single-digit minimums (2-3 clicks). Averages of 5.5-6.5 clicks. p99s of 11-18. Max of 18 across all 98 samples. Compare to the broken run earlier in the project (shown at the top of the article in the investigation section), on the same hardware with the same workload but the wrong `init_table`. That's a **10,000x improvement on three of the four TMMs** between the two runs. The only thing that changed was `init_table` switching from deterministic naming to timing-probe discovery.

Tag tally

98 FAST_LOCAL, 0 SLOW. Not a single sampled write missed the locality threshold. The 100-click threshold has plenty of headroom: the actual max was 18, more than five times lower.

Verdict

The script's automated check confirms locality is working.
This is the line you'd grep for in CI if you wanted regression coverage.

Step 5: Spot-Check Ownership of a Discovered Name

The timing report proves writes were fast, but it doesn't prove that the *names* each TMM picked are actually owned by those TMMs (only that their writes were fast for whatever reason). To close that gap, take one of the names from `whoami` and probe it directly:

    $ python tbl-loader.py probe --host 10.0.2.49 --port 80 --name localdb_tmm_2_5946 --requests 200
    ...
    Results for subtable 'localdb_tmm_2_5946':
    TMM   hits   OWNER   NON_OWNER   min_clicks   avg_clicks   max_clicks
    ----------------------------------------------------------------
    0     55     0       55          286          5139.9       19814
    1     70     0       70          127          12475.3      52544
    2     8      8       0           3            8.6          20
    3     67     0       67          238          7126.6       51939
    Likely owner of subtable 'localdb_tmm_2_5946': TMM 2 (avg 8.6 clicks, tagged OWNER 8 times)

This is unambiguous:

TMM 2 wrote in 3-20 clicks, average 8.6

Consistent with the 6.1 average from `timing_stats.py` during the load. Small differences, both well under threshold, both unambiguously local.

TMMs 0, 1, 3 took 127-52,544 clicks, averages 5,139 / 12,475 / 7,126

Roughly 600x to 1,500x slower than TMM 2 on the same operation. They're paying the inter-TMM coordination tax because the subtable is owned by TMM 2.

Zero stray OWNER tags on non-owning TMMs

Earlier probe runs against fresh subtables sometimes had 1-3 stray OWNER tags from non-owners due to `clock clicks` jitter on small subtables. With this subtable now containing ~25k entries, the non-owner penalty is large enough (mins of 127-286 clicks) that no stray write made it under the 100-click threshold. The bigger the subtable, the cleaner the signal.

TMM 2 only got 8 hits

The DAG hashed inbound connections 55 / 70 / 8 / 67 in this run, a noticeably more uneven spread than the ~50 per TMM that uniform hashing would give over 200 requests (with 1000 requests and a uniform spread you'd see ~250 hits per TMM). The ownership verdict doesn't depend on the hit count, though: the 8 hits TMM 2 did get were unanimous on OWNER, which is what matters.
A run against any of the other discovered names (`localdb_tmm_0_865802`, `localdb_tmm_1_922743`, `localdb_tmm_3_441563`) produces the same shape of result with the corresponding TMM as owner.

What This Validates

- Step 1 proves every TMM ran `init_table` and picked a unique name.
- Step 2 proves clean baseline and even DAG distribution.
- Step 3 proves throughput is sustained and writes spread evenly across TMMs at scale.
- Step 4 proves every write was fast at the time it happened.
- Step 5 proves the names each TMM picked are genuinely owned by those TMMs.

Together they're a complete proof of the design: the timing-probe discovery in `init_table` correctly identifies a locally-owned subtable name on each TMM, and operations against those names cost ~10 clock clicks instead of ~70,000. The cost gap is the entire reason the per-TMM-subtable pattern exists, and it's now empirically demonstrated end-to-end.

This validation run took maybe three minutes of wall time. It's the kind of verification I should have been running before believing the original "deterministic naming" rewrite worked, not after watching it fail under load.

Pushing Throughput: Per-Write to Bulk-POST

The validated workflow above writes one key per HTTP request. That's the right shape for testing locality (each write is a clean, isolated trial), but it makes TCP connection setup the dominant cost. At ~4,375 writes per second on a 4-TMM box, the iRule is spending most of its time accepting connections, parsing headers, and tearing down sockets, not writing to subtables.

The natural next step is to batch many writes into a single HTTP request. A separate `/bulk_load` endpoint accepts a POST body of newline-separated keys (UUIDs in our test case), collects the body via `HTTP::collect`, and walks the lines in a tight loop calling `LOCALDB::set_unique` on each. One TCP connection now writes 15,625 keys instead of one. Per-batch timing comes back in the response so the loader can aggregate it client-side.
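For a feel of what one batch looks like on the wire, here is a minimal client-side sketch. It is not the loader from the repo: `build_bulk_body` and `batches_needed` are hypothetical helpers, and the trailing newline after the last key is my assumption about the body format.

```python
import uuid

DEFAULT_BATCH = 15_625  # keys per POST in the runs described here


def build_bulk_body(count: int = DEFAULT_BATCH) -> bytes:
    """Return `count` random UUIDs, one per line, as a POST body."""
    lines = [str(uuid.uuid4()) for _ in range(count)]
    # Assumption: newline-terminated lines, including the last one.
    return ("\n".join(lines) + "\n").encode("ascii")


def batches_needed(total_keys: int, batch: int = DEFAULT_BATCH) -> int:
    """Number of POSTs required to send total_keys (ceiling division)."""
    return -(-total_keys // batch)


if __name__ == "__main__":
    body = build_bulk_body()
    print(f"{len(body):,} bytes per batch")
    print(f"{batches_needed(1_000_000)} POSTs for 1M keys")
```

A UUID string is 36 characters, so with a newline each key costs 37 bytes and a default batch is 578,125 bytes. A million keys at 15,625 per batch is 64 POSTs.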
The throughput result is striking:

Same hardware, same iRule logic, same per-TMM locality: the 30× gap is purely TCP setup cost saved. The per-write timing inside the iRule barely changed (3-6 clicks per `LOCALDB::set_unique` either way), but the request-level overhead collapsed because we stopped paying it 1M times.

A few things worth noting about this bulk path that aren't obvious:

Locality holds inside the loop

A `/bulk_load` request that lands on TMM 2 will do all 15,625 of its writes against TMM 2's local subtable. There's no opportunity for a single batch to "leak" writes to other TMMs, because the connection is pinned to one TMM by DAG and the subtable name is fixed by `static::LOCALDB_name`. So the locality verdict from the per-write test carries over without needing re-verification, and the loader's per-batch `clicks_per_write` measurement confirms it stays in the 3-6 click range.

DAG fan-out still distributes work

With 64 fresh POSTs, each gets its own ephemeral source port, so the DAG hashes them across TMMs the same way it did with single-write requests. After enough batches, the per-TMM POST counts converge. In one of the runs, 4 TMMs each took exactly 16 of 64 POSTs.

Body size matters for HTTP::collect

The `/bulk_load` handler reads `Content-Length` and calls `HTTP::collect $cl` to buffer the entire body before processing. We cap at 16 MiB to protect TMM memory; that's plenty of headroom (~400k UUIDs per batch) but it's a real ceiling worth knowing about. The default of 15,625 UUIDs is ~580 KB, which is well within bounds.

An aside: log volume kills throughput at this rate

Our first three bulk-post runs showed throughput drifting downward across consecutive runs: 163k/s, then 129k/s, then 122k/s on the same hardware with no other state changes between them. The cause turned out to be the calling rule's logging itself.
The `/bulk_load` and `/reset` handlers each had unconditional `log local0.` statements, producing 64 + 200 = 264 syslog writes per test cycle on top of the LOCALDB sample logs. After silencing those handlers (the response bodies already carried the per-batch timing data, so we lost no visibility), runs stabilized at ~133k writes/s ± 4% and survived 60-second sleeps with no warmup penalty.

The lesson generalizes: at high write rates, the rule path needs to be quiet, not just "not chatty." Even gated log statements run their gate evaluation on every request, and unconditional ones write to syslog regardless of intent. When the per-write iRule cost is in the single-digit microseconds, *any* per-request work shows up.

The rule of thumb that emerged: log statements that fire once per HTTP request are fine for diagnostics (`/probe`, `/whoami`) but should be sampled or removed entirely from the hot path (`/load`, `/bulk_load`, `/reset`). The loader can carry timing data back in response bodies and aggregate it client-side, which is both faster and more useful for analysis.

Worth flagging that the absolute throughput numbers here (130-160k writes/s) reflect the test environment: a BIG-IP VE running on an Intel NUC under VMware, sharing the host with the load generator and other VMs. Those are not headroom numbers; they're contention-dominated. A 16-vCPU appliance without that contention should comfortably scale 5-10× from these figures, putting bulk-load throughput at or above roughly a million writes per second on real hardware.

The Code

The updated `LOCALDB.tcl`, the test harness `subtable_test_updates.tcl`, the Python loader/prober/timing-analyzer, and the USAGE.md are all in the irules-subtable-discovery repo out on GitHub. Two key bits to study:

- The `init_table` proc that does the timing-probe discovery, including the fallback path that logs a WARNING and uses a slow name rather than failing silently when discovery exhausts its tries.
  The 200-try ceiling is sized for 16+ TMMs; on a 4-TMM box you'll typically find a local name within a handful of tries (about 4 on average).
- The `/probe` endpoint and the loader's `probe` mode. Together they let you take any subtable name and identify which TMM owns it in seconds. Worth keeping in your toolkit; it's the cleanest way I've found to interrogate TMOS's hash assignments.

Closing Thoughts

The whole episode reinforced something I keep relearning: when a working pattern looks weirdly complicated, the complications are usually load-bearing. The original LOCALDB rule looked over-engineered with its random names and timing probes and retry loops. It was actually exactly as engineered as it needed to be. My "cleaner" rewrite was simpler because I'd quietly assumed something untrue about how TMOS assigns ownership. The truth was readable from a 6-line timing report; I just hadn't generated one yet.

If you're going to deviate from a working pattern, the deviation should be the thing you instrument first.

Note: the original LocalDB proc library I built this from has been updated by the author in a couple different ways since I shared my work with him. I didn't fold that work in here, but I'll post those updates along with the original when I get permission to do so.

TMM RESTARTING! (14.1.5)
It's almost identical to the issue linked below. I opened a case with F5 support, but it's now been 25 hours with no resolution! The standby unit keeps going into standby over and over, and gives me messages about tmm restarting and possible issues with mcpd. Even setting the unit offline and rebooting isn't stopping this issue. So we're not completely down, but it's still worrying. We do have VLAN failsafe enabled, but I've played with the timeouts and don't notice a difference either.

https://community.f5.com/t5/technical-forum/tmm-restarting-os-version-12-1-1/td-p/312273

    Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c003e:5: Offline
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD info bigstart: Start bigd in single-process mode
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD info bigd.start[31188]: Execing bigd: /usr/bin/bigd bigd -S
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice promptstatusd[3640]: 01460006:5: semaphore tmm.running(1) held
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus datastor
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dedup_admin
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus tmrouted
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus wamd
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart tmrouted
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus websso
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart localdbmgr
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus acctd
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus eam
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus rba
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus eca
    Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice tmrouted[23455]: 01910005:5: Tmrouted exiting after getting termination request.
Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070406:5: Removed publication with publisher id tmrouted Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus nlad Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus vdi Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus urldb Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus datasyncd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus fpuserd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus bdosd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus antserver Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus admd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus asm Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus avrd Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart avrd Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dosl7d Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dwbld Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus mgmt_acld Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus wr_urldbd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 
01070410:5: Removed subscription with subscriber id avrd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070406:5: Removed publication with publisher id TMM Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus avrd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart restart avrd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus dosl7d Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus asm Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus datasyncd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus admd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus mgmt_acld Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus dwbld Mar 23 22:09:51 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006c:5: proc stat: [0] pid:31188 comm:(bigd) state:S utime:12 stime:2 cutime:21 cstime:4 starttime:321443168 vsize:44949504 rss:6251 wchan:0 blkio_ticks:0 [-1] pid:1846 comm:(bigd) state:S utime:12181 stime:6605 cutime:15 cstime:2 starttime:276188235 vsize:44949504 rss:6253 wchan:0 blkio_ticks:0 [-2] pid:1846 comm:(bigd) state:S utime:12181 stime:6604 cutime:15 cstime:2 starttime:276188235 vsize:44949504 rss:6253 wchan:0 blkio_ticks:0 . 
Mar 23 22:09:52 JTLSF-DNS-PR-SOA-PREPROD notice tmm[32121]: 01010001:5: pgo_use x86_64 padc TMM Version 12.1.1.0.0.184 starting Mar 23 22:09:53 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070404:5: Add a new Publication for publisherID TMM and filterType (nil) Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 01140030:5: HA proc_running tmm is now responding. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006d:5: Leaving Offline for Active for dbvar not redundant. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0053:5: Active for traffic group /Common/traffic-group-1. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0019:5: Active
914 Views, 0 likes, 2 Comments
TMM Restarting (OS Version 12.1.1)
Hi everyone, I have a problem with my F5 device. The tmm service keeps restarting continuously, which leaves the device INOPERATIVE (intermittent active-standby). Here are the logs: "Mar 23 23:12:17 JTLSF-DNS-PR-SOA-PREPROD emerg logger: Re-starting tmm" Mar 23 22:09:29 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 01140045:5: HA reports tmm NOT ready. Mar 23 22:09:29 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0050:5: Sod requests links down. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD warning sod[5195]: 01140029:4: HA daemon_heartbeat tmm fails action is go offline down links and restart. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD err sod[5195]: 012a0003:3: HalFailover_::set: Cannot clear /dev/ttyS1 DTR/RTS. errno=5 Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD err sod[5195]: 012a0003:3: halSetFailover: set error Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0066:5: halSetFailover (clear) fails with status 11. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0054:5: Offline for traffic group /Common/traffic-group-1. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c003e:5: Offline Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD err mcpd[6576]: 01070069:3: Subscription not found in mcpd for subscriber Id BIGD_Subscriber. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006d:5: Leaving Offline for Active for dbvar not redundant. Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0053:5: Active for traffic group /Common/traffic-group-1. 
Mar 23 22:09:40 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0019:5: Active Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006c:5: proc stat: [0] pid:24126 comm:(tmm.0) state:S utime:1474241 stime:648064 cutime:0 cstime:0 starttime:276391877 vsize:2618109952 rss:18897 wchan:0 blkio_ticks:0 [-1] pid:24126 comm:(tmm.0) state:S utime:1474241 stime:648047 cutime:0 cstime:0 starttime:276391877 vsize:2618109952 rss:18897 wchan:0 blkio_ticks:0 [-2] pid:24126 comm:(tmm.0) state:R utime:1474241 stime:648029 cutime:0 cstime:0 starttime:276391877 vsize:2618109952 rss:18897 wchan:0 blkio_ticks:0 . Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD warning sod[5195]: 01140029:4: HA proc_running tmm fails action is go offline and down links. Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD err sod[5195]: 012a0003:3: HalFailover_::set: Cannot clear /dev/ttyS1 DTR/RTS. errno=5 Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD err sod[5195]: 012a0003:3: halSetFailover: set error Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0066:5: halSetFailover (clear) fails with status 11. Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0054:5: Offline for traffic group /Common/traffic-group-1. 
Mar 23 22:09:41 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c003e:5: Offline Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD info bigstart: Start bigd in single-process mode Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD info bigd.start[31188]: Execing bigd: /usr/bin/bigd bigd -S Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice promptstatusd[3640]: 01460006:5: semaphore tmm.running(1) held Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus datastor Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dedup_admin Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus tmrouted Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus wamd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart tmrouted Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus websso Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart localdbmgr Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus acctd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus eam Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus rba Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus eca Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice tmrouted[23455]: 01910005:5: Tmrouted exiting after getting termination request. 
Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070406:5: Removed publication with publisher id tmrouted Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus nlad Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus vdi Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus urldb Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus datasyncd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus fpuserd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus bdosd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus antserver Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus admd Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus asm Mar 23 22:09:42 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus avrd Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart restart avrd Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dosl7d Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus dwbld Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus mgmt_acld Mar 23 22:09:43 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh ./finish 2079 9 ==> /usr/bin/bigstart singlestatus wr_urldbd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 
01070410:5: Removed subscription with subscriber id avrd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070406:5: Removed publication with publisher id TMM Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus avrd Mar 23 22:09:44 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart restart avrd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus dosl7d Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus asm Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus datasyncd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus admd Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus mgmt_acld Mar 23 22:09:50 JTLSF-DNS-PR-SOA-PREPROD notice logger: /bin/sh /etc/bigstart/scripts/avr.provision ==> /usr/bin/bigstart singlestatus dwbld Mar 23 22:09:51 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006c:5: proc stat: [0] pid:31188 comm:(bigd) state:S utime:12 stime:2 cutime:21 cstime:4 starttime:321443168 vsize:44949504 rss:6251 wchan:0 blkio_ticks:0 [-1] pid:1846 comm:(bigd) state:S utime:12181 stime:6605 cutime:15 cstime:2 starttime:276188235 vsize:44949504 rss:6253 wchan:0 blkio_ticks:0 [-2] pid:1846 comm:(bigd) state:S utime:12181 stime:6604 cutime:15 cstime:2 starttime:276188235 vsize:44949504 rss:6253 wchan:0 blkio_ticks:0 . 
Mar 23 22:09:52 JTLSF-DNS-PR-SOA-PREPROD notice tmm[32121]: 01010001:5: pgo_use x86_64 padc TMM Version 12.1.1.0.0.184 starting Mar 23 22:09:53 JTLSF-DNS-PR-SOA-PREPROD notice mcpd[6576]: 01070404:5: Add a new Publication for publisherID TMM and filterType (nil) Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 01140030:5: HA proc_running tmm is now responding. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c006d:5: Leaving Offline for Active for dbvar not redundant. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0053:5: Active for traffic group /Common/traffic-group-1. Mar 23 22:09:55 JTLSF-DNS-PR-SOA-PREPROD notice sod[5195]: 010c0019:5: Active
How can I solve this problem? I want to upgrade this device, but the CLI shows [INOPERATIVE], so I am not sure whether we can still do anything at the CLI, such as running the script. Does anyone have steps for resolving this continuous TMM restart? Thank you.
1.4K Views, 0 likes, 5 Comments
iRule logging [LB::server pool] crashes tmm on 11.4.1 HF3
F5 support suggested I post on DevCentral to ask whether selecting a pool in the CLIENT_ACCEPTED event would mark that pool active (yes), and whether making this selection so early in the process lets me "override" the default pool configured for a virtual server (as far as I can tell, yes). Instead of posting the question to DevCentral, I thought I should just write a rule and test with some log messages. I found out I was right, but I also found out that the rule I was using to test crashed tmm on 11.4.1 HF3 three times in a row. This did not happen in my DR environment, which is on 11.6.0 HF3. Basically, the rule and procs just log which pool is selected, multiple times, between events and function calls. I suspect there must be some difference in how [LB::server pool] is handled between versions. It seems like I have experienced a ton of odd behavior while working with my LTMs. In the messages below there appears to be a time difference between the log message from the rule and the emerg logger messages. This was not the case in reality. I was monitoring the ltm log for my test messages with `tail -f ltm | grep TEST` and issuing a request I knew would trigger the rule execution. I saw my first log statement as expected, and immediately the emerg logger statements were injected into my console. Has someone else experienced this? Does anyone know what causes this? Do I have to upgrade my production units to 11.6.0 HF3 to fix this?
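The logged gap between the rule's first message and the restart entries can be measured from the two log excerpts in this post. The following is a minimal Python sketch, not part of the original post: the helper names `parse_ts` and `gaps_seconds` are hypothetical, the sample lines are copied from the excerpts, and the year is an assumed placeholder because syslog timestamps do not carry one.

```python
from datetime import datetime

YEAR = 2015  # assumption: syslog lines carry no year, any fixed year works here


def parse_ts(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp of a log line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def gaps_seconds(rule_lines, restart_lines):
    """For each rule log line, seconds until the next restart entry."""
    restarts = sorted(parse_ts(line) for line in restart_lines)
    gaps = []
    for line in rule_lines:
        seen = parse_ts(line)
        later = [r for r in restarts if r >= seen]
        if later:
            gaps.append(int((later[0] - seen).total_seconds()))
    return gaps


# Lines copied from the ltm log excerpt (rule messages).
rule_lines = [
    "Mar 19 16:33:20 f5cm2 debug tmm1[8361]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name",
    "Mar 19 16:36:54 f5cm2 debug tmm[15613]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name",
    "Mar 19 16:38:24 f5cm2 debug tmm1[16328]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name",
]

# Lines copied from the daemon.log excerpt (restart messages).
restart_lines = [
    "Mar 19 16:33:52 f5cm2 emerg logger: Re-starting tmm",
    "Mar 19 16:37:24 f5cm2 emerg logger: Re-starting tmm",
    "Mar 19 16:38:54 f5cm2 emerg logger: Re-starting tmm",
]

print(gaps_seconds(rule_lines, restart_lines))  # [32, 30, 30]
```

On these excerpts the logged gap is about 30 seconds each time, which is the timestamp difference the post mentions even though the messages appeared back to back on the console.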
Messages from daemon.log:
Mar 19 16:33:52 f5cm2 emerg logger: Re-starting tmm
Mar 19 16:33:52 f5cm2 emerg logger: Re-starting tmm1
Mar 19 16:37:24 f5cm2 emerg logger: Re-starting tmm
Mar 19 16:37:24 f5cm2 emerg logger: Re-starting tmm1
Mar 19 16:38:54 f5cm2 emerg logger: Re-starting tmm
Mar 19 16:38:54 f5cm2 emerg logger: Re-starting tmm1

Messages from ltm log:
Mar 19 16:33:20 f5cm2 debug tmm1[8361]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name
Mar 19 16:33:20 f5cm2 notice sod[5545]: 01140045:5: HA reports tmm NOT ready.
Mar 19 16:33:20 f5cm2 notice sod[5545]: 010c0050:5: Sod requests links down.
Mar 19 16:33:20 f5cm2 info lacpd[5552]: 01160016:6: Failover event detected. (Switchboard failsafe disabled while offline)
Mar 19 16:33:20 f5cm2 err bcm56xxd[6270]: 012c0010:3: Failover event detected. Marking external interfaces down. bsx.c(3724)
...snip...
Mar 19 16:36:54 f5cm2 debug tmm[15613]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name
Mar 19 16:36:54 f5cm2 notice sod[5545]: 01140045:5: HA reports tmm NOT ready.
Mar 19 16:36:54 f5cm2 notice sod[5545]: 010c0050:5: Sod requests links down.
Mar 19 16:36:54 f5cm2 info lacpd[5552]: 01160016:6: Failover event detected. (Switchboard failsafe disabled while offline)
Mar 19 16:36:55 f5cm2 err bcm56xxd[6270]: 012c0010:3: Failover event detected. Marking external interfaces down. bsx.c(3724)
...snip...
Mar 19 16:38:24 f5cm2 debug tmm1[16328]: Rule /Common/test : TEST: start pool in rule CLIENT_ACCEPT /Common/expected_pool_name
Mar 19 16:38:24 f5cm2 notice sod[5545]: 01140045:5: HA reports tmm NOT ready.
Mar 19 16:38:24 f5cm2 notice sod[5545]: 010c0050:5: Sod requests links down.
Mar 19 16:38:24 f5cm2 err bcm56xxd[6270]: 012c0010:3: Failover event detected. Marking external interfaces down. bsx.c(3724)

test2 rule:
when CLIENT_ACCEPTED {
    # Log the selected pool before and after calling the proc
    log local0.debug "TEST: start pool in rule CLIENT_ACCEPT [LB::server pool]"
    call test::client_accepted
    log local0.debug "TEST: pool in CLIENT_ACCEPT after function [LB::server pool]"
}
when HTTP_REQUEST {
    log local0.debug "TEST: start pool in rule HTTP_REQUEST [LB::server pool]"
    call test::http_request
    log local0.debug "TEST: pool in HTTP_REQUEST after function [LB::server pool]"
}

test rule:
proc client_accepted {} {
    log local0.debug "TEST: pool coming into client_accept [LB::server pool]"
    set var 1
    switch $var {
        "0" {
            log local0.debug "TEST: var was set to $var."
        }
        "1" {
            log local0.debug "TEST: var was set to $var."
            # Select the pool, then log what LB::server reports
            pool pool_name
            log local0.debug "TEST: pool selected in switch on var of client_accept [LB::server pool]"
        }
    }
    log local0.debug "TEST: pool selected in client_accept [LB::server pool]"
}
proc http_request {} {
    log local0.debug "TEST: pool coming into http_request [LB::server pool]"
    pool pool2_name
    log local0.debug "TEST: pool selected in http_request [LB::server pool]"
}
555 Views, 0 likes, 8 Comments
Routing application traffic through management interface
Hello all, I have a PoC setup in our lab with a management, internal and DMZ network, and I have a problem with routing. The F5 always sends the connection to the AD FS backend out from its DMZ interface, even though its management interface is in the same subnet as the AD FS server.
MGMT: 10.x.250.0/24
DMZ: 10.x.251.128/25
Internal: 10.x.251.0/25 (not used here)
I read the following information, which seems to suggest that application traffic must always be separate from management traffic: TMM handles the application traffic and the underlying Linux handles the management traffic. https://clouddocs.f5.com/cli/tmsh-reference/latest/modules/sys/sys-management-route.html
"The management interface is available on all switch platforms and is designed for management purposes. You can access the browser-based Configuration utility and command line configuration utility through the management port. You cannot use the management interface in traffic management VLANs."
So I understand from that that the MGMT interface is completely separate and I cannot make a routing hack to use the management interface for the AD FS application traffic. I can't change the location of the AD FS server. I could just open the firewall for the F5 connection from the DMZ to the management network, but this is quite annoying as the F5 management interface and AD FS are directly connected on the same subnet. Is there any way to instruct the F5 to use its management interface 10.x.250.150 to contact the AD FS server? Thanks, Peter
Solved, 1.9K Views, 0 likes, 2 Comments
Cannot add F5 net route
Hello! I am scratching my head because I can't see why this static route is not being added to the config. The "x" is replaced with a usable 1-254 IP.
Existing config:
net self INT-350_NET1_PRIMARY_IP {
    address x.159.222.101/24
    allow-service {
        default
    }
    traffic-group traffic-group-local-only
    vlan INT-350
}
net self INT-350_NET1_FLOATING_IP {
    address x.159.222.103/24
    allow-service {
        default
    }
    floating enabled
    traffic-group traffic-group-1
    unit 1
    vlan INT-350
}
net vlan INT-350 {
    if-index 448
    interfaces {
        n7k-Po16 {
            tag-mode service
            tagged
        }
    }
    tag 350
}
root@(sfltm1)(cfg-sync Standalone)(Active)(/Common)(tmos) load sys config merge from-terminal
Enter configuration. Press CTRL-D to submit or CTRL-C to cancel.
net route INT-350_NET1_ROUTE {
    gw x.159.222.254
    network x.159.222.0/24
}
Loading configuration...
01070666:3: Static route duplicates Self IP x.159.222.0 / 255.255.255.0 implied route
Unexpected Error: Loading configuration process failed.
1.1K Views, 0 likes, 8 Comments
IRULE TO REMOVE LOGS FROM FORWARD PROXY IRULE
What iRule can be used to remove logs from an iRule? The situation is that the iRule applied to a virtual server is filling up the /var/log folder and tmm is rebooting. Example:
if { $static::enable_logging_L4_VIP_GPRS_TRANSPARENT } {
    set logging_handle [HSL::open -proto UDP -pool ${static::log_destination_L4_VIP_GPRS_TRANSPARENT} ]
I want all logs removed, or tmm not to activate them.
229 Views, 0 likes, 0 Comments
IRULE TO REMOVE LOGS FROM FORWARD PROXY IRULE
What iRule can be used to remove logs from an iRule? The situation is that the iRule applied to a virtual server is filling up the /var/log folder and tmm is rebooting. Example:
if { $static::enable_logging_L4_VIP_GPRS_TRANSPARENT } {
    set logging_handle [HSL::open -proto UDP -pool ${static::log_destination_L4_VIP_GPRS_TRANSPARENT} ]
I want all logs removed, or tmm not to activate them.
318 Views, 0 likes, 1 Comment
Have you ever run out of file descriptors while updating an AS3 declaration?
Running AS3 version 3.26.0 on Virtual Edition 15.1.5. When PATCHing the declaration I persistently get nonsensical 422s for objects that supposedly can't be found. Checking the restnoded log (/var/log/restnoded/restnoded.log), I see errors indicating exhaustion of file descriptors, which appear to be limited to 4096 for the user restnoded that the service runs as. Is there any supported way to work around this? Here are the aforementioned logs: Wed, 08 Jun 2022 18:37:31 GMT - severe: [appsvcs] {"message":"GET http://admin:XXXXXX@localhost:8100/mgmt/tm/sys/file/ssl-cert/~Common~default.crt failed (connect EMFILE 127.0.0.1:8100 - Local (undefined:undefined))","level":"error"} Wed, 08 Jun 2022 18:37:31 GMT - severe: [appsvcs] {"message":"GET http://admin:XXXXXX@localhost:8100/mgmt/tm/sys/file/ssl-key/~Common~default.key failed (connect EMFILE 127.0.0.1:8100 - Local (undefined:undefined))","level":"error"} Wed, 08 Jun 2022 18:37:31 GMT - warning: [appsvcs] {"message":"unable to digest declaration. 
Error: Unable to find /Common/default.crt for /redirect_w3_test_no_users/Shared/defaultCert/certificate","level":"warning"} Wed, 08 Jun 2022 18:37:31 GMT - severe: [appsvcs] {"message":"DELETE http://admin:XXXXXX@localhost:8100/mgmt/tm/ltm/data-group/internal/~Common~appsvcs~____appsvcs_lock Attempting to release global lock failed (getaddrinfo EBUSY localhost:8100)","level":"error"} Wed, 08 Jun 2022 18:37:31 GMT - severe: [appsvcs] {"message":"GET http://localhost:8100/mgmt/tm/sys/folder/~Common~appsvcs failed (getaddrinfo EBUSY localhost:8100)","level":"error"} Wed, 08 Jun 2022 18:37:32 GMT - severe: [appsvcs] {"message":"An error occured while deleting stored declaration: Error: spawn /bin/sh EMFILE","level":"error"} Wed, 08 Jun 2022 18:37:34 GMT - finest: socket 201 opened Wed, 08 Jun 2022 18:37:39 GMT - finest: socket 201 closed Wed, 08 Jun 2022 18:40:21 GMT - finest: socket 202 opened Wed, 08 Jun 2022 18:40:30 GMT - finest: socket 202 closed Wed, 08 Jun 2022 18:40:37 GMT - severe: [appsvcs] {"message":"GET http://admin:XXXXXX@localhost:8100/mgmt/tm/sys/file/ssl-cert/~Common~default.crt failed (connect EMFILE 127.0.0.1:8100 - Local (undefined:undefined))","level":"error"} Wed, 08 Jun 2022 18:40:37 GMT - severe: [appsvcs] {"message":"GET http://admin:XXXXXX@localhost:8100/mgmt/tm/sys/file/ssl-key/~Common~default.key failed (connect EMFILE 127.0.0.1:8100 - Local (undefined:undefined))","level":"error"} Wed, 08 Jun 2022 18:40:37 GMT - warning: [appsvcs] {"message":"unable to digest declaration. 
Error: Unable to find /Common/default.crt for /redirect_w3_test_no_users/Shared/defaultCert/certificate","level":"warning"} Wed, 08 Jun 2022 18:40:37 GMT - severe: [appsvcs] {"message":"DELETE http://admin:XXXXXX@localhost:8100/mgmt/tm/ltm/data-group/internal/~Common~appsvcs~____appsvcs_lock Attempting to release global lock failed (getaddrinfo EBUSY localhost:8100)","level":"error"} Wed, 08 Jun 2022 18:40:37 GMT - severe: [appsvcs] {"message":"GET http://localhost:8100/mgmt/tm/sys/folder/~Common~appsvcs failed (getaddrinfo EBUSY localhost:8100)","level":"error"} Wed, 08 Jun 2022 18:40:37 GMT - severe: [appsvcs] {"message":"An error occured while deleting stored declaration: Error: spawn /bin/sh EMFILE","level":"error"}
1.2K Views, 0 likes, 7 Comments
Are NTP and DNS traffic management type or not?
Hello everyone, I'm a system engineer at an integrator company and I currently have a PoC of an AWAF project with a customer. I have little experience working with F5 devices, so I have one question whose answer will help me a lot in understanding how BIG-IP devices work. I've done some research in the documentation but couldn't find a clear answer on which types of traffic are considered data traffic and which are considered management traffic. For example, should NTP and DNS traffic use the management route or the TMM route (I mean the case where there is no direct path to the destination DNS/NTP servers)? I thought that BIG-IP devices would use the management route (management gateway) for DNS queries and time synchronization, so I asked the customer to grant access on the firewall from the management interface to the destination servers, but it didn't work. Then I captured traffic with tcpdump and realized that the BIG-IP devices try to use the TMM default route instead. But I've read in this article, https://support.f5.com/csp/article/K13284, that NTP is management traffic. Another article, https://support.f5.com/csp/article/K7017, says that during device boot the ntpd daemon starts before TMM, so if it has no route via the management interface, time synchronization will fail. So I'm a little confused about what I should ask the customer: to open access from the TMM interface for DNS and NTP, and also for signature updates? I just don't understand, logically, why NTP, DNS and system updates do not use management routes. If all of them are considered data traffic, then what is the management route used for? Only for accessing the management GUI and SSH, is that correct? Sorry for the long question, but I really want to understand the platform's traffic-routing logic so that I can operate it and implement it correctly with the customer. Thanks in advance. // Giorgi
2.5K Views, 0 likes, 5 Comments
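The behaviour described in the NTP and DNS question above (a destination with no specific route leaving via the TMM default route rather than the management gateway) can be pictured with an ordinary longest-prefix-match lookup. The following is a minimal Python sketch under stated assumptions, not BIG-IP's actual implementation: the route table, the documentation-range addresses, the metrics, and the function name `pick_route` are all hypothetical, and the idea that a management default route is less preferred than the TMM default route is modelled only to match what the post observed.

```python
import ipaddress

# Hypothetical route table: (destination network, interface, metric).
# Addresses and metrics are illustrative and not taken from a real device.
ROUTES = [
    ("0.0.0.0/0", "tmm", 0),       # TMM default route
    ("0.0.0.0/0", "mgmt", 10),     # management default route, less preferred (assumption)
    ("192.0.2.0/24", "mgmt", 0),   # specific management route to an NTP/DNS subnet
]


def pick_route(dest, routes=ROUTES):
    """Return (interface, network) chosen by longest prefix, then lowest metric."""
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(net), iface, metric)
        for net, iface, metric in routes
        if addr in ipaddress.ip_network(net)
    ]
    # Most specific prefix wins; among equal prefixes the lower metric wins.
    net, iface, _ = max(candidates, key=lambda r: (r[0].prefixlen, -r[2]))
    return iface, str(net)


print(pick_route("192.0.2.10"))    # ('mgmt', '192.0.2.0/24')
print(pick_route("198.51.100.5"))  # ('tmm', '0.0.0.0/0')
```

In this model a server is reached through the management interface only when a more specific management route covers it; everything else falls through to the TMM default route, which is consistent with the tcpdump observation in the post.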