
dduvall (Dan Duvall)
Staff Software Engineer

User Details

User Since
Oct 7 2014, 4:24 PM (574 w, 14 h)
Availability
Available
IRC Nick
marxarelli
LDAP User
Dduvall
MediaWiki User
DDuvall (WMF) [ Global Accounts ]

Recent Activity

Thu, Oct 2

dduvall added a subtask for T403125: Investigate WMCS Magnum for GitLab runners: T406271: Grant gitlab-runners-staging access to fast-iops volume type and a 4xiops instance flavor.
Thu, Oct 2, 9:18 PM · Release-Engineering-Team (Priority Backlog 📥), GitLab (CI & Job Runners)
dduvall added a parent task for T406271: Grant gitlab-runners-staging access to fast-iops volume type and a 4xiops instance flavor: T403125: Investigate WMCS Magnum for GitLab runners.
Thu, Oct 2, 9:18 PM · Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests)
dduvall created T406271: Grant gitlab-runners-staging access to fast-iops volume type and a 4xiops instance flavor.
Thu, Oct 2, 9:18 PM · Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests)

Tue, Sep 30

dduvall added a comment to T405118: Set up zuul scheduler on zuul1001.

I tried it and I can confirm using mysql+pymysql gets us past the error.

Tue, Sep 30, 7:49 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)
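The fix amounts to naming the PyMySQL driver explicitly in the SQLAlchemy database URL. A bare `mysql://` URL makes SQLAlchemy use its default MySQL driver (mysqlclient), which may be what is missing in the image. Below is a minimal sketch, assuming the connection string is a plain `mysql://` URL. The helper name `use_pymysql_driver` and the example credentials and host are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def use_pymysql_driver(dburi: str) -> str:
    """Rewrite a mysql:// SQLAlchemy URL so it explicitly selects PyMySQL.

    URLs that already name a driver (e.g. mysql+pymysql://) are returned
    unchanged; credentials, host, path and query string are preserved.
    """
    parts = urlsplit(dburi)
    if parts.scheme == "mysql":
        parts = parts._replace(scheme="mysql+pymysql")
    return urlunsplit(parts)


if __name__ == "__main__":
    # Hypothetical credentials and host, for illustration only.
    print(use_pymysql_driver("mysql://zuul:secret@db.example.org/zuul"))
```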
dduvall added a comment to T405118: Set up zuul scheduler on zuul1001.

Interesting! This looks like it might be something missing in the executor image. My guess is that we copied upstream, but upstream doesn't use mariadb like we do. @dduvall can you take a look at this?

Tue, Sep 30, 5:40 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Thu, Sep 25

dduvall changed the status of T405651: Enable use of `.kokkuri:bake` on trusted runners from Open to In Progress.
Thu, Sep 25, 6:54 PM · GitLab (CI & Job Runners), Release-Engineering-Team (Doing 😎)
dduvall created T405651: Enable use of `.kokkuri:bake` on trusted runners.
Thu, Sep 25, 6:54 PM · GitLab (CI & Job Runners), Release-Engineering-Team (Doing 😎)

Mon, Sep 22

dduvall created T405287: Hanging NotReady status following OOM on node.
Mon, Sep 22, 10:22 PM · Release-Engineering-Team (Priority Backlog 📥), GitLab (CI & Job Runners)

Wed, Sep 17

dduvall added a subtask for T396380: 1.45.0-wmf.19 deployment blockers: T404902: Wikimedia\Assert\InvariantException: Invariant failed: getBasePageBundle called on non-Parsoid ContentHolder.
Wed, Sep 17, 6:28 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Release, Train Deployments
dduvall added a parent task for T404902: Wikimedia\Assert\InvariantException: Invariant failed: getBasePageBundle called on non-Parsoid ContentHolder: T396380: 1.45.0-wmf.19 deployment blockers.
Wed, Sep 17, 6:28 PM · MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), MW-Interfaces-Team, MediaWiki-REST-API, MediaWiki-Parser, Wikimedia-production-error
dduvall created T404902: Wikimedia\Assert\InvariantException: Invariant failed: getBasePageBundle called on non-Parsoid ContentHolder.
Wed, Sep 17, 6:25 PM · MW-1.45-notes (1.45.0-wmf.19; 2025-09-16), MW-Interfaces-Team, MediaWiki-REST-API, MediaWiki-Parser, Wikimedia-production-error

Mon, Sep 15

dduvall created T404668: Increase gitlab-runners-staging volumes to 12.
Mon, Sep 15, 11:36 PM · Cloud-VPS (Quota-requests)
dduvall reopened T404386: Request creation of gitlab-runners-staging VPS project as "Open".

@Andrew I don't see any zones listed in the project. Is that normal for a new project?

No, it is not normal for a project to have 0 Designate zones. There should be svc.$PROJECT.eqiad1.wikimedia.cloud., $PROJECT.eqiad1.wmcloud.org., and $PROJECT.wmcloud.org. zones assigned to the project in Designate.

Mon, Sep 15, 3:52 PM · Cloud-VPS (Project-requests)
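The three zones named in the reply make the check mechanical. The sketch below assumes you already have the project's zone names as strings with trailing dots, for example copied from `openstack zone list`. The helper names are hypothetical.

```python
def expected_zones(project: str) -> list[str]:
    """Designate zones a new Cloud VPS project should have, per the reply above."""
    return [
        f"svc.{project}.eqiad1.wikimedia.cloud.",
        f"{project}.eqiad1.wmcloud.org.",
        f"{project}.wmcloud.org.",
    ]


def missing_zones(project: str, listed: list[str]) -> list[str]:
    """Return the expected zones that do not appear in `listed`, in expected order."""
    present = set(listed)
    return [zone for zone in expected_zones(project) if zone not in present]


if __name__ == "__main__":
    # A brand-new project with no zones at all reports all three as missing.
    print(missing_zones("gitlab-runners-staging", []))
```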

Fri, Sep 12

dduvall added a comment to T404386: Request creation of gitlab-runners-staging VPS project.

@Andrew I don't see any zones listed in the project. Is that normal for a new project?

Fri, Sep 12, 10:37 PM · Cloud-VPS (Project-requests)

Thu, Sep 11

dduvall updated the task description for T404386: Request creation of gitlab-runners-staging VPS project.
Thu, Sep 11, 7:07 PM · Cloud-VPS (Project-requests)
dduvall created T404386: Request creation of gitlab-runners-staging VPS project.
Thu, Sep 11, 6:59 PM · Cloud-VPS (Project-requests)
dduvall added a comment to T404150: Additional floating IPs for gitlab-cloud-runner testing in testlabs project.

I asked @Andrew about this, and my understanding is that floating IPs are not required to create Octavia load balancers in OpenStack. But I don't have a full understanding of how Magnum works, so I might be wrong! Can you share more details like tofu code, errors you're getting, etc.?

Thu, Sep 11, 4:35 PM · Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests)

Wed, Sep 10

dduvall added a comment to T404238: InvalidArgumentException: $aspect must use one of the XXX_USAGE constants, "A" given!.

A spike of these errors occurred during wmf.18 group1 promotion today, but strangely, all instances of the error were from 1.45.0-wmf.17.

Wed, Sep 10, 6:23 PM · Wikidata Integration in Wikimedia projects, Wikidata, Wikimedia-production-error

Tue, Sep 9

dduvall created T404150: Additional floating IPs for gitlab-cloud-runner testing in testlabs project.
Tue, Sep 9, 9:43 PM · Release-Engineering-Team (Radar), Cloud-VPS (Quota-requests)

Sep 4 2025

dduvall updated the task description for T396245: Build zuul images for production.
Sep 4 2025, 2:58 PM · Essential-Work, collaboration-services, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall closed T396245: Build zuul images for production as Resolved.
Sep 4 2025, 2:58 PM · Essential-Work, collaboration-services, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall updated the task description for T396245: Build zuul images for production.
Sep 4 2025, 2:58 PM · Essential-Work, collaboration-services, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)

Aug 15 2025

dduvall added a comment to T392526: Refactor `build-images.py` to use a common code image and `docker buildx`.

I'm also unsure how to resolve the difference in package name - e.g., whether there's some suitable override mechanism (create an empty transitional package?) - or whether there are subtle differences in the packaging configuration that make the result incompatible.

One thing to check is whether these debs are actually arch-dependent; most vendor debs are usually just statically linked. We could try one of them in a bullseye and a bookworm container to find out. If it works in both, we could have a single update definition and then use it to sync to bullseye and bookworm.

Aug 15 2025, 4:40 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥)

Jul 23 2025

dduvall added a comment to T392610: SpiderPig should support train deployments.

SpiderPig is hilarious and awesome. Great job everyone!

Jul 23 2025, 4:28 PM · Essential-Work, Scap (SpiderPig 🕸️), Release-Engineering-Team (Yak Shaving 🐃🪒)

Jul 22 2025

dduvall added a comment to T398873: Move nightly image build from releases-jenkins to deployment.eqiad.wmnet.

Yes, but in the meantime `scap prep next` would continue to work correctly (assuming at least one prior successful branch cut was merged). As it stands, if `MediaWiki branch and publish WMF single-version image` fails, a subsequent `scap prep next` will fail.

Jul 22 2025, 9:57 PM · Release-Engineering-Team (Doing 😎), OKR-Work
dduvall added a comment to T398873: Move nightly image build from releases-jenkins to deployment.eqiad.wmnet.

@dduvall I'd like to see MediaWiki branch and publish WMF single-version image changed so that instead of destroying and recreating the wmf/next branch each time it runs, it updates wmf/next if it already exists. This means being able to handle added/dropped extensions.

Jul 22 2025, 9:49 PM · Release-Engineering-Team (Doing 😎), OKR-Work
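Updating wmf/next in place, rather than destroying and recreating it, means reconciling the extensions already on the branch with the current list. Below is a minimal sketch of just that reconciliation step, assuming both lists are available as sets of extension names. The function name and the example extension names are hypothetical, and the actual git operations (adding or removing submodules) are left out.

```python
def plan_extension_changes(existing: set[str], desired: set[str]) -> dict[str, list[str]]:
    """Work out how to bring an existing branch's extensions up to date.

    "add" are extensions new since the last run, "drop" are ones no longer
    deployed, and "keep" are ones that only need their refs updated.
    """
    return {
        "add": sorted(desired - existing),
        "drop": sorted(existing - desired),
        "keep": sorted(existing & desired),
    }


if __name__ == "__main__":
    # Hypothetical extension names.
    print(plan_extension_changes({"Foo", "Bar"}, {"Bar", "Baz"}))
```

Working from set differences keeps the update idempotent: running it twice against the same list produces empty "add" and "drop" entries the second time.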
dduvall added a comment to T398873: Move nightly image build from releases-jenkins to deployment.eqiad.wmnet.

If CI fails (which happens about 10% of the time), we're left with an unusable wmf/next branch until the next run.

Jul 22 2025, 9:45 PM · Release-Engineering-Team (Doing 😎), OKR-Work

Jul 11 2025

dduvall added a comment to T399120: [kokuri] Use a unique per CI run tag by default.

@bd808 Kokkuri 2.8.0 will include the digest in the image ref. See if that solves your issue.

Jul 11 2025, 10:00 PM · Release-Engineering-Team (Doing 😎), Patch-For-Review, GitLab (CI & Job Runners)
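An image reference that carries a digest as well as a tag has the form `repository:tag@algorithm:hex`. When both are present, the digest, not the tag, determines which manifest is pulled, so a tag reused across CI runs no longer matters. The sketch below shows only the reference format. It is not Kokkuri's implementation, and the helper name and example repository are hypothetical.

```python
def pinned_ref(repository: str, tag: str, digest: str) -> str:
    """Build an image reference that carries both a tag and a content digest.

    The tag stays for human readability; the digest pins the exact manifest.
    Only a minimal "algorithm:encoded" shape check is done on the digest.
    """
    algorithm, sep, encoded = digest.partition(":")
    if not (algorithm and sep and encoded):
        raise ValueError(f"expected a digest like 'sha256:<hex>', got {digest!r}")
    return f"{repository}:{tag}@{digest}"


if __name__ == "__main__":
    # Hypothetical repository, tag and digest.
    print(pinned_ref("docker-registry.wikimedia.org/example", "2025-07-11", "sha256:" + "0" * 64))
```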

Jun 18 2025

dduvall added a comment to T395938: puppetize setup of new zuul VMs.

@Dzahn the WMF based production images for Zuul and Nodepool have been built and published to our registry. I'll post a summary about how we're managing them in T396245: Build zuul images for production tomorrow, but here are the latest image refs by service:

Jun 18 2025, 11:59 PM · Patch-For-Review, collaboration-services, Continuous-Integration-Infrastructure (Zuul upgrade)

Jun 9 2025

dduvall added a comment to T390119: Plan for porting PipelineLib to Zuul Ansible.

Looking at the above results, I believe that most of the functionality being served by PipelineLib could potentially be served by docker buildx bake (in conjunction with buildkitd and Blubber). Docker bake can build multiple sets of targets/contexts/configs simultaneously and even export the results as generic artifacts (to serve the one case that is using copy).

Jun 9 2025, 11:50 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall added a comment to T390119: Plan for porting PipelineLib to Zuul Ansible.

PipelineLib actions in use, according to codesearch results for the 27 Gerrit-hosted projects that include a .pipeline/config.yaml file.

Jun 9 2025, 11:13 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall updated the task description for T390119: Plan for porting PipelineLib to Zuul Ansible.
Jun 9 2025, 10:48 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥), Continuous-Integration-Infrastructure (Zuul upgrade)

Jun 6 2025

dduvall claimed T396245: Build zuul images for production.
Jun 6 2025, 11:00 PM · Essential-Work, collaboration-services, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall updated subscribers of T396245: Build zuul images for production.

I refactored the blubber.yaml that @dancy had written back when we were experimenting with a Zuul setup for GitLab and created a wmf/12.0.0 branch.

Jun 6 2025, 11:00 PM · Essential-Work, collaboration-services, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)

Jun 5 2025

dduvall updated subscribers of T396111: Wikimedia\NormalizedException\NormalizedException: Invalid username: {username}.

Spotted this today as well, following wmf.4 promotion to all wikis.

Jun 5 2025, 6:35 PM · ConfirmEdit (CAPTCHA extension), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth, Wikimedia-production-error

Jun 3 2025

dduvall added a subtask for T392174: 1.45.0-wmf.4 deployment blockers: T395957: PHP Warning: Undefined array key "clientPref".
Jun 3 2025, 7:35 PM · Release-Engineering-Team (Priority Backlog 📥), Essential-Work, Release, Train Deployments
dduvall added a parent task for T395957: PHP Warning: Undefined array key "clientPref": T392174: 1.45.0-wmf.4 deployment blockers.
Jun 3 2025, 7:35 PM · MW-1.45-notes (1.45.0-wmf.5; 2025-06-10), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth, Wikimedia-production-error
dduvall triaged T395957: PHP Warning: Undefined array key "clientPref" as Unbreak Now! priority.
Jun 3 2025, 7:35 PM · MW-1.45-notes (1.45.0-wmf.5; 2025-06-10), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth, Wikimedia-production-error
dduvall created T395957: PHP Warning: Undefined array key "clientPref".
Jun 3 2025, 7:33 PM · MW-1.45-notes (1.45.0-wmf.5; 2025-06-10), MediaWiki-Platform-Team, MediaWiki-extensions-CentralAuth, Wikimedia-production-error

May 14 2025

dduvall removed a member for MW-on-K8s: dduvall.
May 14 2025, 8:24 PM

May 6 2025

dduvall created T393496: Increase zuul3 quotas for cpu/ram/disk/instances.
May 6 2025, 5:35 PM · cloud-services-team, Release-Engineering-Team (Priority Backlog 📥), Continuous-Integration-Infrastructure (Zuul upgrade), Cloud-VPS (Quota-requests)
dduvall closed T391374: Stand up Zuul 11 experiment environment in zuul3 cloud VPS project as Resolved.
May 6 2025, 5:22 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)

May 1 2025

dduvall added a comment to T393034: Investigate out of date refs following gerrit switchover.

@thcipriani is this still a blocker or are we good for group1/all wiki promotion today?

May 1 2025, 3:59 PM · Wikimedia-Incident, Release-Engineering-Team, collaboration-services, Gerrit

Apr 24 2025

dduvall updated the task description for T392610: SpiderPig should support train deployments.
Apr 24 2025, 3:58 PM · Essential-Work, Scap (SpiderPig 🕸️), Release-Engineering-Team (Yak Shaving 🐃🪒)
dduvall created T392610: SpiderPig should support train deployments.
Apr 24 2025, 3:57 PM · Essential-Work, Scap (SpiderPig 🕸️), Release-Engineering-Team (Yak Shaving 🐃🪒)
dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

serializes only layers per push. Multiple pushes can still happen simultaneously and IIUC scap does do that. Maybe we could have a flag in scap like e.g. --image-push-concurrency=1 to at least verify/rule out this hypothesis? It would slow down deployment for a couple of weeks but would give a strong signal to guide us better into resolving this.

+1 I like this, @dancy @dduvall what do you think about it?

As a short-term mitigation, it seems reasonable, and serializing the image pushes in build-images.py should be straightforward.

Apr 24 2025, 3:21 PM · Patch-For-Review, serviceops
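Serializing image pushes amounts to bounding how many pushes run at once and setting that bound to 1. Below is a minimal sketch, assuming pushes are independent and can be dispatched from a thread pool. The `concurrency` parameter mirrors the proposed, hypothetical `--image-push-concurrency` flag, and `push_images` and `docker_push` are illustrative names, not build-images.py's actual code.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def docker_push(ref: str) -> None:
    """Push one image with the docker CLI; raises CalledProcessError on failure."""
    subprocess.run(["docker", "push", ref], check=True)


def push_images(refs: Iterable[str], push: Callable[[str], None] = docker_push,
                concurrency: int = 1) -> None:
    """Push each ref, running at most `concurrency` pushes at the same time.

    With concurrency=1 the single worker runs pushes strictly one after
    another, in the order given. `push` is injectable so the scheduling can
    be exercised without a registry.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Consuming the map iterator re-raises the first push failure here.
        list(pool.map(push, refs))
```

Keeping concurrency as a parameter rather than hard-coding serial pushes makes it cheap to restore parallelism once the registry hypothesis is confirmed or ruled out.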
dduvall changed the status of T392526: Refactor `build-images.py` to use a common code image and `docker buildx` from Open to In Progress.
Apr 24 2025, 3:19 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥)

Apr 23 2025

dduvall updated the task description for T392526: Refactor `build-images.py` to use a common code image and `docker buildx`.
Apr 23 2025, 6:44 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥)
dduvall created T392526: Refactor `build-images.py` to use a common code image and `docker buildx`.
Apr 23 2025, 6:38 PM · Patch-For-Review, Release-Engineering-Team (Priority Backlog 📥)

Apr 22 2025

dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

serializes only layers per push. Multiple pushes can still happen simultaneously and IIUC scap does do that. Maybe we could have a flag in scap like e.g. --image-push-concurrency=1 to at least verify/rule out this hypothesis? It would slow down deployment for a couple of weeks but would give a strong signal to guide us better into resolving this.

+1 I like this, @dancy @dduvall what do you think about it?

Apr 22 2025, 11:47 PM · Patch-For-Review, serviceops

Apr 17 2025

dduvall updated subscribers of T391869: PHP Warning: Undefined property: Wikimedia\Parsoid\NodeData\DataMw::$caption.

Looks to have been introduced in:

Apr 17 2025, 6:18 PM · Essential-Work, Content-Transform-Team (Work In Progress), Parsoid, Wikimedia-production-error

Apr 16 2025

dduvall removed a subtask for T386220: 1.44.0-wmf.25 deployment blockers: T392086: PHP Warning: Array to string conversion / RuntimeException: PCRE failure on Special:PasswordReset.
Apr 16 2025, 6:23 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Release, Train Deployments
dduvall removed a parent task for T392086: PHP Warning: Array to string conversion / RuntimeException: PCRE failure on Special:PasswordReset: T386220: 1.44.0-wmf.25 deployment blockers.
Apr 16 2025, 6:23 PM · MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), MW-1.43-notes, MediaWiki-Platform-Team, MediaWiki-User-login-and-signup, Wikimedia-production-error
dduvall lowered the priority of T392086: PHP Warning: Array to string conversion / RuntimeException: PCRE failure on Special:PasswordReset from Unbreak Now! to Medium.

Removing this task as a blocker as the errors only occurred during a short-ish window, occurred for wmf.24 as well as wmf.25, and only for internal wikis.

Apr 16 2025, 6:23 PM · MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), MW-1.43-notes, MediaWiki-Platform-Team, MediaWiki-User-login-and-signup, Wikimedia-production-error
dduvall added a project to T392086: PHP Warning: Array to string conversion / RuntimeException: PCRE failure on Special:PasswordReset: MediaWiki-Platform-Team.
Apr 16 2025, 6:03 PM · MW-1.44-notes (1.44.0-wmf.25; 2025-04-15), MW-1.43-notes, MediaWiki-Platform-Team, MediaWiki-User-login-and-signup, Wikimedia-production-error
dduvall added a comment to T391935: scap train-presync failed to push image: blob upload unknown.

Closing as a duplicate. The underlying issue, while ongoing, is not strictly a train blocker.

Apr 16 2025, 4:58 PM · serviceops, Release-Engineering-Team
dduvall merged T391935: scap train-presync failed to push image: blob upload unknown into T390251: docker-registry.wikimedia.org keeps serving bad blobs.
Apr 16 2025, 4:57 PM · Patch-For-Review, serviceops
dduvall merged task T391935: scap train-presync failed to push image: blob upload unknown into T390251: docker-registry.wikimedia.org keeps serving bad blobs.
Apr 16 2025, 4:57 PM · serviceops, Release-Engineering-Team

Apr 15 2025

dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

Other possibly relevant discussions around this issue.

Apr 15 2025, 5:24 PM · Patch-For-Review, serviceops
dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

Also, I wonder if there's a way we can force monolithic uploads?

Apr 15 2025, 5:22 PM · Patch-For-Review, serviceops
dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

Also, I wonder if there's a way we can force monolithic uploads?

Apr 15 2025, 5:05 PM · Patch-For-Review, serviceops
dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

In the case of uploads, here is the bad sequence:

  • The client (e.g. dockerd) issues POST /v2/<repo>/blobs/uploads/ to initiate an upload. This returns a new URL for subsequent operations (hereafter called the upload URL).
  • The client issues a PATCH to the upload URL to transmit the data.
  • The client issues a PUT to the upload URL to finalize the upload. This is where the registry sometimes returns a 404 (basically saying that it doesn't know about this upload). A 404 is more likely if a prior upload was large (i.e., if the replicator is busy). Retrying this PUT does eventually succeed.

In this case it's not the content of the upload that hasn't made it to the replica, but the existence of the upload state itself.

Apr 15 2025, 5:04 PM · Patch-For-Review, serviceops
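Because retrying the finalizing PUT eventually succeeds, a client-side workaround is to treat a 404 at that step as "upload state not replicated yet" and retry with backoff. Below is a minimal sketch of that retry loop only. The HTTP details (auth, the `digest` query parameter on the PUT, the request body) are hidden behind an injected `put` callable that returns the status code. The names, the attempt count and the delays are assumptions, not dockerd's or scap's behavior.

```python
import time
from typing import Callable


def finalize_upload(put: Callable[[], int], max_attempts: int = 5,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Issue the finalizing PUT, retrying only while the registry answers 404.

    Any non-404 status (success or another error) is returned immediately.
    Delays double between attempts: base_delay, 2 * base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = put()
    for attempt in range(1, max_attempts):
        if status != 404:
            break
        sleep(base_delay * 2 ** (attempt - 1))
        status = put()
    return status
```

Retrying only on 404 keeps genuine failures (e.g. a digest mismatch) from being masked by the retry loop.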
dduvall added a comment to T390251: docker-registry.wikimedia.org keeps serving bad blobs.

At this point we have two problems.

  • Large image pushes are now unreliable (this seems new for mediawiki deployments). No workaround proposed yet.
Apr 15 2025, 4:17 PM · Patch-For-Review, serviceops

Apr 14 2025

dduvall added a comment to T391374: Stand up Zuul 11 experiment environment in zuul3 cloud VPS project.

The Zuul dashboard is available at https://zuul-dev.wmcloud.org/tenants

Apr 14 2025, 11:52 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall added a comment to T391374: Stand up Zuul 11 experiment environment in zuul3 cloud VPS project.

@dduvall excellent!

The event stream permission is probably good enough: it does not grant any specific access beyond the ability to receive events, and we have multiple bots on WMCS using that same setup. As long as the user is not granted more permissions, it can't do much. Setting up a Gerrit plus repos and config might add a bit of a burden, so if you can reuse an existing setup, that gives us a great playground \o/

Apr 14 2025, 11:44 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)

Apr 8 2025

dduvall edited projects for T391374: Stand up Zuul 11 experiment environment in zuul3 cloud VPS project, added: Release-Engineering-Team (Doing 😎); removed Release-Engineering-Team (Priority Backlog 📥).

@hashar FYI I've set up Zuul and friends on zuul-1001.zuul3.eqiad1.wikimedia.cloud using https://opendev.org/zuul/zuul/src/branch/master/doc/source/examples/docker-compose.yaml

Apr 8 2025, 11:30 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)
dduvall changed the status of T391374: Stand up Zuul 11 experiment environment in zuul3 cloud VPS project from Open to In Progress.
Apr 8 2025, 11:15 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Continuous-Integration-Infrastructure (Zuul upgrade)

Mar 26 2025

dduvall added a comment to T389499: Refactor scap's kubernetes DeploymentsConfig to support selection of image kinds.

So, a possibly adequate analogy for the relationship between image "kind" (a new name for an existing concept that did not previously have a name) and image "flavour" (an existing name for an existing concept) would be that between a class definition and the specific set of constructor arguments that produce a concrete instantiation.

Mar 26 2025, 3:35 PM · MW-on-K8s, Release-Engineering-Team, serviceops
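The analogy maps directly onto code: the kind is the class definition, and each flavour is a named set of constructor arguments. Below is a minimal sketch of that relationship. The class name, the `php_version` and `debug` fields, and the flavour names are hypothetical and do not reflect scap's DeploymentsConfig.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaWikiImageKind:
    """The "kind": defines what an image of this kind is and how it is named."""

    # These fields stand in for a flavour's parameters (hypothetical names).
    php_version: str
    debug: bool = False

    def tag_suffix(self) -> str:
        suffix = f"php{self.php_version}"
        return f"{suffix}-debug" if self.debug else suffix


# Flavours: named sets of constructor arguments for the kind above.
FLAVOURS = {
    "default": {"php_version": "8.1"},
    "debug": {"php_version": "8.1", "debug": True},
}

# Each (kind, flavour) pair yields one concrete image instantiation.
images = {name: MediaWikiImageKind(**args) for name, args in FLAVOURS.items()}
```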

Mar 18 2025

dduvall added a comment to T388769: Add support for Alpine Linux in Blubber.

The apt config and implementation are also specific to Debian-based base images. Alpine base images would need an apk config and implementation. Red Hat-based base images would need a yum (or dnf?) config and implementation. Arch uses pacman. openSUSE uses zypper. I am not aware of any unifying abstraction over the various distro-specific package managers that would readily simplify this.

Mar 18 2025, 3:47 PM · Release Pipeline (Blubber)
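Even the simplest piece, the non-interactive install command, differs per package manager, and repository setup, signing keys and cache handling diverge further. Below is a minimal sketch of a per-manager lookup for the install step only. The mapping and helper are illustrative assumptions, not Blubber's design.

```python
# Non-interactive install invocation per package manager (illustrative mapping).
INSTALL_COMMANDS = {
    "apt": ["apt-get", "install", "-y", "--no-install-recommends"],
    "apk": ["apk", "add", "--no-cache"],
    "dnf": ["dnf", "install", "-y"],
    "pacman": ["pacman", "-S", "--noconfirm"],
    "zypper": ["zypper", "--non-interactive", "install"],
}


def install_command(manager: str, packages: list[str]) -> list[str]:
    """Return the argv to install `packages` with the given package manager."""
    try:
        base = INSTALL_COMMANDS[manager]
    except KeyError:
        raise ValueError(f"unsupported package manager: {manager!r}") from None
    if not packages:
        raise ValueError("no packages given")
    return [*base, *packages]
```

A table like this only covers the command line. Each manager still needs its own config schema and implementation, which is the point made above.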

Mar 17 2025

dduvall added a comment to T388769: Add support for Alpine Linux in Blubber.

Thanks for pointing out that this isn't a bug, @bd808.

Mar 17 2025, 9:10 PM · Release Pipeline (Blubber)

Mar 7 2025

dduvall added a comment to T387927: Improve garbage collection of unused MediaWiki images on deployment host.

I agree with running a daily timer and trying to spread the knowledge about the availability of scap clean-images to quickly recover space in unusual circumstances.

Mar 7 2025, 10:08 PM · Release-Engineering-Team (Doing 😎)
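A daily timer needs a selection rule for which images are safe to remove. The sketch below shows one possible retention rule: never delete images that are in use, and always keep the newest few. This is an assumed, hypothetical policy, not the logic of `scap clean-images`. The `Image` type, `images_to_remove` and `keep_latest` are illustrative names. The selected refs could then be removed with `docker image rm`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Image:
    ref: str
    created: datetime


def images_to_remove(images: list[Image], in_use: set[str], keep_latest: int = 2) -> list[str]:
    """Pick image refs that are safe to delete, newest first.

    An image is protected if it is currently in use or is among the
    `keep_latest` most recently created images; everything else is returned.
    """
    newest_first = sorted(images, key=lambda img: img.created, reverse=True)
    protected = {img.ref for img in newest_first[:keep_latest]} | in_use
    return [img.ref for img in newest_first if img.ref not in protected]
```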
dduvall added a comment to T387927: Improve garbage collection of unused MediaWiki images on deployment host.

The scap clean-images implementation has been merged. I plan on doing a release early next week. Sample behavior from train-dev:

Mar 7 2025, 7:18 PM · Release-Engineering-Team (Doing 😎)

Mar 4 2025

dduvall changed the status of T387927: Improve garbage collection of unused MediaWiki images on deployment host from Open to In Progress.
Mar 4 2025, 11:08 PM · Release-Engineering-Team (Doing 😎)
dduvall updated the task description for T387927: Improve garbage collection of unused MediaWiki images on deployment host.
Mar 4 2025, 9:07 PM · Release-Engineering-Team (Doing 😎)
dduvall updated the task description for T387927: Improve garbage collection of unused MediaWiki images on deployment host.
Mar 4 2025, 9:06 PM · Release-Engineering-Team (Doing 😎)
dduvall updated subscribers of T387927: Improve garbage collection of unused MediaWiki images on deployment host.

@dancy thoughts on the implementation?

Mar 4 2025, 9:04 PM · Release-Engineering-Team (Doing 😎)
dduvall created T387927: Improve garbage collection of unused MediaWiki images on deployment host.
Mar 4 2025, 9:04 PM · Release-Engineering-Team (Doing 😎)
dduvall added a comment to T387796: deployment server - low disk space on /srv.

Is there something else filling up /srv? It has filled back up and docker system df hasn't changed much.

Mar 4 2025, 8:47 PM · serviceops-radar, Release-Engineering-Team (Radar), SRE
dduvall added a comment to T387796: deployment server - low disk space on /srv.

Even after the deletion, there is still a lot of reclaimable space:

Mar 4 2025, 12:43 AM · serviceops-radar, Release-Engineering-Team (Radar), SRE
dduvall added a comment to T387796: deployment server - low disk space on /srv.

I've removed a bunch of old images for now. I will talk with Tyler and Ahmon tomorrow about long term solutions.

Mar 4 2025, 12:42 AM · serviceops-radar, Release-Engineering-Team (Radar), SRE

Feb 27 2025

dduvall added a comment to T387414: Insert above/below throwing error instead of adding a new row.

Thank you!

Feb 27 2025, 8:02 PM · VisualEditor, VisualEditor-Tables
dduvall added a comment to T387351: [regression] ToC icons appears as black squares while assets are loading.

Thanks, Jon!

Feb 27 2025, 8:02 PM · User-notice-archive, MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Vector 2022, Regression
dduvall added a comment to T387414: Insert above/below throwing error instead of adding a new row.

Just a reminder that all train blockers need to be UBN. If this isn't a blocker, please remove the parent task.

Feb 27 2025, 7:35 PM · VisualEditor, VisualEditor-Tables
dduvall added a comment to T387351: [regression] ToC icons appears as black squares while assets are loading.

Just a reminder that all train blockers need to be UBN. If this isn't a blocker, please remove the parent task.

Feb 27 2025, 7:35 PM · User-notice-archive, MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Vector 2022, Regression
dduvall triaged T387414: Insert above/below throwing error instead of adding a new row as Unbreak Now! priority.
Feb 27 2025, 7:34 PM · VisualEditor, VisualEditor-Tables
dduvall raised the priority of T387351: [regression] ToC icons appears as black squares while assets are loading from High to Unbreak Now!.
Feb 27 2025, 7:34 PM · User-notice-archive, MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Vector 2022, Regression
dduvall added a comment to T387351: [regression] ToC icons appears as black squares while assets are loading.

What's the status here? Do we need a cherry-pick to wmf.18 to unblock the train?

Feb 27 2025, 7:30 PM · User-notice-archive, MW-1.44-notes (1.44.0-wmf.20; 2025-03-11), Vector 2022, Regression
dduvall added a comment to T387414: Insert above/below throwing error instead of adding a new row.

Should this block today's train promotion?

Feb 27 2025, 7:28 PM · VisualEditor, VisualEditor-Tables

Feb 26 2025

dduvall added a comment to T386947: https://gitlab.wikimedia.org/repos/test-platform/catalyst/ci-charts buildkit failure.

This error seems to emanate from Blubber's BuildKit frontend, not buildkitd.

Feb 26 2025, 10:02 PM · GitLab (CI & Job Runners)
dduvall updated the task description for T387388: Wikimedia\Rdbms\DBQueryError: Error 1062: Duplicate entry for key 'file_name' Function: LocalFile::acquireFileIdFromNameQuery.
Feb 26 2025, 7:57 PM · MW-Interfaces-Team, MediaWiki-Uploading, Wikimedia-production-error
dduvall created T387388: Wikimedia\Rdbms\DBQueryError: Error 1062: Duplicate entry for key 'file_name' Function: LocalFile::acquireFileIdFromNameQuery.
Feb 26 2025, 7:55 PM · MW-Interfaces-Team, MediaWiki-Uploading, Wikimedia-production-error
dduvall closed T383243: Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs as Resolved.
Feb 26 2025, 6:09 PM · Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing 😎)
dduvall added a comment to T383243: Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs.

Attempting a better summary before closing this:

Feb 26 2025, 6:08 PM · Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing 😎)

Feb 21 2025

dduvall added a comment to T383243: Zuul/Jenkins: Investigate caching of build results for MediaWiki testsuite jobs.

@hashar Do you know if we can set the Jenkins result as SKIPPED/NOT_BUILT and have Gearman/Zuul honor that? (i.e. Will Zuul fail the pipeline on any non-success status or are there other success-ish statuses?)

Feb 21 2025, 7:21 PM · Continuous-Integration-Infrastructure, Release-Engineering-Team (Doing 😎)

Feb 19 2025

dduvall closed T386755: Multiple *-pipeline-test jobs failing to load pipelinelib with git error as Resolved.

Following a botched "safe" restart and subsequent systemctl restart jenkins, the issue seems to be resolved.

Feb 19 2025, 8:09 PM · cloud-services-team, Jenkins, LPL Essential (LPL Essential 2024 Jul-Oct), Toolhub, Continuous-Integration-Infrastructure, Striker, ci-test-error (WMF-deployed Build Failure)
dduvall added a comment to T386755: Multiple *-pipeline-test jobs failing to load pipelinelib with git error.

Seeing https://github.com/nodejs/build/issues/3754 as well, which seems to support your hunch, @taavi.

Feb 19 2025, 7:33 PM · cloud-services-team, Jenkins, LPL Essential (LPL Essential 2024 Jul-Oct), Toolhub, Continuous-Integration-Infrastructure, Striker, ci-test-error (WMF-deployed Build Failure)
dduvall added a comment to T386755: Multiple *-pipeline-test jobs failing to load pipelinelib with git error.

This feels very familiar, like T385553/T377803, and Java was upgraded yesterday on contint1002 per the Apt history log file, so that matches too. Maybe let's try restarting the Jenkins service, since that fixes the Puppet case of this error?

Feb 19 2025, 7:24 PM · cloud-services-team, Jenkins, LPL Essential (LPL Essential 2024 Jul-Oct), Toolhub, Continuous-Integration-Infrastructure, Striker, ci-test-error (WMF-deployed Build Failure)
dduvall claimed T386755: Multiple *-pipeline-test jobs failing to load pipelinelib with git error.
Feb 19 2025, 6:51 PM · cloud-services-team, Jenkins, LPL Essential (LPL Essential 2024 Jul-Oct), Toolhub, Continuous-Integration-Infrastructure, Striker, ci-test-error (WMF-deployed Build Failure)

Feb 14 2025

dduvall added a comment to T381359: Prototype using catalyst to deploy function-orchestrator patches ("milestone 1").

We think we can accomplish this using a slight variation on the documentation on running acceptance tests against your built image, but overriding KOKKURI_REGISTRY_PUBLIC to point to an internal registry (instead of the default docker-registry.wikimedia.org).

Feb 14 2025, 10:04 PM · Catalyst (ilopona), Abstract Wikipedia team (25Q3 (Jan–Mar)), function-orchestrator

Feb 5 2025

dduvall closed T378741: Don't redact stracktraces containing "eval()'d" in Phatality as Resolved.
Feb 5 2025, 5:41 PM · Release-Engineering-Team (Doing 😎), Phatality