Dzahn (Daniel Zahn)
SRE in team collaboration servicesAdministrator

Projects (38)
View All

Calendar

User Details

User Since: Sep 30 2014, 4:39 PM (575 w, 17 h)
Roles: Administrator
Availability: Available
IRC Nick: mutante
LDAP User: Dzahn
MediaWiki User: Mutante [ Global Accounts ]

Recent Activity
View All

Today

Dzahn closed T406671: SystemdUnitFailed - prometheus-ethtool-exporter.service on gitlab-runner2003 as Resolved.

[gitlab-runner2003:~] $ sudo systemctl list-units --state=failed
  UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.

Wed, Oct 8, 2:48 AM · collaboration-services

Dzahn renamed T406671: SystemdUnitFailed - prometheus-ethtool-exporter.service on gitlab-runner2003 from SystemdUnitFailed to SystemdUnitFailed - prometheus-ethtool-exporter.service on gitlab-runner2003.

Wed, Oct 8, 2:47 AM · collaboration-services

Yesterday

Dzahn placed T361090: Move k8s miscweb blackbox checks out of microsites puppet module up for grabs.

Tue, Oct 7, 10:07 PM · collaboration-services

Dzahn added a comment to T361090: Move k8s miscweb blackbox checks out of microsites puppet module.

This is blocked on T397264 and that is a question for observability but probably not a priority for either them or us right now.

Tue, Oct 7, 10:06 PM · collaboration-services

Dzahn added a subtask for T397264: create a new place for prometheus/alertmanager checks not tied to physical machines: T361090: Move k8s miscweb blackbox checks out of microsites puppet module.

Tue, Oct 7, 10:05 PM · SRE Observability (FY2025/2026-Q1), Observability-Alerting, collaboration-services, SRE

Dzahn added a parent task for T361090: Move k8s miscweb blackbox checks out of microsites puppet module: T397264: create a new place for prometheus/alertmanager checks not tied to physical machines.

Tue, Oct 7, 10:05 PM · collaboration-services

Dzahn closed T406622: PuppetDisabled - releases2003 as Resolved.

--> T405352#11251315

Tue, Oct 7, 5:18 PM · collaboration-services

Dzahn renamed T406622: PuppetDisabled - releases2003 from PuppetDisabled to PuppetDisabled - releases2003.

Tue, Oct 7, 5:18 PM · collaboration-services

Dzahn added a comment to T405888: Puppet agent failure detected on instance deployment-maps-master02 in project deployment-prep.

The chain that causes this is:

Tue, Oct 7, 12:14 AM · Beta-Cluster-Infrastructure

Mon, Oct 6

Dzahn added a comment to T405788: No Puppet resources found on instance deployment-puppetserver-1 on project deployment-prep.

Mon, Oct 6, 11:50 PM · Beta-Cluster-Infrastructure

Dzahn updated subscribers of T405796: Requesting access to restricted for AramilFeraxa.

@MKopec It looks like you have provided everything needed. It's just that the "clinic duty" that handles access request changes every week. And just last week the schedule switched from "Monday to Monday" to "Wednesday to Wednesday". So it will continue with @FCeratto-WMF or @jijiki . I was just trying to reduce turn-around time by adding requirements.

Mon, Oct 6, 6:26 PM · SRE, SRE-Access-Requests

Dzahn added a comment to T335407: Wikipedia Android app is not available on Codesearch.

In the context of "official apps on github" also see T405525

Mon, Oct 6, 4:57 PM · Wikipedia-Android-App-Backlog, VPS-project-Codesearch

Dzahn added a comment to T406500: The Wikipedia iOS app should be added to Codesearch.

In the context of "official repos on GitHub" also see T405525

Mon, Oct 6, 4:55 PM · Wikipedia-iOS-App-Backlog, VPS-project-Codesearch

Fri, Oct 3

Dzahn closed T406366: SystemdUnitFailed - zuul-executor as Resolved.

Fri, Oct 3, 8:08 PM · collaboration-services

Dzahn closed T405366: Grant Access to analytics-privatedata-users for BTracy-WMF as Resolved.

ah:) cool. I did not mean to open it again. that was just because I had the tab already open when you closed it.

Fri, Oct 3, 7:45 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Dzahn reopened T405366: Grant Access to analytics-privatedata-users for BTracy-WMF as "Open".

Fri, Oct 3, 7:44 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Dzahn added a comment to T405366: Grant Access to analytics-privatedata-users for BTracy-WMF.

Yes, it seems like that is the case. This goes back to T405366#11210719.

Fri, Oct 3, 7:41 PM · LDAP-Access-Requests, Data-Engineering, SRE, SRE-Access-Requests

Dzahn merged task T406378: SystemdUnitFailed into T406366: SystemdUnitFailed - zuul-executor.

Fri, Oct 3, 7:19 PM · collaboration-services

Dzahn merged T406378: SystemdUnitFailed into T406366: SystemdUnitFailed - zuul-executor.

Fri, Oct 3, 7:19 PM · collaboration-services

Dzahn added a comment to T406334: Gerrit switchover between secondary instances.

This would only be a thing though if we permanently keep 3 or more gerrit servers around.

Fri, Oct 3, 7:00 PM · Gerrit, collaboration-services

Dzahn added a comment to T406366: SystemdUnitFailed - zuul-executor.

root cause: typo. $ instead of @ in erb template ....

Fri, Oct 3, 5:17 PM · collaboration-services

Dzahn added a comment to T406333: gerrit: config tweaks.

re: test platform - that is also T363196 (T329444, T240933, T256563, .. ) - there is/was a gerrit test instance in devtools. puppetized.. so using the same config. but it was shut down because using the LDAP for user auth there is against policy, afair.

Fri, Oct 3, 5:13 PM · Gerrit, collaboration-services

Dzahn claimed T406366: SystemdUnitFailed - zuul-executor.

broken by unexpected issue after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193141

Fri, Oct 3, 5:08 PM · collaboration-services

Dzahn renamed T406366: SystemdUnitFailed - zuul-executor from SystemdUnitFailed to SystemdUnitFailed - zuul-executor.

Fri, Oct 3, 5:07 PM · collaboration-services

Thu, Oct 2

Dzahn updated subscribers of T406243: Requesting access to deployment for VolkerE.

Thu, Oct 2, 5:03 PM · SRE, SRE-Access-Requests

Dzahn closed T405129: Requesting access to analytics-privatedata-users for tais-lessa as Resolved.

Thu, Oct 2, 5:03 PM · Data-Engineering, SRE, SRE-Access-Requests

Dzahn added a comment to T405129: Requesting access to analytics-privatedata-users for tais-lessa.

Thank you @cmadeo

Thu, Oct 2, 5:02 PM · Data-Engineering, SRE, SRE-Access-Requests

Dzahn updated the task description for T405129: Requesting access to analytics-privatedata-users for tais-lessa.

Thu, Oct 2, 4:55 PM · Data-Engineering, SRE, SRE-Access-Requests

Dzahn changed the status of T405129: Requesting access to analytics-privatedata-users for tais-lessa from Stalled to In Progress.

Thu, Oct 2, 4:55 PM · Data-Engineering, SRE, SRE-Access-Requests

Wed, Oct 1

Dzahn added a comment to T401859: upgrade wikistats cloud VPS project to trixie.

we will see tomorrow if the timers / jobs ran

Wed, Oct 1, 11:52 PM · VPS-project-Wikistats

Dzahn added a comment to T401859: upgrade wikistats cloud VPS project to trixie.

After the puppet changes above and some manual unmounting / mounting and fixing /etc/fstab on the old instance; puppet runs are now without errors on both instances, old and new

Wed, Oct 1, 11:51 PM · VPS-project-Wikistats

Dzahn added a comment to T401859: upgrade wikistats cloud VPS project to trixie.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1192985

Wed, Oct 1, 11:34 PM · VPS-project-Wikistats

Dzahn added a comment to T401859: upgrade wikistats cloud VPS project to trixie.

https://gerrit.wikimedia.org/r/c/operations/puppet/+/1192986

Wed, Oct 1, 11:33 PM · VPS-project-Wikistats

Dzahn added a comment to T406106: Grant Access to wmde and nda for Maria Lechner WMDE.

This ticket is mostly a duplicate of T405917 now. (but don't worry about it too much, not a big deal, it is being handled either way)

Wed, Oct 1, 11:09 PM · SRE, LDAP-Access-Requests

Dzahn added a comment to T401859: upgrade wikistats cloud VPS project to trixie.

made a new cinder volume backup-trixie and attached it to new trixie instance..

Wed, Oct 1, 8:16 PM · VPS-project-Wikistats

Dzahn moved T268199: Graduate codesearch to production from Work in Progress to Backlog on the collaboration-services board.

Wed, Oct 1, 7:45 PM · collaboration-services, VPS-project-Codesearch

Dzahn moved T402889: Puppet CA certificate Puppet CA: mailman-puppetmaster.mailman.eqiad.wmflabs expired from Work in Progress to Consultation on the collaboration-services board.

Wed, Oct 1, 7:44 PM · SRE Observability, collaboration-services

Dzahn added a comment to T406124: phabricator.wmcloud.org account verification request: itsmoon.

Done as suggested. The email address has been verified.

Wed, Oct 1, 7:31 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T405940: eqiad row C/D Collaboration Services host migrations.

Done! Added both in gcal just now.

Wed, Oct 1, 7:28 PM · collaboration-services, SRE, DC-Ops, ops-eqiad

Dzahn updated the task description for T405713: Requesting access to analytics-privatedata-users for gengh.

Wed, Oct 1, 7:26 PM · SRE, SRE-Access-Requests

Dzahn placed T405713: Requesting access to analytics-privatedata-users for gengh up for grabs.

Thanks. Checking the approval box and setting to "in progress":)

Wed, Oct 1, 7:25 PM · SRE, SRE-Access-Requests

Dzahn changed the status of T405713: Requesting access to analytics-privatedata-users for gengh from Stalled to In Progress.

Wed, Oct 1, 7:25 PM · SRE, SRE-Access-Requests

Dzahn added a comment to T405940: eqiad row C/D Collaboration Services host migrations.

@RobH We can indeed do the gitlab-runners first and separate them. Let's do that.

Wed, Oct 1, 5:24 PM · collaboration-services, SRE, DC-Ops, ops-eqiad

Dzahn placed T405796: Requesting access to restricted for AramilFeraxa up for grabs.

Wed, Oct 1, 5:10 PM · SRE, SRE-Access-Requests

Dzahn updated subscribers of T405796: Requesting access to restricted for AramilFeraxa.

Thank you, Tyler.

Wed, Oct 1, 5:10 PM · SRE, SRE-Access-Requests

Dzahn added a comment to T405940: eqiad row C/D Collaboration Services host migrations.

@RobH What's your preferred way to schedule this? Want to let me know which slots work for you? Or should we just suggest something via Google calendar?

Wed, Oct 1, 5:07 PM · collaboration-services, SRE, DC-Ops, ops-eqiad

Dzahn assigned T405796: Requesting access to restricted for AramilFeraxa to thcipriani.

Hi Tyler, there is a request for the "restricted" group here. They want to run maintenance scripts on the deployment server. Details at T405796#11221398

Wed, Oct 1, 4:55 PM · SRE, SRE-Access-Requests

Dzahn moved T405917: Requesting access to Superset for marialechnerwmde from Untriaged to Manager/NDA Approval/Confirmation on the SRE-Access-Requests board.

Wed, Oct 1, 4:53 PM · LDAP-Access-Requests, SRE, SRE-Access-Requests

Dzahn added a comment to T405917: Requesting access to Superset for marialechnerwmde.

(this should be like https://gerrit.wikimedia.org/r/c/operations/puppet/+/1191507/4/modules/admin/data/data.yaml)

Wed, Oct 1, 4:53 PM · LDAP-Access-Requests, SRE, SRE-Access-Requests

Dzahn added a project to T405917: Requesting access to Superset for marialechnerwmde: LDAP-Access-Requests.

Wed, Oct 1, 4:52 PM · LDAP-Access-Requests, SRE, SRE-Access-Requests

Dzahn added a comment to T405917: Requesting access to Superset for marialechnerwmde.

@Maria_Lechner_WMDE Please send an email to Katie Francis of Legal (https://meta.wikimedia.org/wiki/User:KFrancis_(WMF)) to get the NDA signing process started. Once she confirms here that is done we can complete the ticket and give you the needed groups.

Wed, Oct 1, 4:51 PM · LDAP-Access-Requests, SRE, SRE-Access-Requests

Dzahn changed the status of T405713: Requesting access to analytics-privatedata-users for gengh from In Progress to Stalled.

stalled on manager approval

Wed, Oct 1, 4:46 PM · SRE, SRE-Access-Requests

Dzahn changed the status of T405129: Requesting access to analytics-privatedata-users for tais-lessa from In Progress to Stalled.

stalled on manager approval.

Wed, Oct 1, 4:45 PM · Data-Engineering, SRE, SRE-Access-Requests

Dzahn added a comment to T406124: phabricator.wmcloud.org account verification request: itsmoon.

when trying to verify I got:

Wed, Oct 1, 4:33 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T388022: Phabricator test project requires email verification but can't send email.

In T388022#11231344, @Pppery wrote:

It hasn't been fixed. I asked phabricator.wmcloud.org to resend me a verification email and it didn't arrive.

Wed, Oct 1, 4:29 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T402889: Puppet CA certificate Puppet CA: mailman-puppetmaster.mailman.eqiad.wmflabs expired.

One mouse click and this instance is deleted. The renewal part is moot now.

Wed, Oct 1, 4:22 PM · SRE Observability, collaboration-services

Dzahn added a comment to T405124: Requesting access to deployment for ebomani..

@EBomani You can start by taking a look at the list of bastion hosts.

Wed, Oct 1, 3:41 AM · SRE, SRE-Access-Requests

Tue, Sep 30

Dzahn added a comment to T388022: Phabricator test project requires email verification but can't send email.

Seems like this is only in the phabricator config itself after all.. or something was fixed meanwhile since this ticket was created.

Tue, Sep 30, 9:20 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T388022: Phabricator test project requires email verification but can't send email.

I don't mind if you do that. That would give us an idea how common it is (or will become).

Tue, Sep 30, 9:17 PM · collaboration-services, VPS-project-Phabricator

Dzahn placed T403948: 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT) up for grabs.

Tue, Sep 30, 9:01 PM · Patch-For-Review, Release-Engineering-Team (Radar), collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T403948: 'Fulltext' searches fail on test Phab instance due to ElasticSearch default config (PhutilAggregateException: All Fulltext Search hosts failed / CURLE_COULDNT_CONNECT).

We agreed the cluster_search config should be removed from both test and prod setup.

Tue, Sep 30, 9:00 PM · Patch-For-Review, Release-Engineering-Team (Radar), collaboration-services, VPS-project-Phabricator

We deployed to the test instance today.

Tue, Sep 30, 8:59 PM · Patch-For-Review, Release-Engineering-Team (Radar), collaboration-services, VPS-project-Phabricator

Dzahn closed T405124: Requesting access to deployment for ebomani. as Resolved.

Tue, Sep 30, 8:58 PM · SRE, SRE-Access-Requests

Dzahn added a comment to T405124: Requesting access to deployment for ebomani..

You have deployment access now. Welcome to deployers, @EBomani

Tue, Sep 30, 8:47 PM · SRE, SRE-Access-Requests

Dzahn added a comment to T405777: Create glam.incubator.wikimedia.org.

But the point is we need the stability and Wikidata integration of something on production, rather than just on cloud.

Tue, Sep 30, 8:07 PM · Wiki-Setup (Create), GLAM, Wikispore, incubator.wikimedia.org

Dzahn added a comment to T406062: Puppet failure on releases1003:9100.

root cause: T405352

Tue, Sep 30, 7:13 PM · collaboration-services

Dzahn renamed T406062: Puppet failure on releases1003:9100 from PuppetFailure to Puppet failure on releases1003:9100.

Tue, Sep 30, 7:12 PM · collaboration-services

Dzahn closed T406061: Puppet failure on contint1002:9100 as Resolved.

Tue, Sep 30, 7:11 PM · collaboration-services

Dzahn renamed T406061: Puppet failure on contint1002:9100 from PuppetFailure to Puppet failure on contint1002:9100.

Tue, Sep 30, 7:01 PM · collaboration-services

Dzahn added a comment to T405118: Set up zuul scheduler on zuul1001.

I tried it and I can confirm using mysql+mymysql gets us past the error.

Tue, Sep 30, 6:55 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn added a comment to T405118: Set up zuul scheduler on zuul1001.

It's possible that mysql+pymysql would also just work. but it's not super clear.

Tue, Sep 30, 6:34 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn moved T405352: APT error when installing Jenkins package in releases instances from Work in Progress to Consultation on the collaboration-services board.

Tue, Sep 30, 5:40 PM · collaboration-services, Continuous-Integration-Infrastructure, Jenkins

Dzahn placed T405352: APT error when installing Jenkins package in releases instances up for grabs.

Tue, Sep 30, 5:39 PM · collaboration-services, Continuous-Integration-Infrastructure, Jenkins

Dzahn added a comment to T388022: Phabricator test project requires email verification but can't send email.

The production instance is connected to other systems handling the user sign-up. MediaWiki/SUL and LDAP/developer accounts with their own mechanisms to prevent abuse. The test instance doesn't have these. So I am not sure about the "more locked down" part here.

Tue, Sep 30, 5:36 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T388022: Phabricator test project requires email verification but can't send email.

a task to request account activation is a bit of a hurdle to get into the test instance [and requires SRE time

Tue, Sep 30, 4:26 PM · collaboration-services, VPS-project-Phabricator

Dzahn added a comment to T401157: Phorge setup check caching is misbehaving, leading to many duck-sound=quack requests.

phab test instance is now configured to 256M and puppet is enabled again

Tue, Sep 30, 4:17 PM · collaboration-services, User-brennen, Release-Engineering-Team, Phabricator

Dzahn added a comment to T403542: Drop unexpected/unneeded database tables in Phabricator.

It has also been checked on the phab test instance and those tables did not exist there.

Tue, Sep 30, 4:07 PM · DBA, Release-Engineering-Team (Priority Backlog 📥), Phabricator

Mon, Sep 29

Dzahn added a comment to T405118: Set up zuul scheduler on zuul1001.

is this what is missing?

Mon, Sep 29, 8:09 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn added a comment to T405118: Set up zuul scheduler on zuul1001.

First attempt to start this new systemd unit. Currently fails with:

Mon, Sep 29, 8:07 PM · collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn added a comment to T405165: Fetching mediawiki GPG keys fail with error "No data" due to User-Agent requirement.

@Ciencia_Al_Poder When you actually download the MediaWiki tarballs, as opposed to the GPG keys, would I be right in assuming they come from https://releases.wikimedia.org/mediawiki/ ?

Mon, Sep 29, 7:25 PM · Patch-For-Review, Traffic

Dzahn changed the status of T268199: Graduate codesearch to production, a subtask of T381417: aux-k8s-codfw cluster setup, from Open to Stalled.

Mon, Sep 29, 4:48 PM · SRE Observability (FY2024/2025-Q3), Infrastructure-Foundations, SRE, Kubernetes

Dzahn changed the status of T268199: Graduate codesearch to production from Open to Stalled.

Mon, Sep 29, 4:48 PM · collaboration-services, VPS-project-Codesearch

Dzahn closed T403847: Deploy zuul executor on executor VM, a subtask of T395938: puppetize setup of new zuul VMs, as Resolved.

Mon, Sep 29, 4:48 PM · Patch-For-Review, collaboration-services, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn closed T403847: Deploy zuul executor on executor VM as Resolved.

I am claiming this as resolved.. an executor is deployed. And we have a separate task that will be about testing the whole setup once other missing parts have been created.

Mon, Sep 29, 4:48 PM · Release-Engineering-Team (Priority Backlog 📥), collaboration-services, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn added a project to T405119: Set up zuul web on zuul1001: collaboration-services.

Mon, Sep 29, 3:57 PM · Patch-For-Review, collaboration-services, Essential-Work, Continuous-Integration-Infrastructure (Zuul upgrade)

Dzahn added a comment to T405599: Create project tag for Wikisource Reader App.

Great, sounds good. Thank you

Mon, Sep 29, 3:54 PM · Essential-Work, Release-Engineering-Team (Doing 😎), Project-Admins

Sat, Sep 27

Dzahn added a comment to T405777: Create glam.incubator.wikimedia.org.

Could this also just be created in the single incubator for everything, with a prefix and then become a production wiki at glam.wikimedia.org? I guess I am asking why the special case with a standalone incubator instead of the standard procedure or requesting a production wiki right away?

Sat, Sep 27, 5:03 AM · Wiki-Setup (Create), GLAM, Wikispore, incubator.wikimedia.org

Dzahn merged task T405820: SystemdUnitFailed into T405789: SystemdUnitFailed - gitlab - backup-restore.

Sat, Sep 27, 4:54 AM · collaboration-services

Dzahn merged T405820: SystemdUnitFailed into T405789: SystemdUnitFailed - gitlab - backup-restore.

Sat, Sep 27, 4:54 AM · collaboration-services

Dzahn merged task T405813: SystemdUnitFailed into T405789: SystemdUnitFailed - gitlab - backup-restore.

Sat, Sep 27, 12:30 AM · collaboration-services

Dzahn merged T405813: SystemdUnitFailed into T405789: SystemdUnitFailed - gitlab - backup-restore.

Sat, Sep 27, 12:30 AM · collaboration-services