User Details
- User Since: Sep 30 2014, 4:39 PM
- Roles: Administrator
- Availability: Available
- IRC Nick: mutante
- LDAP User: Dzahn
- MediaWiki User: Mutante
Today
[gitlab-runner2003:~] $ sudo systemctl list-units --state=failed
UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
Yesterday
This is blocked on T397264, which is a question for Observability, but it is probably not a priority for either them or us right now.
--> T405352#11251315
The chain that causes this is:
Mon, Oct 6
likely related to https://gerrit.wikimedia.org/r/c/operations/puppet/+/1190300 / T399891
@MKopec It looks like you have provided everything needed. It's just that the "clinic duty" that handles access requests changes every week, and just last week the schedule switched from "Monday to Monday" to "Wednesday to Wednesday". So it will continue with @FCeratto-WMF or @jijiki. I was just trying to reduce turnaround time by adding requirements.
In the context of "official apps on github" also see T405525
In the context of "official repos on GitHub" also see T405525
Fri, Oct 3
ah:) cool. I did not mean to open it again; that was just because I still had the tab open when you closed it.
Yes, it seems like that is the case. This goes back to T405366#11210719.
This would only be a thing though if we permanently keep 3 or more gerrit servers around.
root cause: typo, $ instead of @ in the ERB template. (Puppet exposes variables to ERB templates as Ruby instance variables, so they need the @ sigil, not Puppet's $.)
broken by an unexpected issue after https://gerrit.wikimedia.org/r/c/operations/puppet/+/1193141
Thu, Oct 2
Thank you @cmadeo
Wed, Oct 1
we will see tomorrow if the timers / jobs ran
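If useful, this is roughly how to check tomorrow (a sketch; the unit name is a placeholder, not the actual timer):

$ systemctl list-timers --all                          # last and next activation times for all timers
$ systemctl status <some-job>.service                  # result of the service the timer triggered
$ journalctl -u <some-job>.service --since yesterday   # logs from the most recent run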
After the puppet changes above, and some manual unmounting/mounting and fixing /etc/fstab on the old instance, puppet runs are now without errors on both instances, old and new.
This ticket is mostly a duplicate of T405917 now. (but don't worry about it too much, not a big deal, it is being handled either way)
made a new Cinder volume, backup-trixie, and attached it to the new trixie instance.
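For reference, the equivalent with the OpenStack CLI looks roughly like this (a sketch; the size, instance name, device and mount point are illustrative assumptions, not the real values):

$ openstack volume create --size 100 backup-trixie             # create the Cinder volume
$ openstack server add volume <trixie-instance> backup-trixie  # attach it to the instance
$ sudo mkfs.ext4 /dev/sdb && sudo mount /dev/sdb /srv          # inside the VM; the device name may differ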
Done as suggested. The email address has been verified.
Done! Added both in gcal just now.
Thanks. Checking the approval box and setting to "in progress":)
@RobH We can indeed do the gitlab-runners first and separate them. Let's do that.
Thank you, Tyler.
@RobH What's your preferred way to schedule this? Want to let me know which slots work for you? Or should we just suggest something via Google calendar?
Hi Tyler, there is a request for the "restricted" group here. They want to run maintenance scripts on the deployment server. Details at T405796#11221398
@Maria_Lechner_WMDE Please send an email to Katie Francis of Legal (https://meta.wikimedia.org/wiki/User:KFrancis_(WMF)) to get the NDA signing process started. Once she confirms here that this is done, we can complete the ticket and give you the needed groups.
stalled on manager approval
stalled on manager approval.
when trying to verify I got:
One mouse click and this instance is deleted. The renewal part is moot now.
@EBomani You can start by taking a look at the list of bastion hosts.
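In case it helps, this is the general idea of how a bastion host is used (a sketch; the hostnames are placeholders, not the actual ones):

$ ssh -J <bastion-host> <internal-host>    # hop through the bastion to a host that is not directly reachable
# or persist it in ~/.ssh/config:
#   Host <internal-host>
#       ProxyJump <bastion-host>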
Tue, Sep 30
Seems like this is only in the Phabricator config itself after all, or something was fixed in the meantime since this ticket was created.
I don't mind if you do that. That would give us an idea how common it is (or will become).
We agreed the cluster_search config should be removed from both the test and prod setups.
We deployed to the test instance today.
You have deployment access now. Welcome to deployers, @EBomani
But the point is we need the stability and Wikidata integration of something on production, rather than just on cloud.
root cause: T405352
I tried it and I can confirm using mysql+mymysql gets us past the error.
It's possible that mysql+pymysql would also just work, but it's not super clear.
The production instance is connected to other systems handling the user sign-up: MediaWiki/SUL and LDAP/developer accounts, each with their own mechanisms to prevent abuse. The test instance doesn't have these. So I am not sure about the "more locked down" part here.
a task to request account activation is a bit of a hurdle to get into the test instance [and requires SRE time]
phab test instance is now configured to 256M and puppet is enabled again
It has also been checked on the phab test instance and those tables did not exist there.
Mon, Sep 29
is this what is missing?
First attempt to start this new systemd unit. Currently fails with:
@Ciencia_Al_Poder When you actually download the MediaWiki tarballs, as opposed to the GPG keys, would I be right in assuming they come from https://releases.wikimedia.org/mediawiki/ ?
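For context, the download-plus-verify flow I have in mind looks roughly like this (a sketch; the version number is only illustrative, and it assumes the release signing keys are already imported):

$ wget https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz
$ wget https://releases.wikimedia.org/mediawiki/1.43/mediawiki-1.43.0.tar.gz.sig
$ gpg --verify mediawiki-1.43.0.tar.gz.sig mediawiki-1.43.0.tar.gz   # verify the tarball against its detached signature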
I am claiming this as resolved; an executor is deployed, and we have a separate task that will be about testing the whole setup once other missing parts have been created.
Great, sounds good. Thank you
Sat, Sep 27
Could this also just be created in the single incubator for everything, with a prefix, and then become a production wiki at glam.wikimedia.org? I guess I am asking why the special case with a standalone incubator instead of the standard procedure of requesting a production wiki right away?
Fri, Sep 26
To get this kicked off, here are other things we will need:
Thank you! That does clarify it. Deployment server and the restricted group should work, as far as I see right now.
this sounds like it is about running maintenance commands on maintenance servers (mwmaint*). Is that right?