TAG-TO-UPLOAD - DEBIAN - SERVICE DESIGN / DEPLOYMENT PLAN
=========================================================

Overall structure and dataflow
------------------------------

 * Uploader (DD or DM) makes a signed git tag (containing metadata
   forming instructions to the tag2upload service).

 * Uploader pushes said tag to salsa. [1]

 * salsa sends a webhook to the tag2upload service.

 * tag2upload service
     :  provides an HTTPS service accessible to salsa
     :  fishes url and tag name out of webhook json
     :  checks to see if the tag is at all relevant
     :  retrieves tag data (git shallow clone)
     !  verifies signature on the tag
     !  parses the tag metadata
     !  checks that salsa repo url is basically sane
     !  checks to see if signed by DD, or DM for appropriate package
     -  obtains relevant git history
     -  obtains, if applicable, orig tarball from archive
     -  makes source package
     #  signs source package and "canonical view" git tag
     -  pushes history and both tags to dgit-repos git server
     -  uploads source package to archive
     !  reports activities by email
     :  shows status of package building to enquirers via www

 * archive publishes package as normal

[1] In principle other git servers would be possible, but they would
    have to be restricted to ones where we can either avoid, or stop,
    them being used as a channel for a DoS attack against the
    tag2upload service.

Privsep
-------

The tag2upload service will have to have a signing key that can upload
source packages to the archive.  We do not want that signing key to be
abused.  In particular, even though the key will be in a hardware
token, we want to avoid giving unrestricted access to use it to code
which itself has a large attack surface.  Source package construction,
in particular, is very complex.

So there will be a privilege separation arrangement, as indicated by
the markers in the list above.  Different tasks run in different
security contexts:

 :  runs on the Manager, which is web-accessible and not trusted
    very much
 !  is fully trusted and has access to the signing key
 -  runs in the discardable VM or container, controlled by `!'
 #  is achieved by the `dgit rpush' protocol, where the trusted
    (invoking, signing) part offers a restricted signing oracle to
    the less-trusted (building) part.

The signing oracle will check that the files to be signed are roughly
in the right form and that they name the right source package.  It
will construct the "canonical view" git tag itself from metadata
provided by the building part.  The signing oracle has the information
from the now-verified git tag (since it is operating in the context of
a particular request) and will only sign for the same source package
and version.

Service architecture
--------------------

I propose the following architecture for the tag2upload service.
There are three systems involved:

I. Manager (`:`)

Hardly trusted.

 * Database (sqlite) containing the queue and historical data.

 * Conventional webserver offering TLS and using Let's Encrypt.

 * Manager daemon.

The Manager daemon has the following tasks:

 * A web-service-style "application server", written in some
   scripting language, listens on a local TCP port and handles HTTP
   connections proxied by the webserver.

 * Receives webhook requests.  Checks that the calling IP address is
   salsa.  Parses the JSON.  Checks the tag name to see if it seems of
   interest.  If so, fetches the actual tag data (git shallow clone)
   and checks whether it looks plausible; if it does, stores it in the
   db.  If an Oracle client is waiting, feeds it the tag and url.

 * Server for a very simple protocol, used by the Oracle to obtain
   work to do.  Accessed via ssh with a restricted key (`ssh ... nc`).

 * The Manager daemon's web service also offers a basic query API and
   web pages showing recent activity, for human tracking.  (To all
   comers.)

II. Oracle (`!`)

Trusted to use the signing key.  (The key itself is in a hardware
token.)  Not exposed to source package contents.  Not exposed to the
web.  Not exposed via the git protocol, not even as a client.
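The Oracle is where the restricted signing oracle from the Privsep section runs.  Its central rule is to sign only for the source package and version named in the already-verified tag.  The Python sketch below illustrates that rule for a candidate .dsc.  It is not dgit's code: `TagMetadata`, `parse_deb822_fields`, `expected_dsc_name` and `may_sign_dsc` are hypothetical names, the deb822 reader is deliberately simplified (single-line fields only), and only the .dsc case is shown.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TagMetadata:
    """Hypothetical: the fields the Oracle parsed from the verified tag."""
    source: str
    version: str


def parse_deb822_fields(text: str) -> Dict[str, str]:
    """Minimal deb822 reader: top-level "Field: value" lines only.

    Continuation lines (starting with whitespace) are skipped, which is
    enough for single-line fields such as Source and Version.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def expected_dsc_name(meta: TagMetadata) -> str:
    """Debian file names omit any epoch: 1:2.10-3 gives hello_2.10-3.dsc."""
    without_epoch = meta.version.split(":", 1)[-1]
    return f"{meta.source}_{without_epoch}.dsc"


def may_sign_dsc(meta: TagMetadata, filename: str, dsc_text: str) -> bool:
    """Refuse unless the .dsc names the tag's source package and version."""
    if filename != expected_dsc_name(meta):
        return False
    fields = parse_deb822_fields(dsc_text)
    return (fields.get("Source") == meta.source
            and fields.get("Version") == meta.version)


if __name__ == "__main__":
    meta = TagMetadata(source="hello", version="1:2.10-3")
    dsc = "Format: 3.0 (quilt)\nSource: hello\nVersion: 1:2.10-3\n"
    print(may_sign_dsc(meta, "hello_2.10-3.dsc", dsc))  # True
    print(may_sign_dsc(meta, "hello_2.11-1.dsc", dsc))  # False
```

The key point of the design is that the metadata comes from the tag the Oracle verified itself, never from the Builder.  A compromised Builder can therefore at worst get a signature on the package and version the uploader actually asked for.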

 * Uses ssh to connect to the Manager's simple Oracle protocol port.
   The Manager sends the Oracle the signed tag and repository URL.

 * Sends an email saying what it is about to process.  (We do this in
   the Oracle so that less-trusted components cannot hide their
   misbehaviours by not sending reports.)

 * Checks that the tag is signed by someone in the keyring (and that
   it uses a good enough hash function).  (The Oracle has a copy of
   the keyrings and the DM allow list.)

 * Parses the tag to find the metadata, including source package
   name, target suite, and version.  Checks that the signer is
   authorised for this package.

 * Checks that the source repository URL is basically sane.  (But
   does not access it; the Builder does that, below.)

 * Arranges for the Builder to be reset (see below).

 * ssh's to the Builder to have the Builder fetch the git data.

 * Runs dgit rpush, specifying the package, version and target suite
   on the command line.  The target host is the Builder.  (We use the
   existing dgit rpush signing oracle protocol.)

 * Sends an email saying what it did.

 * Reports the outcome (success/failure) and a summary line to the
   Manager via the still-open Manager protocol connection.

III. Builder (`-`)

Does the actual source package conversion.  Largely trusts the Oracle.
Trusted as to source package contents, but not otherwise.  The Oracle
can reset it, so it is a VM or a chroot.  We propose to use the same
schroot configuration as for a buildd, subject to consultation with
DSA as to the best approach.

 * On instructions from the Oracle (via incoming ssh):
    - Fetches the git objects for the maintainer's tag from salsa.
    - Fetches the git objects for the existing canonical view from
      the dgit-repos git server.
    - Fetches necessary origs from the archive.
    - Converts the git history to the canonical form (treesame to the
      source package) by adding necessary synthetic commits.
    - Builds the source package.
    - Uses the rpush protocol to obtain a signed git tag (on the
      canonical git form) and signed .dsc and .changes.
    - Pushes the git objects to the dgit-repos server.
    - Uploads the .dsc and .changes to the archive.

 * Packet filter limiting outgoing connections to salsa, dgit-repos,
   and the Debian archive.  Incoming connections come only from the
   Oracle.

Reproducibility, metadata and auditing
--------------------------------------

The trusted part of the tag2upload service will keep some logs,
particularly of each tag it is told about, its disposition, and when
it was retried.

Also, it will send the following information to a public mailing
list:

 - The tag object data for any tag it decides to process, before it
   passes it to the VM.
 - A report (more or less, a shell transcript) of each processing
   attempt.
 - The list address will also be the public email address of the
   tag2upload robot's signing key.

The generated .dscs will contain additional fields:

  Git-Tag-Tagger: Firstname Surname
      "tagger" line from the git tag, converted to deb822 format

  Git-Tag-Info: tag= fp=
      The tag= value is the git object ID of the tag object (if
      someone wants to obtain referenced git objects, they can be
      found on the dgit-repos git server).  The fp= value is the
      "fingerprint_in_hex" from the VALIDSIG line in the gpgv output.

This additional metadata is needed so that one can tell, by looking at
the .dsc, who the original uploader was (which might be different from
the maintainer, in the sponsorship case).  (Until they have been
updated, programs which use the uploader signature identity will send
mail to the mailing list mentioned above, since that is the address on
the robot's signing key.  This is not desirable but is not a blocker
for deployment.)

The generated .changes will contain copies of the two .dsc fields
above.

The upload will contain a .source_buildinfo.  This will list the
versions of the software running in the Builder, which is primarily
what controls the generated .dsc.  The versions of dgit-infrastructure
and git running in the trusted part are also relevant, because the
trusted part assembles outgoing tagger lines etc.
and interprets the incoming git tag.  However, in our deployment we
intend to keep them in sync, and in any case our ad-hoc reproduction
tooling will not be able to arrange for them to be different.  So the
outside-VM version information will not be included.

Eventually there could be a mode for sbuild (related to binary build
reproduction), or a suitable script, which can verify a reproduction
attempt.  For now, the src:dgit test suite will check that the upload
is reproducible if run again in the same environment.

Emails
------

Emails are sent to:

 1. The username associated with the signing key
 2. The tagger (email address from the git tag object)
 3. A public mailing list selected (or created) for the purpose

1 and 2 will often be the same.  This provides feedback to the person
making the signature.  The person preparing (rather than, maybe,
sponsoring) the upload (Changed-By in .changes) will be notified by
the archive software.

The email report will contain at least:

 * The target distro, package, suite and version
 * The URL from which the git objects were downloaded
 * Whether the operation succeeded, and error messages if it didn't

Email is sent by the Oracle feeding a file to
`ssh smarthost sendmail -t`, rather than by implementing SMTP itself,
to reduce the attack surface.

DoS
---

This service is not very resistant to DoS attacks.  In particular,
sending it bad URLs might stall it (since it has to retry failing
URLs).  So we (i) do not expose it to anyone but salsa and (ii) limit
it to trying to fetch salsa urls.

Making very many tags on salsa would stress the tag2upload service a
bit, but not fatally, and it would be a DoS against salsa too.

After signature verification, we are much more vulnerable to DoS: an
approved signer can get the service to do a lot of work.  Indeed, that
is the purpose of the service.
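The two pre-signature DoS limits above, together with the Manager's webhook checks (calling IP is salsa, tag name of interest), amount to a cheap filter that runs before any git traffic.  The Python sketch below shows one way to write that filter under stated assumptions.  The sender network 192.0.2.0/24 is a documentation placeholder, not salsa's real address range.  The `refs/tags/debian/` prefix is an assumed tag naming scheme.  The payload fields (`object_kind`, `ref`, `project.git_http_url`) follow GitLab's tag push webhook format as we understand it and should be checked against salsa's actual payloads.  `extract_candidate` and the other function names are hypothetical.

```python
import ipaddress
import json
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical configuration, for illustration only.
SALSA_HOST = "salsa.debian.org"
ALLOWED_SENDERS = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder range
TAG_REF_PREFIX = "refs/tags/debian/"  # assumed tag naming scheme


def sender_allowed(remote_addr: str) -> bool:
    """Accept webhooks only from the configured salsa addresses."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        return False
    return any(addr in net for net in ALLOWED_SENDERS)


def is_salsa_url(url: str) -> bool:
    """Only ever fetch over HTTPS from salsa itself."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme == "https" and parts.hostname == SALSA_HOST


def extract_candidate(remote_addr: str,
                      body: bytes) -> Optional[Tuple[str, str]]:
    """Return (repo_url, tag_name) if the webhook merits a fetch, else None."""
    if not sender_allowed(remote_addr):
        return None
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and invalid UTF-8
        return None
    if not isinstance(payload, dict) or payload.get("object_kind") != "tag_push":
        return None
    ref = payload.get("ref")
    if not isinstance(ref, str) or not ref.startswith(TAG_REF_PREFIX):
        return None
    project = payload.get("project")
    if not isinstance(project, dict):
        return None
    url = project.get("git_http_url")
    if not isinstance(url, str) or not is_salsa_url(url):
        return None
    return url, ref[len("refs/tags/"):]


if __name__ == "__main__":
    body = json.dumps({
        "object_kind": "tag_push",
        "ref": "refs/tags/debian/1.2-1",
        "project": {"git_http_url": "https://salsa.debian.org/example/hello.git"},
    }).encode()
    print(extract_candidate("192.0.2.10", body))
    # ('https://salsa.debian.org/example/hello.git', 'debian/1.2-1')
```

Everything here is local string and address checking, so a flood of irrelevant or hostile webhooks costs almost nothing.  Only requests that pass every check lead to the (comparatively expensive) shallow clone.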