path: root/P2P
Commit message (Author, Age)
* avoid failure to lock content of removed file causing drop etc to fail (Joey Hess, 2020-07-25)
  This was already prevented in other ways, but as seen in commit c30fd24d91d1217b7b764953dd3ded6b54d78b2e, those were a bit fragile. And I'm not sure races were avoided in every case before. At least a race between two separate git-annex processes, dropping the same content, seemed possible.
  This way, if locking fails, and the content is not present, it will always do the right thing. Also, it avoids the overhead of an unnecessary inAnnex check for every file.
  This commit was sponsored by Denis Dzyubenko on Patreon.
* remove redundant imports (Joey Hess, 2020-06-22)
  Clean build under ghc 8.8.3, which seems to do better at finding cases where two imports both provide the same symbol, and warns about one of them.
  This commit was sponsored by Ilya Shlyakhter on Patreon.
* async exception safety (Joey Hess, 2020-06-05)
* make runRelayService async exception safe (Joey Hess, 2020-06-03)
  Use withCreateProcess so the helper process will be shut down if the thread is killed. Use withAsync to ensure the helper threads get shut down too.
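  A minimal sketch of this cleanup pattern, assuming the standard process and async packages; the helper command and relay action here are placeholders, not the real runRelayService code:

      import System.Process
      import Control.Concurrent.Async (withAsync, wait)
      import System.IO

      -- Run a helper process and a relay thread; withCreateProcess terminates
      -- the process and withAsync cancels the relay thread if this thread is
      -- killed by an async exception.
      relayWithHelper :: IO ()
      relayWithHelper =
              withCreateProcess (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe } $
                      \(Just hin) (Just hout) _ _ ->
                              withAsync (hGetContents hout >>= putStr) $ \relayer -> do
                                      hPutStrLn hin "hello"
                                      hClose hin
                                      wait relayer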
* use filepath-bytestring for annex object manipulations (Joey Hess, 2019-12-11)
  git-annex find is now RawFilePath end to end, no string conversions. So is git-annex get when it does not need to get anything. So this is a major milestone on optimisation.
  Benchmarks indicate around 30% speedup in both commands. Probably many other performance improvements.
  All or nearly all places where a file is statted use RawFilePath now.
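  As a rough illustration of the conversion-free path handling this enables, here is a sketch using filepath-bytestring and the unix package's ByteString API; the path layout shown is simplified, not git-annex's real object path scheme:

      {-# LANGUAGE OverloadedStrings #-}
      import System.FilePath.ByteString (RawFilePath, (</>))
      import System.Posix.Files.ByteString (fileExist)

      -- Build and stat an annex object path entirely as a RawFilePath
      -- (a ByteString), with no String round-trips along the way.
      objectPresent :: RawFilePath -> RawFilePath -> IO Bool
      objectPresent repo keyfile =
              fileExist (repo </> ".git" </> "annex" </> "objects" </> keyfile)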
* convert TopFilePath to use RawFilePath (Joey Hess, 2019-12-09)
  Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath.
  Git.Repo also changed to use RawFilePath for the path to the repo.
  This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipeline.
* wip RawFilePath 2x git-annex find speedup (Joey Hess, 2019-11-26)
  Finally builds (oh the agony of making it build), but still very unmergeable, only Command.Find is included and lots of stuff is badly hacked to make it compile.
  Benchmarking vs master, this git-annex find is significantly faster! Specifically:
    num files   old     new     speedup
    48500       4.77    3.73    28%
    12500       1.36    1.02    66%
    20          0.075   0.074   0%  (so startup time is unchanged)
  That's without really finishing the optimization. Things still to do:
    * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS, decodeBS conversions.
    * Use versions of IO actions like getFileStatus that take a RawFilePath.
    * Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
    * Use ByteString for parsing git config to speed up startup.
  It's likely several of those will speed up git-annex find further. And other commands will certainly benefit even more.
* minor typos (Joey Hess, 2019-03-27)
* update licenses from GPL to AGPL (Joey Hess, 2019-03-13)
  This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already.
  Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL.
  (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)
* distinguish between cached and uncached creds (Joey Hess, 2018-12-04)
  p2p and multicast creds are not cached the same way that s3 and webdav creds are. The difference is that p2p and multicast obtain the creds themselves, as part of a process like pairing. So they're storing the only extant copy of the creds. In s3 and webdav etc the creds are provided by the cloud storage provider.
  This is a fine difference, but I do think it's a reasonable difference. If the user wants to prevent s3 and webdav etc creds from being stored unencrypted on disk, they won't feel the same about p2p auth tokens used for tor, or a multicast encryption key, or for that matter their local ssh private key.
  This commit was sponsored by Fernando Jimenez on Patreon.
* comment typo (Joey Hess, 2018-11-12)
* Fixed some other potential hangs in the P2P protocol (Joey Hess, 2018-11-06)
  Finishes the start made in 983c9d5a53189f71797591692c0ed675f5bd1c16, by handling the case where `transfer` fails for some other reason, and so the ReadContent callback does not get run. I don't know of a case where `transfer` does fail other than the locking dealt with in that commit, but it's good to have a guarantee.
  StoreContent and StoreContentTo had a similar problem. Things like `getViaTmp` may decide not to run the transfer action. And `transfer` could certainly fail, if another transfer of the same object was in progress. (Or a different object when annex.pidlock is set.)
  If the transfer action was not run, the content of the object would not all get consumed, and so would get interpreted as protocol commands, which would not go well.
  My approach to fixing all of these things is to set a TVar only once all the data in the transfer is known to have been read/written. This way the internals of `transfer`, `getViaTmp` etc don't matter.
  So in ReadContent, it checks if the transfer completed. If not, as long as it didn't throw an exception, send empty and Invalid data to the callback. On an exception the state of the protocol is unknown so it has to raise ProtoFailureException and close the connection, same as before.
  In StoreContent, if the transfer did not complete some portion of the DATA has been read, so the protocol is in an unknown state and it has to close the connection as well. (The ProtoFailureMessage used here matches the one in Annex.Transfer, which is the most likely reason. Not ideal to duplicate it...)
  StoreContent did not ever close the protocol connection before. So this is a protocol change, but only in an exceptional circumstance, and it's not going to break anything, because clients already need to deal with the connection breaking at any point.
  The way this new behavior looks (here origin has annex.pidlock = true so will only accept one upload to it at a time):
    git annex copy --to origin -J2
    copy x (to origin...) ok
    copy y (to origin...)
      Lost connection (fd:25: hGetChar: end of file)
  This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
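  A minimal sketch of the completion-flag idea described above; the names and types are illustrative, not the real git-annex ones:

      import Control.Concurrent.STM
      import Control.Exception

      -- The transfer machinery is handed a data-consuming action that sets a
      -- TVar only after all the data has been read/written. Afterwards the
      -- protocol code checks that flag, rather than trusting whatever the
      -- transfer machinery returned, to decide whether the connection is
      -- still in a usable state.
      runTransferChecked
              :: (IO () -> IO a)  -- ^ transfer machinery (may decide not to run the action)
              -> IO ()            -- ^ consumes all the data from the connection
              -> IO (Either SomeException (a, Bool))
      runTransferChecked transferaction consumedata = do
              completed <- newTVarIO False
              try $ do
                      r <- transferaction $ do
                              consumedata
                              atomically $ writeTVar completed True
                      done <- readTVarIO completed
                      return (r, done)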
* git-annex-shell: fix transfer hang (Joey Hess, 2018-11-06)
  Fix hang when transferring the same objects to two different clients at the same time. (Or when annex.pidlock is used, two different objects to the same or different clients.)
  Could also potentially occur if a client was downloading an object and somehow lost connection but that git-annex-shell was still running and holding the transfer lock.
  This does not guarantee that, if `transfer` fails for some other reason, a DATA response will be made.
  This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
* Fix a P2P protocol hang (Joey Hess, 2018-11-02)
  When readContent got Nothing from prepSendAnnex, it did not run its callback, and the callback is what sends the DATA reply.
  sendContent checks with contentSize that the object file is present, but that doesn't really guarantee that prepSendAnnex won't return Nothing.
  So, it was possible for a P2P protocol GET to not receive a response, and appear to hang. When what it's really doing is waiting for the next protocol command.
  This seems most likely to happen when the annex is in direct mode, and the file being requested has been modified. It could also happen in an indirect mode repository if genInodeCache somehow failed. Perhaps due to a race with a drop of the content file.
  Fixed by making readContent behave the way its spec said it should, and run the callback with L.empty in this case.
  Note that it's fine for readContent to send any amount of data to the callback, including L.empty. sendBytes deals with that by making sure it sends exactly the specified number of bytes, aborting the protocol if it's too short. So, when L.empty is sent, the protocol will end up aborting.
  This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
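  A rough sketch of the invariant sendBytes relies on (not the real implementation): exactly the advertised number of bytes must go over the wire, so handing the callback L.empty for a nonzero length deliberately aborts the protocol.

      import qualified Data.ByteString.Lazy as L
      import System.IO

      -- Send exactly len bytes from b, or report a protocol error if there
      -- is not enough data to send; an empty ByteString for a nonzero
      -- length can never silently succeed.
      sendExactly :: Handle -> Integer -> L.ByteString -> IO (Either String ())
      sendExactly h len b
              | L.length b' /= fromIntegral len =
                      return (Left "short read of object data; aborting protocol")
              | otherwise = do
                      L.hPut h b'
                      hFlush h
                      return (Right ())
        where
              b' = L.take (fromIntegral len) b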
* simplify (Joey Hess, 2018-10-22)
* instrument P2P --debug with connection and thread info (Joey Hess, 2018-10-22)
  For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/
  This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
* clean P2P protocol shutdown on EOF try 2 (Joey Hess, 2018-09-25)
  Same goal as b18fb1e343e9654207fbebacf686659c75d0fb4c but without breaking backwards compatibility. Just return IO exceptions when running the P2P protocol, so that git-annex-shell can detect eof and avoid the ugly message.
  This commit was sponsored by Ethan Aubin.
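  A small sketch of the idea, using illustrative names: the protocol runner returns IO exceptions to the caller instead of letting them escape, so an EOF from a peer that hung up can be handled quietly.

      import Control.Exception (IOException, throwIO, try)
      import System.IO.Error (isEOFError)

      -- Run one protocol step, capturing any IO exception.
      runProtoStep :: IO a -> IO (Either IOException a)
      runProtoStep = try

      -- The caller can then treat EOF as a clean shutdown rather than an error.
      handleStep :: Either IOException a -> IO (Maybe a)
      handleStep (Right a) = return (Just a)
      handleStep (Left e)
              | isEOFError e = return Nothing  -- peer hung up; no ugly message
              | otherwise = throwIO e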
* Revert "clean P2P protocol shutdown on EOF"Joey Hess2018-09-25
| | | | | | | | | | | | | | | | | This reverts commit b18fb1e343e9654207fbebacf686659c75d0fb4c. That broke support for old git-annex-shell before p2pstdio was added. The immediate problem is that postAuth had a fallthrough case that sent an error back to the peer, but sending an error back when the connection is closed is surely not going to work. But thinking about it some more, making every function that uses receiveMessage need to handle ProtocolEOF adds a lot of complication, so I don't want to do that. The commit only cleaned up the test suite output a tiny bit, so I'm just gonna revert it for now.
* clean P2P protocol shutdown on EOF (Joey Hess, 2018-09-13)
  Avoids "git-annex-shell: <stdin>: hGetChar: end of file" being displayed by the test suite, due to the way it runs git-annex-shell without using ssh.
  git-annex-shell over ssh was not affected because git-annex hangs up the ssh connection and so never sees the error message that git-annex-shell probably did emit.
  This commit was sponsored by Ryan Newton on Patreon.
* enforce retrievalSecurityPolicy (Joey Hess, 2018-06-21)
  Leveraged the existing verification code by making it also check the retrievalSecurityPolicy.
  Also, prevented getViaTmp from running the download action at all when the retrievalSecurityPolicy is going to prevent verifying and so storing it.
  Added annex.security.allow-unverified-downloads. A per-remote version would be nice to have too, but would need more plumbing, so KISS. (Bill the Cat reference not too over the top I hope. The point is to make this something the user reads the documentation for before using.)
  A few calls to verifyKeyContent and getViaTmp, that don't involve downloads from remotes, have RetrievalAllKeysSecure hard-coded. It was also hard-coded for P2P.Annex and Command.RecvKey, to match the values of the corresponding remotes.
  A few things use retrieveKeyFile/retrieveKeyFileCheap without going through getViaTmp.
    * Command.Fsck when downloading content from a remote to verify it. That content does not get into the annex, so this is ok.
    * Command.AddUrl when using a remote to download an url; this is new content being added, so this is ok.
  This commit was sponsored by Fernando Jimenez on Patreon.
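  A simplified sketch of the policy check being enforced; the types mirror the description above but are illustrative, not copied from the source:

      -- How far a remote's retrieved content can be trusted.
      data RetrievalSecurityPolicy
              = RetrievalVerifiableKeysSecure  -- ^ only verifiable keys are safe to store
              | RetrievalAllKeysSecure         -- ^ remote cannot alter content; anything is safe

      -- Decide whether a download may be stored in the annex: either the
      -- policy allows everything, or the key is verifiable, or the user has
      -- explicitly set annex.security.allow-unverified-downloads.
      allowStore :: RetrievalSecurityPolicy -> Bool -> Bool -> Bool
      allowStore RetrievalAllKeysSecure _ _ = True
      allowStore RetrievalVerifiableKeysSecure verifiable allowunverified =
              verifiable || allowunverified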
* improve indent (Joey Hess, 2018-06-14)
* GIT_ANNEX_SHELL_APPENDONLY (Joey Hess, 2018-05-25)
  Makes it allow writes, but not deletion of annexed content. Note that securing pushes to the git repository is left up to the user.
  This commit was sponsored by Jack Hill on Patreon.
* squash -Wsimplifiable-class-constraints warnings (Joey Hess, 2018-04-22)
  I have not tested this with older ghc than 8.2.2.
* implement annex.retry et al (Joey Hess, 2018-03-29)
  Added annex.retry, annex.retry-delay, and per-remote versions to configure transfer retries.
  This commit was supported by the NSF-funded DataLad project.
* deal with unlocked files (Joey Hess, 2018-03-13)
  P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the file was detected to change content while it was being sent and so we may not have received the valid content of the file.
  Added new MustVerify constructor for Verification, which forces verification even when annex.verify=false etc. This is used when INVALID and in protocol version 0.
  As well as changing git-annex-shell p2pstdio, this makes git-annex tor remotes always force verification, since they don't yet use protocol version 1. Previously, annex.verify=false could skip verification when using tor remotes, and let bad data into the repository.
  This commit was sponsored by Jack Hill on Patreon.
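  A sketch of the version 1 distinction described above; the types are illustrative and close to, but not necessarily identical to, the real ones:

      -- Sent after the DATA payload in protocol version 1.
      data Validity = Valid | Invalid
              deriving (Show)

      data Verification
              = UnVerified  -- ^ normal annex.verify rules apply
              | Verified    -- ^ content was already verified during transfer
              | MustVerify  -- ^ force verification even when annex.verify=false
              deriving (Show)

      -- Protocol version 0 sends no validity marker, so verification is forced.
      verificationFor :: Maybe Validity -> Verification
      verificationFor (Just Valid)   = UnVerified
      verificationFor (Just Invalid) = MustVerify  -- sender saw the file change
      verificationFor Nothing        = MustVerify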
* use total size from DATA (Joey Hess, 2018-03-12)
  Noticed that getting a key whose size is not known resulted in a progress display that didn't include the percent complete.
  Fixed for P2P by making the size sent with DATA be used to update the meter's total size.
  In order for rateLimitMeterUpdate to also learn the total size, had to make it be passed the Meter, and some other reorg in Utility.Metered was also done so that --json-progress can construct a Meter to pass to rateLimitMeterUpdate.
  When the fallback rsync is done, the progress display still doesn't include the percent complete. Only way to fix that seems to be to let rsync display its output again, but that would conflict with git-annex's own progress meter, which is also being displayed.
  This commit was sponsored by Henrik Riomar on Patreon.
* no protocol 1 yet (Joey Hess, 2018-03-12)
* move protocol version stuff to the Net free monad (Joey Hess, 2018-03-12)
  Needs to be in Net not Local, so that Net actions can take the protocol version into account.
  This commit was sponsored by an anonymous bitcoin donor.
* version the P2P protocol (Joey Hess, 2018-03-12)
  Unfortunately ReceiveMessage didn't handle unknown messages the way it was documented to; client sending VERSION would cause the server to return an ERROR and hang up. Fixed that, but old releases of git-annex use the P2P protocol for tor and will still have that behavior.
  So, version is not negotiated for Remote.P2P connections, only for Remote.Git connections, which will support VERSION from their first release. There will need to be a later flag day to change Remote.P2P; left a commented out line that is the only thing that will need to be changed then.
  Version 1 of the P2P protocol is not implemented yet, but updated the docs for the DATA change that will be allowed by that version.
  This commit was sponsored by Jeff Goeke-Smith on Patreon.
* p2p ssh connection pools (Joey Hess, 2018-03-08)
  Much like Remote.P2P, there's a pool of connections to a peer, in order to support concurrent operations.
  Deals with old git-annex-shell on the remote that does not support p2pstdio, by only trying once to use it, and remembering if it's not supported.
  Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual purposes of something to detect to see that the connection is working, and a way to verify that it's connected to the right uuid. (There's a redundant uuid check since the uuid field is sent by git_annex_shell, but I anticipate that being removed later when the legacy git-annex-shell stuff gets removed.)
  Not entirely happy with Remote.Git.runSsh's behavior when the proto action fails. Running the fallback will work ok, but what will we do when the fallbacks later get removed? It might be better to try to reconnect, in case the connection got closed.
  This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
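  A minimal sketch of a per-remote connection pool along these lines; the names are illustrative, and the real code also tracks whether the remote supports p2pstdio and caps pool growth:

      import Control.Concurrent.STM
      import Control.Exception (bracket)

      type Pool conn = TVar [conn]

      newPool :: IO (Pool conn)
      newPool = newTVarIO []

      -- Reuse an idle connection when one is available, otherwise open a new
      -- one; the connection is returned to the pool when the action is done.
      withPoolConn :: Pool conn -> IO conn -> (conn -> IO a) -> IO a
      withPoolConn pool openconn use = bracket acquire release use
        where
              acquire = do
                      mc <- atomically $ do
                              cs <- readTVar pool
                              case cs of
                                      [] -> return Nothing
                                      (c:rest) -> writeTVar pool rest >> return (Just c)
                      maybe openconn return mc
              release c = atomically $ modifyTVar' pool (c:)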
* implemented git-annex-shell p2pstdio (Joey Hess, 2018-03-07)
  Not yet used by git-annex, but this will allow faster transfers etc than using individual ssh connections and rsync.
  Not called git-annex-shell p2p, because git-annex p2p does something else and I don't want two subcommands with the same name between the two for sanity reasons.
  This commit was sponsored by Øyvind Andersen Holm.
* make sure that lockContentShared is always paired with an inAnnex check (Joey Hess, 2018-03-07)
  lockContentShared had a screwy caveat that it didn't verify that the content was present when locking it, but in the most common case, eg indirect mode, it failed to lock when the content is not present. That led to a few callers forgetting to check inAnnex when using it, but the potential data loss was unlikely to be noticed because it only affected direct mode I think.
  Fix data loss bug when the local repository uses direct mode, and a locally modified file is dropped from a remote repository. The bug caused the modified file to be counted as a copy of the original file. (This is not a severe bug because in such a situation, dropping from the remote and then modifying the file is allowed and has the same end result.)
  And, in content locking over tor, when the remote repository is in direct mode, it neglected to check that the content was actually present when locking it. This could cause git annex drop to remove the only copy of a file when it thought the tor remote had a copy.
  So, make lockContentShared do its own inAnnex check. This could perhaps be optimised for direct mode, to avoid the check then, since locking the content necessarily verifies it exists there, but I have not bothered with that.
  This commit was sponsored by Jeff Goeke-Smith on Patreon.
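  A sketch of the shape of the fix, using a hypothetical helper rather than the real lockContentShared: the presence check happens inside the lock, so callers can no longer forget it.

      -- Take the shared lock, then verify the content is actually present
      -- before running the action; Nothing means the lock "fails" because
      -- there is no content to lock.
      lockContentSharedChecked
              :: IO Bool                          -- ^ inAnnex-style presence check
              -> (IO (Maybe a) -> IO (Maybe a))   -- ^ runs an action with the shared lock held
              -> IO a                             -- ^ action to run while locked
              -> IO (Maybe a)
      lockContentSharedChecked inannex withsharedlock a =
              withsharedlock $ do
                      present <- inannex
                      if present
                              then Just <$> a
                              else return Nothing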
* add readonly mode to serve P2P protocol (Joey Hess, 2018-03-07)
  This will be used by git-annex-shell when configured to be readonly.
  This commit was sponsored by Nick Daly on Patreon.
* refactor (Joey Hess, 2018-03-06)
* comment typo (Joey Hess, 2018-03-06)
* AssociatedFile newtype (Joey Hess, 2017-03-10)
  To prevent any further mistakes like 301aff34c42d896038c42e0a0bc7b1cf71e0ad22
  This commit was sponsored by Francois Marier on Patreon.
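  The idea, roughly (this follows the description above; the exact definition in the source may differ): wrapping the value in a newtype makes the type checker reject call sites that pass a bare path where an associated file is expected.

      -- An associated file may be absent, so the newtype wraps a Maybe.
      newtype AssociatedFile = AssociatedFile (Maybe FilePath)
              deriving (Show, Eq)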
* fix build on windows (Joey Hess, 2016-12-30)
* refactor (Joey Hess, 2016-12-30)
* Always use filesystem encoding for all file and handle reads and writes. (Joey Hess, 2016-12-24)
  This is a big scary change. I have convinced myself it should be safe. I hope!
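  A tiny sketch of the underlying mechanism (illustrative, not the actual git-annex helper): the handle's text encoding is switched to GHC's filesystem encoding, so arbitrary bytes in filenames survive reads and writes instead of triggering encoding errors.

      import System.IO
      import GHC.IO.Encoding (getFileSystemEncoding)

      -- Apply the filesystem encoding to a handle before reading or writing.
      useFileSystemEncoding :: Handle -> IO ()
      useFileSystemEncoding h = hSetEncoding h =<< getFileSystemEncoding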
* enable-tor: When run as a regular user, test a connection back to the hidden service over tor. (Joey Hess, 2016-12-24)
  This way we know that after enable-tor, the tor hidden service is fully published and working, and so there should be no problems with it at pairing time.
  It has to start up its own temporary listener on the hidden service. It would be nice to have it start the remotedaemon running, so that extra step is not needed afterwards. But, there may already be a remotedaemon running, in communication with the assistant and we don't want to start another one. I thought about trying to HUP any running remotedaemon, but Windows does not make it easy to do that. In any case, having the user start the remotedaemon themselves lets them know it needs to be running to serve the hidden service.
  This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
* refactor (Joey Hess, 2016-12-24)
* Revert "close"Joey Hess2016-12-24
| | | | | | This reverts commit 3aaabc906b776075e190739b42157959b4e09f31. Commit contained incomplete work.
* close (Joey Hess, 2016-12-22)
* include tor-annex in hidden service directory names (Joey Hess, 2016-12-21)
  To make it easier to manage/delete them etc. Backwards compatibility is preserved for existing tor configs.
* Revert "p2p --link now defaults to setting up a bi-directional link"Joey Hess2016-12-16
| | | | | | | | This reverts commit 3037feb1bf9ae9c857b45191309965859b23b0b6. On second thought, this was an overcomplication of what should be the lowest-level primitive. Let's build bi-directional links at the pairing level with eg magic wormhole.
* p2p --link now defaults to setting up a bi-directional link (Joey Hess, 2016-12-16)
  Both the local and remote git repositories get remotes added pointing at one-another. Makes pairing twice as easy!
  Security: The new LINK command in the protocol can be sent repeatedly, but only by a peer who has authenticated with us. So, it's entirely safe to add a link back to that peer, or to some other peer it knows about. Anything we receive over such a link, the peer could send us over the current connection.
  There is some risk of being flooded with LINKs, and adding too many remotes. To guard against that, there's a hard cap on the number of remotes that can be set up this way. This will only be a problem if setting up large p2p networks that have exceptional interconnectedness.
  A new, dedicated authtoken is created when sending LINK.
  This also allows, in theory, using a p2p network like tor, to learn about links on other networks, like telehash.
  This commit was sponsored by Bruno BEAUFILS on Patreon.
* fix build with old ghc (Joey Hess, 2016-12-10)
* hang up connection after relaying (Joey Hess, 2016-12-09)
  Seems that git upload-pack outputs a "ONCDN " that is not read by the remote git receive-pack. This fixes:
    [2016-12-09 17:08:32.77159731] P2P > ERROR protocol parse error: "ONCDN "
* avoid exposing auth tokens in debug (Joey Hess, 2016-12-09)
* debug dump P2P messages (Joey Hess, 2016-12-09)