path: root/README.dsc-import
From ijackson Mon Sep 26 15:37:19 +0100 2016
X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]
	[nil "Monday" "26" "September" "2016" "15:37:19" "+0100" "Ian Jackson" "ijackson@chiark.greenend.org.uk" nil nil "Intent to commit craziness - source package unpacking" "^From:" nil nil "9" nil nil nil nil nil nil nil nil nil nil]
	nil)
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <22505.12959.668142.478444@chiark.greenend.org.uk>
X-Mailer: VM 8.2.0b under 24.4.1 (i586-pc-linux-gnu)
From: Ian Jackson <ijackson@chiark.greenend.org.uk>
To: debian-dpkg@lists.debian.org,
    Guido Guenther <agx@debian.org>,
    Bernhard R. Link <brlink@debian.org>,
    vcs-pkg-discuss@lists.alioth.debian.org
Subject: Intent to commit craziness - source package unpacking
Date: Mon, 26 Sep 2016 15:37:19 +0100

tl;dr:

 * dpkg developers, please tell me whether I am making assumptions
   that are likely to become false.  Particularly, on the behaviour of
   successive runs of dpkg-source --before-build with successively
   longer series files.

 * git-buildpackage and git-dpm developers, please point me to
   information about what metadata to put into the commit message for
   a git commit which represents a dpkg-source quilt patch.  I would
   like these commits to be as convenient for gbp and git-dpm users as
   possible.


Hi.

Currently when dgit needs to import a .dsc into git, it just uses
dpkg-source -x, and git-add.  The result is a single commit where the
package springs into existence fully formed.  This is not as good as
it could be.  I would like to represent (in the git pseudohistory) the
way that the resulting tree is constructed from the input objects.

In particular, I would like to: represent the input tarballs as a
commit each (which all get merged together as if by git merge -s
subtree), and for quilt packages, each patch as a commit.  But I want
to avoid (as much as possible) reimplementing the package extraction
algorithm in dpkg-source.

dpkg-source does not currently provide interfaces that look like they
are intended for what I want to do.  And dgit wants to work with old
versions of dpkg, so I don't want to block on getting such interfaces
added (even supposing that a sane interface could be designed, which
is doubtful).

So I intend to do as follows.  (Please hold your nose.)

* dgit will untar each input tarball (other than the Debian tarball).

  This will be done by scanning the .dsc for things whose names look
  like (compressed) tarballs, and using the interfaces provided by
  Dpkg::Compression to get at the tarball.

  Each input tarball unpack will be done separately, and will be
  followed by git-add and git-write-tree, to obtain a git tree object
  corresponding to the tarball contents.

  That tree object will be made into a commit object with no parents.
  (The package changelog will be searched for the earliest version
  with the right upstream version component, and the information found
  there used for the commit object's metadata.)

* dgit will then run dpkg-source -x --skip-patches.

  Again, git plumbing will be used to make this into a tree and a
  commit.  The commit will have as parents all the tarball commits
  previously mentioned.  The metadata will come from the .dsc and/or
  the final changelog entry.

* dgit will look to see if the package is `3.0 (quilt)' and if so
  whether it has a series file.  (dgit already rejects packages with
  distro-specific series files, so we need worry only about a single
  debian/patches/series file.)

  If there is a series file, dgit will read it into memory.  It will
  then iterate over the series file, and each time:
    - write into its playground a series file containing one
      more non-comment non-empty line than previously
    - run dpkg-source --before-build (which will apply that
      additional patch)
    - make git tree and commit objects, using the metadata from
      the relevant patch file to make the commit (if available)
    - each commit object has as its parent the preceding commit
      (either the previous patch's commit, or the commit resulting
      from dpkg-source -x)

  After this the series file has been completely rewritten.

* dgit will then run one final invocation of dpkg-source
  --before-build.  This ought not to produce any changes, but if
  it does, they will be represented as another commit.

* As currently, there will be a final no-change-to-the-tree
  pseudomerge commit which stitches the package into the relevant dgit
  suite branch; ie something that looks as if it was made with git
  merge -s ours.

* As currently, dgit will take steps so that none of the git trees
  discussed above contain a .pc directory.
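
The series-file iteration above can be sketched roughly as follows.
This is a minimal illustration, not dgit's code: the function names
series_prefixes and apply_patches_one_by_one, and the record callback,
are hypothetical, and the git plumbing step is left to the caller.
The sketch assumes that a series line's first word is the patch name.

```python
import os
import subprocess


def series_prefixes(series_text):
    """Yield (patch_name, partial_series_text) for each patch line.

    Each partial text contains one more non-comment, non-empty line
    than the one before it; comments and blank lines are carried along.
    """
    emitted = []
    for line in series_text.splitlines():
        emitted.append(line)
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        # Assumption: the first word is the patch name; options may follow.
        yield stripped.split()[0], "\n".join(emitted) + "\n"


def apply_patches_one_by_one(tree, series_text, run=subprocess.run, record=None):
    """Hypothetical driver: rewrite the series file, then run dpkg-source."""
    series_path = os.path.join(tree, "debian", "patches", "series")
    for patch, partial in series_prefixes(series_text):
        with open(series_path, "w") as f:
            f.write(partial)
        # Expected to apply only the newly listed patch.
        run(["dpkg-source", "--before-build", "."], cwd=tree, check=True)
        if record is not None:
            record(patch)  # caller makes the tree and commit objects here
```

Note that comment lines after the last patch never appear in any
partial file in this sketch, so a real implementation would still need
to write out the original series file at the end.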


This has the following properties:

* Each input tarball is represented by a different commit; in usual
  cases these commits will be the same for every upload of the same
  upstream version.

* For `3.0 (quilt)' each patch's changes to the upstream files appear
  as a single git commit (as is the effect of the debian tarball).
  For `1.0' non-native, the effect of the diff is represented as a
  commit.  So eg `git blame' will show synthetic commits corresponding
  to the correct parts of the input source package.

* It is possible to `git-cherry-pick' etc. commits representing `3.0
  (quilt)' patches.  It is even possible to fish out the patch stack
  as a git branch and rebase it elsewhere etc., since the patch stack
  is represented as a contiguous series of commits which make only the
  relevant upstream changes.

* Every orig tarball in the source package is decompressed twice, but
  disk space for only one extra copy of its unpacked contents is
  needed.  (The converse would be possible in principle but would be
  very hard to arrange with the current interfaces provided by the
  various tools.)

* No back doors into the innards of dpkg-source (nor changes to
  dpkg-dev) are required.

* dgit does grow a dependency on Dpkg::Compression.

* Knowledge of the source format embedded in dgit is restricted to
  iterating over tarballs and manipulating debian/patches/series,
  which dgit already does.

* dgit now depends on dpkg-source --before-build idempotently applying
  patches as they successively appear on debian/patches/series.

* Perhaps the git commits generated by dgit to represent patches can
  be made to round-trip nicely into tools like git-dpm and
  git-buildpackage.

  I have found the information about tags in gbp-dch(1), but that
  doesn't seem like it's applicable.

  I have also found the information about tags in gbp-pq(1).  From
  that it looks like I ought to generate "Gbp-Pq: Name" and "Gbp-Pq:
  Topic".

* The scheme I describe avoids introducing a dependency from dgit to
  git-buildpackage.  I might be able to replace the
  successive-patch-application part with an appropriate invocation of
  gbp-pq.  Would that be better ?

  Bear in mind that because the output of gbp-pq import doesn't
  contain debian/patches, I would need to rewrite its output (perhaps
  with git-filter-branch).
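
As a rough illustration of the "Gbp-Pq: Name" and "Gbp-Pq: Topic" idea
above, here is a minimal sketch of building a commit message for one
patch.  The function name patch_commit_message is hypothetical.  The
mapping used (Topic is the subdirectory under debian/patches, Name is
the patch's file name) is an assumption based on a reading of
gbp-pq(1), and should be checked against that manpage.

```python
import posixpath


def patch_commit_message(subject, body, patch_path):
    """Build a commit message for a patch listed in the series file.

    patch_path is the path as it appears in debian/patches/series,
    for example "upstream/fix-crash.patch".
    """
    # Assumption: directory part becomes the topic, basename the name.
    topic, name = posixpath.split(patch_path)
    lines = [subject, ""]
    if body:
        lines += [body.rstrip("\n"), ""]
    lines.append("Gbp-Pq: Name " + name)
    if topic:
        lines.append("Gbp-Pq: Topic " + topic)
    return "\n".join(lines) + "\n"
```

The subject and body would come from the patch file's own header,
where one is available.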


Comments welcome.  Please be quick - this is very close to the top of
my dgit todo list.


Thanks,
Ian.


-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.

From ijackson Wed Sep 28 10:50:49 +0100 2016
X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil nil nil nil nil nil nil nil]
	[nil "Wednesday" "28" "September" "2016" "10:50:49" "+0100" "Ian Jackson" "ijackson@chiark.greenend.org.uk" "<22507.37497.633622.843659@chiark.greenend.org.uk>" nil "Re: Intent to commit craziness - source package unpacking" "^From:" nil nil "9" nil nil nil nil nil nil nil nil nil nil]
	nil)
X-Mozilla-Status: 0003
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable
Message-ID: <22507.37497.633622.843659@chiark.greenend.org.uk>
In-Reply-To: <20160928010117.nqe2prbsbaqkbjza@gaara.hadrons.org>
References: <22505.12959.668142.478444@chiark.greenend.org.uk>
	<20160928010117.nqe2prbsbaqkbjza@gaara.hadrons.org>
X-Mailer: VM 8.2.0b under 24.4.1 (i586-pc-linux-gnu)
From: Ian Jackson <ijackson@chiark.greenend.org.uk>
To: Guillem Jover <guillem@debian.org>
Cc: debian-dpkg@lists.debian.org,
    Guido Guenther <agx@debian.org>,
    "Bernhard R. Link" <brlink@debian.org>,
    vcs-pkg-discuss@lists.alioth.debian.org
Subject: Re: Intent to commit craziness - source package unpacking
Date: Wed, 28 Sep 2016 10:50:49 +0100

Guillem Jover writes ("Re: Intent to commit craziness - source package unpacking"):
> On Mon, 2016-09-26 at 15:37:19 +0100, Ian Jackson wrote:
> > tl;dr:
> >
> >  * dpkg developers, please tell me whether I am making assumptions
> >    that are likely to become false.  Particularly, on the behaviour of
> >    successive runs of dpkg-source --before-build with successively
> >    longer series files.
>
> For format «3.0 (quilt)», that seems fine, to the point I'm fine even
> documenting this, which I can probably do for 1.18.11.

Great.

> For other formats, such as «2.0», I don't think that's true, but I
> assume you don't care about that one anyway. But just mentioning
> because this behavior is probably format-specific. For «2.0» I
> think it could be fixed, and should not be too hard (not sure if it's
> worth it though).

I think the right approach is perhaps to use --skip-patches and
--before-build only with 3.0 (quilt).  That would leave 2.0 (or
other strange or future formats) producing a correct (although
possibly sub-optimal) import.
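
A minimal sketch of that decision, assuming the format is read from
the Format field of the .dsc.  The function names dsc_format and
use_patch_by_patch_import are hypothetical, and the field parsing is
deliberately naive (no handling of continuation lines, which Format
should not need).

```python
def dsc_format(dsc_text):
    """Return the value of the Format field of a .dsc, or None."""
    for line in dsc_text.splitlines():
        # Control field names are matched case-insensitively.
        if line.lower().startswith("format:"):
            return line.split(":", 1)[1].strip()
    return None


def use_patch_by_patch_import(dsc_text):
    """True only for `3.0 (quilt)'; every other format gets the plain
    dpkg-source -x import, which is correct but less detailed."""
    return dsc_format(dsc_text) == "3.0 (quilt)"
```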

> > dpkg-source does not currently provide interfaces that look like they
> > are intended for what I want to do.  And dgit wants to work with old
> > versions of dpkg, so I don't want to block on getting such interfaces
> > added (even supposing that a sane interface could be designed, which
> > is doubtful).
>
> Even then I'm still interested in a description of what you'd need
> ideally, to take into account when having a pass at cleaning up that
> part of the interface. I think you could be interested in a cleaner
> Dpkg::Source::* hierarchy, for the mid/long-term?

For `3.0 (quilt)' explicit interfaces for applying and unapplying
individual patches would help.  But really IMO such an interface ought
to be exposed on the command line rather than (or as well as) via a
Perl module.

Beyond that I find it hard to see what could make dgit's life easier.
Since dgit wants to construct a commit graph representing the source
package's innards, unless dpkg-source explicitly provides an interface
along those lines ("please output a graph of unpacked source tree
states and corresponding commit messages") dgit is still going to have
to know specially about most of the source package formats.

> > * dgit will untar each input tarball (other than the Debian tarball).
> >
> >   This will be done by scanning the .dsc for things whose names look
> >   like (compressed) tarballs, and using the interfaces provided by
> >   Dpkg::Compression to get at the tarball.
>
> Hmm, Dpkg::Source::Archive is currently private, but I might have a
> look at making it public if that would be helpful here.

I think the amount of logic I would have to replicate is minimal.
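
To give a sense of scale, here is a minimal sketch of the .dsc scan
described in the quoted text.  It is not dgit's code: the function
name input_tarballs and both regular expressions are assumptions made
for illustration, the list of compression suffixes may be incomplete,
and decompression (which would go through Dpkg::Compression) is not
shown.

```python
import re

# Assumed patterns: names ending in .tar or .tar.<compression suffix>.
TARBALL_RE = re.compile(r"\.tar(\.(gz|bz2|xz|lzma))?$")
DEBIAN_TARBALL_RE = re.compile(r"\.debian\.tar(\.\w+)?$")


def input_tarballs(dsc_text):
    """List tarball file names from the Files field of a .dsc,
    leaving out the Debian tarball."""
    names = []
    in_files = False
    for line in dsc_text.splitlines():
        if line.startswith("Files:"):
            in_files = True
            continue
        if not in_files:
            continue
        if not line.startswith(" "):
            break  # the Files field has ended
        fields = line.split()  # checksum, size, file name
        if len(fields) != 3:
            continue
        name = fields[2]
        if TARBALL_RE.search(name) and not DEBIAN_TARBALL_RE.search(name):
            names.append(name)
    return names
```

Detached signatures such as .orig.tar.gz.asc are skipped because the
pattern is anchored at the end of the name.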

> > * As currently, dgit will take steps so that none of the git trees
> >   discussed above contain a .pc directory.
>
> As long as the directory does not disappear from the working tree,
> that should work.

Right, indeed it won't.

Thanks for your comments.  I feel unblocked :-).

Ian.

-- 
Ian Jackson <ijackson@chiark.greenend.org.uk>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.