CHANGELOG.txt



# 0.0.111 (01.10.2021)
- require minimum version of NCLS

# 0.0.110 (20.09.21)
- fix count_overlaps with keep_nonoverlapping=False
- fix subtract with more than 1024 intervals (new fix)

# 0.0.109 (16.09.21)
- fix overlap invert behavior
- add intersect invert flag
- fix subtract in cases where more than 1024 intervals overlapped a single interval

# 0.0.106/107/108 (hotfixes) (07/08.09.21)
- fix join with slack mutating first arg
- add flag use_other_strand in join, nearest, k_nearest
- fix categorical-bug in newer versions of pandas
- add function pr.version_info() to print relevant version flags for debugging

# 0.0.105 (23.08.21)
- require bamread 0.0.10 to fix #211

# 0.0.104 (06/20.08.21)
- fix broken three_end/five_end code

# 0.0.102/103 (06.08.21)
- fix bug in pr.count_overlaps
- demand version 0.0.9 or greater from bamread

# 0.0.100/0.0.101 (20/21.06.21)
- add full-flag to read_gtf
- fix bug in join with slack > 0 when result is empty

# 0.0.99 (17.06.21)
- add nb_cpu arg to overlap

# 0.0.98 (07.06.21)
- fix k-nearest how=None

# 0.0.98 (20.05.21)
- fix casting in tss/tes

# 0.0.96/97 (07.05.21)
- fixes to .tes and .tss methods (issue #182)

# 0.0.95 (02.03.21)
- teensy fix bedclip
- add pretty-printing in jupyter notebooks (thanks to @rasi)

# 0.0.94 (27.02.21)
- print warning if start and end columns have different dtypes

# 0.0.93 (25.02.21)
- add max_disjoint for maximal disjoint set
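
For readers unfamiliar with the term, a disjoint set of intervals is one in which no two intervals overlap. The sketch below shows one classic way to pick such a set: greedily take the interval that ends first. It is an illustration in plain Python, not the library's code; the function name `greedy_disjoint`, the half-open `(start, end)` tuples and the greedy strategy itself are assumptions made for this example.

```python
def greedy_disjoint(intervals):
    """Pick non-overlapping half-open (start, end) intervals, earliest end first.

    Hypothetical helper for illustration; not part of pyranges.
    """
    chosen = []
    last_end = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Half-open intervals do not overlap when one starts where the other ends.
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


intervals = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9)]
print(greedy_disjoint(intervals))  # [(1, 4), (5, 7), (8, 9)]
```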

# 0.0.91-92 (15.01.21)
- hotfix for 0.0.90

# 0.0.90 (03.01.21)
- fix #165 slow set operations on small files with many chromosomes (thanks ndukler)

# 0.0.89 (16.11.20)
- fix #159 (thanks cfriedline)

# 0.0.88 (09.11.20)
- fix bug when concatenating stranded and unstranded pyranges (thanks cfriedline, issue #160)

# 0.0.87 (23.10.20)
- fix bug in join with left/right option

# 0.0.86 (05.10.20)
- add slack-option to merge
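
As a rough illustration of what a slack option means for merging, the plain-Python sketch below merges intervals whose gap is at most `slack`. It does not use pyranges; the function name `merge_with_slack` and the exact boundary rule (`start <= previous_end + slack`) are assumptions for this example, and the library's handling of bookended intervals may differ.

```python
def merge_with_slack(intervals, slack=0):
    """Merge (start, end) intervals separated by at most `slack` positions.

    Hypothetical helper for illustration; not part of pyranges.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + slack:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


example = [(1, 5), (7, 10), (20, 25)]
print(merge_with_slack(example))           # [(1, 5), (7, 10), (20, 25)]
print(merge_with_slack(example, slack=2))  # [(1, 10), (20, 25)]
```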

# 0.0.85 (17.09.20)
- fix error when parsing gtf-files with whitespace in value-tags

# 0.0.84 (18.08.20)
- add option to report overlap in join

# 0.0.83 (18.08.20)
- hotfix

# 0.0.82 (18.08.20)
- fix error introduced in 0.0.80

# 0.0.81 (13.08.20)
- fix Fisher's implementation

# 0.0.80 (10.08.20)
- fix reassigning chromosomes in apply

# 0.0.79 (08.06.20)
- fix bug in features.introns where the gene_id column was overwritten (issue #134)

# 0.0.78 (18.03.20)
- add reader for bigwig (pr.read_bigwig)
- fix cluster (allow for multiple by arguments)
- optimize to_bigwig slightly
- fix: overlap did not recognize invert-argument

# 0.0.77 (24.03.20)
- add api-docs
- make default strandedness of apply-pair equal None
- add pr.from_string() to create a PyRanges from a multiline string
- remove set_columns, set on .columns directly
- apply numpy-methods to pyranges
- add pr.get_fasta(gr, path)

# 0.0.76 (20.02.20)
- fix leftover print in itergrs

# 0.0.75 (20.02.20)
- reset index when reading pyranges from df
- ignore reinit error in ray
- fix: copy_df was not used in init

# 0.0.74 (12.02.20)
- support for multiple (repeating) attributes in gtf reading
- fix handling of kwargs in apply, apply_pair, apply_chunks
- add to_example(nrows=10) to get a copy-paste friendly representation of a PyRanges
- add pr.from_dict() to create a PyRanges from a dict (like the ones produced with to_example)

# 0.0.73 (03.02.20)
- fix small bug in jaccard
- remove leftover debug-print in pr.random()
- add experimental gr.stats.forbes
- fix handling of kwargs in apply, apply_pair, apply_chunks

# 0.0.72 (03.02.20)
- random also takes dict as chromsizes argument (like {"chr1": 249, "chr2": 242})
- fix reldist bug when grs have different chromosomes

# 0.0.71 (30.01.20)
- fix various issues with reading and writing gtf/gff3 (1-indexing, removed final ";" in gff3 attribute col when writing)
- remove ModuleNotFoundException in __init__.py (3.5 < only)
- gr.overlap(gr2) now has default argument how="first", i.e. only return overlapping intervals once, even though there are multiple overlapping features in gr
- fix bug in pr.stats.mcc when using stranded data

# 0.0.70 (24.01.20)
- add Simes method to pr.stats
- add keys argument to pr.iter
- make strand=None default arg for concat
- gr.split() does opposite of merge
- pr.count_overlaps(grs, features=None) like bedtools multiintersect added
- set mkl.set_num_threads to 1 in __init__

# 0.0.69 (22.01.20)
- add value col argument to to_bigwig (thanks https://github.com/liyao001)

# 0.0.68 (21.01.20)
- fix regression: slack changes dtype from int32 to int64

# 0.0.67 (10.01.20)
- add dtypes attribute to pyranges
- fix left and right join when chromosomes missing

# 0.0.66 (03.01.20)
- add argument sparse to read_bam. Setting it to False fetches more columns.

# 0.0.65 (10.12.19)
- fix column names after read_gtf so they work with GenomicFeatures
- add flag chain (False by default) to to_* methods
- genomicfeatures: add tss/tes-methods
- remove Strand column with unstrand() even if PyRanges is not stranded
- reading gff and gtf now consistent and column names from attributes are in lower_snake_case

# 0.0.64 (28.11.19)
- add missing example data (ending with gz) to pyranges
- add rowbased_spearman, rowbased_pearson and rowbased_rankdata to pyranges.stats
- pyranges now accept columns with integer names, like pandas

# 0.0.63 (14.11.19)
- ignore index when inserting Series
- able to add dictionary of dfs to a pyranges
- remove FDR from fisher_exact, but add fdr as own method in stats
- make stats.mcc faster
- make stats.mcc work without a genome
- fix gff3 reading when metadata contains spaces

# 0.0.62 (11.11.19)
- fix fisher exact when given pd.Series
- fisher_exact: only use pseudocounts for OR

# 0.0.61 (11.11.19)
- add outer, inner and left join
- add fisher exact
- insert series/dataframes to pyranges with + operator
- gr.Whatever = pd.Series(...) now ignores index
- add gr.copy() method to create deep copy

# 0.0.60 (10.11.19)
- add k-nearest
- ensure that start/end have the same dtype after calling slack
- breaking change: new_position takes no default arg
- new_position takes an argument swap
- .length returns a python integer
- breaking change: lengths returns a vector by default

# 0.0.59 (28.10.19)
- fix AttributeErrors on pyranges (thanks https://github.com/MuhammedHasan)
- add reader for gff3
- add writer for gff3
- add count flag to cluster/cluster_by

# 0.0.58 (25.10.19)
- fix merge print functions
- make pickleable
- add iter as alias for itergrs in pr. namespace
- gr.length() shows nucleotide length (sum of all interval lengths)
- gr.lengths() takes as_dict=False flag to return as vector
- fix slack in join: added columns when joining with itself
- fix print for unstranded pyranges: printed tail and head of first chromosome

# 0.0.57 (10.10.19)
- add overlap-flag to tile
- add chain to print-method
- bugfix: printing stranded pyranges sorted output even though sort was false
- bugfix: wrong number hidden cols on very small terminal widths
- bugfix: unstrand did not change underlying dict to chromosomes only
- show number of hidden columns in header
- tests: mismatches in strand between dict and dataframes
- .df/.as_df() now returns with non-duplicated index

# 0.0.56 (25.09.19)
- add possibility of 5-end and 3-end in slack being different/none
- add slack to join-method
- add new_position method to take union or intersection of two pairs of Start/End-columns in pyranges

# 0.0.55 (13.09.19)
- Add int64-flag to method pr.random.

# 0.0.54 (10.09.19)
- Ensure that Chromosome and Strand is str dtype before creating category
- Add check to ensure that the columns Chromosome, Start and End exist when trying to create a PyRanges

# 0.0.53 (02.09.19)
- Fix error in pypi file

# 0.0.52 (30.08.19)
Fixes:
  - fix creating duplicate indexes in pyrange apply both
  - fix regression where joining unstranded and stranded pyrange did not make a stranded pyrange
  - default was strand=False for a few methods, should have been None (i.e. autodetect)
  - read_bed now handles gzipped bed (if the file has the .gz extension)
  - now able to print untraditional strands which are not strings
  - fix drop when "Strand" is part of what is to be dropped
  - more robust checking if column is in gr

Additions:
  - print functions take formatting-argument {"Start": "{:,}"}
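
The values in the formatting dict are ordinary Python format strings, so `"{:,}"` adds thousands separators. The sketch below shows how such a dict could be applied to one row; the `row` data and the per-column lookup are assumptions for illustration and say nothing about how the print functions apply it internally.

```python
# Formatting dict in the shape shown in the entry above.
formatting = {"Start": "{:,}"}

# Hypothetical row; only columns named in `formatting` get special treatment.
row = {"Chromosome": "chr1", "Start": 1234567, "End": 1234667}

rendered = {
    col: formatting.get(col, "{}").format(value)
    for col, value in row.items()
}
print(rendered)
# {'Chromosome': 'chr1', 'Start': '1,234,567', 'End': '1234667'}
```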

Changes:
  - print shows sorted stranded data in Start/End order
  - print dynamically selects number of untraditional strands and hidden columns to display
  - read_bed now takes nrows arg
  - an assertion is now raised when trying to drop "Chromosome", "Start" or "End" (instead of ignoring)
  - to_bed, to_gtf, to_csv now take compression argument ("infer" by default)
  - to_csv writes the header as default

# 0.0.51 (01.08.19)
Additions:
  - pr.itergrs added to iterate over the dfs from multiple pyranges at the same time

Changes:
  - pybigwig and bamread are optional dependencies that need to be manually installed (like ray)


# 0.0.50 (29.07.19)
Additions:
  - pr.random(n=1000, length=100, chromsizes=None, strand=True) creates a random PyRanges from a PyRanges of chromosome sizes.

Changes:
  - make __iter__ return natsorted items

Removals:
  - insert. use join instead

Fixes:
  - bug in boolean indexing due to __iter__ returning wrong sort order

# 0.0.49 (26.07.19)
Hotfix:
  - bug in assign (strand=False, by default, not None)

# 0.0.48 (25.07.19)
Additions:
  - head(n=8)
  - tail(n=8)
  - sample(n=8)
  - set_columns(new_names) to set new column names
  - argument like to drop, which takes string describing regex (gr.drop(like="_left|_right"))
  - add count (number of intervals) to merge and merge_by

Fixes:
  - 5X faster boolean indexing
  - fix some bugs in features.introns when data was missing

Changes:
  - coverage renamed to to_rle
  - if drop used without argument, not dropping Strand by default

# 0.0.47 (19.07.2019)
hotfixes

# 0.0.46 (19.07.2019)
Additions:
  - cluster and merge take argument by to only merge/cluster within specific features
  - gr.features.introns added. Can use by="gene" or by="transcript"
  - new data: pr.data.gencode_gtf and pr.data.ucsc_bed
  - can subset pyrange with boolean vector
  - sort also takes argument by (sort without arg sorts on start/end)

# 0.0.45 (14.06.2019)
Fixes:
  - bug in subset which removed strand
  - bug when setting Strand with setattr
  - bug when setting Chromosome with setattr

Changes:
  - new method to compute cluster (3x as fast)
  - string-arg to drop not interpreted as regex
  - drop or keep do not take drop_strand. Only unstrand can drop strand.

Additions:
  - subsetting with new col order rearranges columns

# 0.0.44 (04.06.19)
Changes:
  - Now possible to reset Strand/Chromosome

Additions:
  - gr.drop_duplicate_positions(strand=None) # None means auto => true if stranded otherwise False
  - add test data pr.data.chromsizes()
  - pr.gf.tile_genome(genome_pyrange, tile_size, tile_last=False) (like GenomicRanges tileGenome)
  - pr.gf.genome_bounds(pyrange, genome_pyrange, clip=False) (like UCSC bedclip)

# 0.0.43 (29.05.19)
Fixes:
  - fix bug in tostring
  - fix bug in multithreading

Additions:
  - add apply_chunks, which operates on chunks, instead of chromosome-dfs.

Changes:
  - add nb_cpu argument to all functions
  - add number of columns and stranded/unstranded to tostring
  - add ... as last column, when there are more columns than possible to show
  - use , as thousands separator in tostring for number rows/cols


# 0.0.42 (16.05.19)
Additions:
  - allow keyword-arguments to apply, apply_pair (see example in the docs)

Changes:
  - to_csv etc. returns the objects themselves, so they can be used in method chains
  - methods called tile/window instead of tiles/windows


Fixes:
  - fix print when len(pr) < entries to print
  - tile


# 0.0.41 (14.05.19)
Additions:
  - add slack-flag to cluster/merge
  - print joined positions possible
  - add simple methods for printing without breaking the chain (p, mp, sp, tmp, rp)

Removals:
  - settings in pyranges. Added print methods instead.

Improvement:
  - print methods faster, especially for pyranges with many cols


# 0.0.40 (13.05.19)
Additions:
  - pyranges_db now out on PyPI

Changes:
  - PyRanges can now have Strand column with other data than "+" or "-", but it is considered unstranded.
  - Ensure that slack parameter is always integer.
  - no keep_metadata-flag in windows. Metadata is always kept. Can call drop() beforehand if metadata should not be kept.

Remove:
  - remove confusing keep flag from drop method (use gr[cols_to_keep] instead)

Fixes:
  - add missing ... in pyranges tostring

# 0.0.39 (09.05.19)
Removal:
  - remove sandbox module

# 0.0.37-38 (09.05.19)
Changes:
  - pyranges constructor is copy-by-default

# 0.0.36 (09.05.19)
Additions:
  - add insert method which uses overlap

Changes:
  - read_bed does not fail when strand is "."
  - read_bed considers bed unstranded if Strand has other values than +/-


# 0.0.35 (26.04.19)
Changes:
  - tssify/tesify renamed five_end/three_end
  - five_end/three_end fail when data does not contain strand

Fixes:
  - slack changed pyrange in-place


# 0.0.34 (25.04.19)
Fixes:
  - assign changed pyrange in-place


# 0.0.33 (25.04.19)
Changes:
  - minor bugfix


# 0.0.32 (25.04.19)
Changes:
  - Use gr.to_bed for output_methods, not gr.out.bed
  - Remove copy_df flag in constructor; using df.copy() is terser
  - change flag extended in constructor to int64 (default False)


# 0.0.31 (24.04.19)
Changes:
  - Make int32 default for Start/End

Additions:
  - PyRanges now has window-function, like bedtools makewindows
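
For context, windowing in the style of bedtools makewindows splits each interval into consecutive fixed-size pieces, with a shorter final piece when the length is not a multiple of the window size. The plain-Python sketch below shows the idea for a single interval; the function name `make_windows` and its signature are assumptions for illustration, not the library's API.

```python
def make_windows(start, end, window_size):
    """Split the half-open interval [start, end) into consecutive windows.

    Hypothetical helper for illustration; not part of pyranges.
    """
    return [
        (s, min(s + window_size, end))  # clip the last window at `end`
        for s in range(start, end, window_size)
    ]


print(make_windows(0, 25, 10))  # [(0, 10), (10, 20), (20, 25)]
```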

Fixes:
  - getitem sometimes returned int32-pyrange despite being given int64-pyrange
  - doing nearest two times in a row sometimes failed due to minor suffix-bug


# 0.0.30 (23.04.19)
Changes:
  - Make col first argument of assign


# 0.0.29 (23.04.19)
Changes:
  - Move pyranges db to own module to remove mysql-requirement (made wheelmaking hard)

Additions:
  - add assign and subset methods on pyrange


# 0.0.28 (22.04.19)
- Only refer to and use ray in dispatcher

# 0.0.27 (22.04.19)
Fixes:
  - raise Exception when encountering non-"+-" Strand values


# 0.0.26 (15.04.19)

Additions:
  - pr.sandbox.Debug context manager for pipes

Fixes:
  - coverage errored with value_col

# 0.0.25 (15.04.19)
Additions:
  - Can set columns on a PyRanges using a dict of iterables
  - gr() takes subset and col argument, like dplyr mutate and select

Removed:
  - disallow eval string, must use lambdas, e.g.: gr(lambda df: df.Score > 0)

Fixes:
  - drop (and getitem) small fix
  - sometimes had empty dfs in dict because of unused categoricals


# 0.0.24 (15.04.19)
Hotfix:
  - left in dbg statements

# 0.0.23 (15.04.19)
Hotfix:
  - unstrand() did not always remove strand info

# 0.0.22 (14.04.19)
Additions:
  - pr.PyRanges() returns empty PyRange # before you needed pr.PyRanges({})
  - pyranges are now callable. Examples: gr("df.Score > 0") and gr("df.A.astype(str) + mysuffix")
  - can subset PyRanges with a dict of boolean vectors
  - pr.data.exons(), pr.data.cpg()
  - gr.unstrand() removes strand information from a PyRanges
  - throw exception if trying to drop Strand from df without setting drop_strand=True
  - adding a Strand column to the PyRanges makes it stranded

Changes:
  - write dtype as category, not int8/int16/...

Fixes:
  - remove empty dfs in the dict given to the PyRanges constructor

Removed:
  - gr.data.epigenome_roadmap()


# 0.0.21 (14.04.19)
Additions:
  - gr.cluster(): assign ID to each cluster found by merge
  - gr.columns: return the columns in the pyranges
  - gr.drop: drop columns based on regex or list
  - gr[["Score", "Name"]]: select subset of columns

Fixes:
  - gr.stranded errored if chromosomes were ints