## Release a.b
Changes affecting specific commands:
* bcftools +af-dist
- New `-s, --samples` to print per-sample HWE probability, geometric mean
* bcftools mpileup
- Remove unused experimental INFO/MIN_PL_SUM annotation
- Add new FORMAT/QM annotation, to be used with the new `bcftools +trio-dnm3 --use-ALM` model.
* bcftools +trio-dnm3
- Add a new TrioDNM model and make it the new default. To prevent confusion, the
old +trio-dnm2 plugin was removed and replaced with +trio-dnm3. The original
+trio-dnm3 model can be run as
bcftools +trio-dnm3 --use-ALM
## Release 1.23 (16th December 2025)
Changes affecting the whole of bcftools, or multiple commands:
* The `-i/-e` filtering expressions and `-f` formatting in `query`
- Add a new function `smpl_COUNT()/sCOUNT()` which returns the number of elements (#2423)
Changes affecting specific commands:
* bcftools annotate
- Make dynamic variables read from a tab-delimited annotation file (#2151) work
also for regions. For example, while the first command below was functional, the
second was not (#2441)
bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,SCORE,~STR -i'TAG={STR}' -k in.vcf
bcftools annotate -a ann.tsv.gz -c CHROM,BEG,END,SCORE,~STR -i'TAG={STR}' -k in.vcf
* bcftools consensus
- Fix a bug which prevented reading fasta files containing empty lines in their entirety (#2424)
- Fix a bug which caused `--absent` to miss some absent positions
* bcftools csq
- Add support for complex substitutions, such as AC>TAA
* bcftools +fill-tags
- Fix header formatting error for INFO/F_MISSING which must be Number=1 (#2442)
- Make `-t 'F_MISSING'` work with `-S groups.txt` (#2447)
* bcftools gtcheck
- The program is now able to process gVCF blocks. Monoallelic sites are now excluded only
when the site is monoallelic in both the query and the genotype file. The new option --keep-refs
allows monoallelic sites to always be included.
- Fix an error in parsing -i/-e command line options where the `qry:` and `gt:` prefix was
not stripped (#2432)
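The monoallelic-site rule described above for gtcheck can be illustrated with a minimal sketch. The function and parameter names are hypothetical and chosen for illustration; this is not the bcftools implementation.

```python
def include_site(qry_monoallelic, gt_monoallelic, keep_refs=False):
    """Decide whether a site is used, following the rule in the text.

    Hypothetical helper: a site is dropped only when it is monoallelic
    in BOTH files, unless the keep_refs switch (standing in for
    --keep-refs) is on.
    """
    if keep_refs:
        return True
    return not (qry_monoallelic and gt_monoallelic)
```

Under this reading, a site that is monoallelic in only one of the two files is still included.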
* bcftools mpileup
- Make `-d, --max-depth 0` set the depth to unlimited (#2435)
* bcftools norm
- Make the -i/-e filtering option work for all options, such as line merging and
duplication removal (#2415)
* bcftools query
- Numerical functions, such as SUM(INFO/DP), would previously return the value 0 when
executed on missing values. This was incorrect; a missing value is now printed.
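The change to numerical functions in `query` can be pictured with a small sketch. It assumes that missing values are represented as `None`, that "missing" output is the VCF dot, and that a mix of present and missing values sums the present ones; the last point is an assumption, not something the entry above states.

```python
def vcf_sum(values):
    """Sum a list of numbers in which None marks a missing value.

    Returns None (missing) when nothing is present, instead of 0.
    Skipping missing entries in a mixed list is an assumption.
    """
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def format_value(value):
    """Print a missing value as the VCF missing marker '.'."""
    return "." if value is None else str(value)
```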
* bcftools reheader
- Add options `--samples-list` and `--samples-file` to allow renaming samples from a list of
samples on command line, rather than from a file of sample names (#2383)
* bcftools +split-vep
- Fix the option `-A, --all-fields`, it was not working properly and could lead to a segfault (#2473)
## Release 1.22 (30th May 2025)
Changes affecting the whole of bcftools, or multiple commands:
* Add support for matching lines by ID via the --pair-logic and --collapse options (#1739)
* The -i/-e filtering expressions
- The expressions now properly match the regex negation of missing values, e.g. -i 'TAG!~"\."' (#2355)
- Added support for Fisher's exact test
* Add the option `-v, --verbosity INT` to all bcftools commands and plugins. Verbosity values
bigger than 3 are passed to the underlying HTSlib library so that the user can investigate
network issues and other problems occurring at the library level.
Changes affecting specific commands:
* bcftools annotate
- Fix Number in the header definition of transferred FILTER and ID tags (#2335)
* bcftools call
- The `-s, --samples` option was not working properly, now also supporting
sample negation as advertised in the manual page, e.g. `-s ^sample1,sample2`
to include all samples but sample1 and sample2 (#2380)
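The sample negation syntax mentioned above (`^sample1,sample2`) can be sketched as follows. The helper name is hypothetical, and keeping the samples in header order is an assumption made for the illustration.

```python
def select_samples(all_samples, spec):
    """Select samples from a comma-separated list.

    A leading '^' negates the list, i.e. keeps every sample except
    the listed ones. Output follows the order of all_samples.
    """
    negate = spec.startswith("^")
    listed = set((spec[1:] if negate else spec).split(","))
    if negate:
        return [s for s in all_samples if s not in listed]
    return [s for s in all_samples if s in listed]
```

For example, with samples `sample1`, `sample2` and `sample3`, the specification `^sample1,sample2` leaves only `sample3`.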
* bcftools consensus
- Preserve entire missing gVCF blocks with --missing (#2350)
- Fixed a bug, the `-S, --samples-file` option is no longer ignored (#2398)
* bcftools convert
- The command `convert --gvcf2vcf` was not filling the REF allele when BCF was output (#243)
* bcftools csq
- Check the input GFF for features outside transcript boundaries and extend the transcript
to contain the feature fully (#2323)
- Add experimental support for alternative genetic code tables, accessible via
a new option `-C, --genetic-code` (#2368)
- Change in the `--unify-chr-names` option, no automatic sequence name modification
is attempted anymore, the prefixes to trim must be given explicitly. For example,
if run with `--unify-chr-names chr,Chromosome,-`, the program will trim the "chr"
prefix in the VCF, "Chromosome" in the GFF, leaving the fasta unchanged (#2378)
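The `--unify-chr-names chr,Chromosome,-` example above can be read as three prefixes, given in the order VCF, GFF, fasta, with `-` meaning "leave unchanged". A minimal sketch of that reading, using hypothetical helper names:

```python
def trim_prefix(name, prefix):
    """Strip prefix from a sequence name; '-' means no trimming."""
    if prefix != "-" and name.startswith(prefix):
        return name[len(prefix):]
    return name


def unify_names(spec, vcf_name, gff_name, fasta_name):
    """Apply the VCF,GFF,fasta prefixes from spec to one name each."""
    vcf_prefix, gff_prefix, fasta_prefix = spec.split(",")
    return (
        trim_prefix(vcf_name, vcf_prefix),
        trim_prefix(gff_name, gff_prefix),
        trim_prefix(fasta_name, fasta_prefix),
    )
```

With the specification from the text, `chr1` in the VCF and `Chromosome1` in the GFF both become `1`, matching a fasta sequence already named `1`.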
* bcftools +fill-tags
- Thanks to the extension of filtering expressions with Fisher's exact test, the plugin
can now be used to add FT annotation (#1582)
* bcftools merge
- Preserve phasing in half-missing genotypes (#2331)
- The option `--merge none` is expected to create no new multiallelic sites, but it should
allow to merge, say, A>C with A>C,AT (#2333)
- Make `--merge both` work with indel-only records; for example, the multiallelic
site G>GT,T should be merged with G>GT (#2339)
- Do not merge symbolic alleles unless they have not just the same type, e.g. <DEL>,
but also the same length, i.e. the INFO/END coordinate (#2362)
- Fix a bug where an incorrectly formatted gVCF file with overlapping blocks would trigger
an infinite loop in the program (#2410)
* bcftools mpileup
- The -r/-R options now merge overlapping regions, preventing the output of duplicate sites
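Merging overlapping regions so that no site is visited twice is a standard interval-merging step. The sketch below shows the general technique on `(chrom, beg, end)` tuples with 1-based inclusive coordinates; it is an illustration only and makes no claim about how mpileup implements it.

```python
def merge_regions(regions):
    """Merge overlapping (chrom, beg, end) regions, 1-based inclusive.

    Regions are sorted first (chromosomes in plain string order),
    then each region is folded into the previous one if they overlap.
    """
    merged = []
    for chrom, beg, end in sorted(regions):
        if merged and merged[-1][0] == chrom and beg <= merged[-1][2]:
            prev_chrom, prev_beg, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_beg, max(prev_end, end))
        else:
            merged.append((chrom, beg, end))
    return merged
```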
* bcftools norm
- Print the number of removed duplicate sites in the final statistics (#2346)
- Preserve the original alleles in `--old-rec-tag` when `--check-ref s` requested (#2357)
- Print a warning when INFO/SVLEN is not defined as Number=A (#2371)
* plot-vcfstats
- Make the option `-s, --sample-names` functional again (#2353)
* bcftools +prune
- New option to remove or annotate clusters of sites within a window
* bcftools query
- The functions used in -i/-e filtering expressions (such as SUM, MEDIAN, etc) can be
now used in formatting expressions (#2271).
If the VCF contains INFO/AD and FORMAT/AD, try:
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %sSUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t [ %SUM(FMT/AD)]'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(FMT/AD)'
bcftools query test.vcf -f '%CHROM:%POS \t [ %AD] \t %SUM(INFO/AD)'
- Make it possible to refer to the ID column from the FORMAT expression (#2337)
bcftools query test.vcf -f 'ID=%ID ID=[ %/ID] vs FMT_ID=[ %ID]'
* bcftools roh
- New visualization tool misc/roh-viz, see below
* bcftools +setGT
- Support for setting missing genotypes with arbitrary ploidy via `-n c:./.` (#2303)
* bcftools +split-vep
- The `-s, --select` option was extended to print only one consequence. Previously it
was possible to select a single transcript (e.g., the one with the worst consequence),
and it was possible to filter by consequence severity (e.g., missing or worse),
but in some cases multiple consequences are reported within a single transcript
(e.g., start_lost&splice_region). The extended option allows to print the worst
part, for example as
--select primary:missense+:worst
* bcftools +trio-dnm2
- Fix a problem with --strictly-novel option which would neglect the presence of the apparent de novo
allele in the father for male offspring
- Fix a problem with uncalled mosaic chrX variants in males
* roh-viz
- HTML/JavaScript visualization of bcftools/roh output and homozygosity rate.
* bcftools +vrfs
- New experimental plugin for scoring variants and assessing site noisiness (variant read frequency profiles)
from a large number of unaffected parental samples
## Release 1.21 (12th September 2024)
Changes affecting the whole of bcftools, or multiple commands:
* Support multiple semicolon-separated strings when filtering by ID using -i/-e (#2190).
For example, `-i 'ID="rs123"'` now correctly matches `rs123;rs456`
* The filtering expression ILEN can be positive (insertion), negative (deletion), zero
(balanced substitutions), or set to missing value (symbolic alleles).
* bcftools query
* bcftools +split-vep
- The column indices printed by default with `-H` (e.g., "#[1]CHROM") can now be
suppressed by giving the option twice `-HH` (#2152)
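The sign convention of the ILEN filtering expression described above can be sketched as the length difference between ALT and REF. This is an assumption made for illustration; in particular, how bcftools treats complex alleles is not stated in the entry.

```python
def ilen(ref, alt):
    """Indel length under the convention described in the text.

    Positive for insertions, negative for deletions, zero for
    balanced substitutions, None (missing) for symbolic alleles
    such as <DEL>. Computed here simply as len(alt) - len(ref).
    """
    if alt.startswith("<") and alt.endswith(">"):
        return None
    return len(alt) - len(ref)
```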
Changes affecting specific commands:
* bcftools annotate
- Support dynamic variables read from a tab-delimited annotation file (#2151)
For example, in the two cases below the field 'STR' from the -a file is required to match
the INFO/TAG in VCF. In the first example the alleles REF,ALT must match, in the second
example they are ignored. The option -k is required to output also records that were not
annotated:
bcftools annotate -a ann.tsv.gz -c CHROM,POS,REF,ALT,SCORE,~STR -i'TAG={STR}' -k in.vcf
bcftools annotate -a ann.tsv.gz -c CHROM,POS,-,-,SCORE,~STR -i'TAG={STR}' -k in.vcf
- When adding Type=String annotations from a tab-delimited file, encode characters with
special meaning using percent encoding (';', '=' in INFO and ':' in FORMAT) (#2202)
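The percent encoding of special characters mentioned above can be sketched as follows. Only the characters listed in the entry are encoded (`;` and `=` in INFO, `:` in FORMAT); whether other characters are also encoded is not stated, so the sketch leaves them untouched. The function name is hypothetical.

```python
def encode_value(value, column):
    """Percent-encode characters that have special meaning in a column.

    column is "INFO" or "FORMAT"; each reserved character is replaced
    by '%' followed by its two-digit uppercase hexadecimal code.
    """
    reserved = {"INFO": ";=", "FORMAT": ":"}[column]
    return "".join(
        f"%{ord(ch):02X}" if ch in reserved else ch for ch in value
    )
```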
* bcftools consensus
- Allow to apply a reference allele which overlaps a previous deletion, there is no
need to complain about overlapping alleles in such case
- Fix a bug which required `-s -` to be present even when there were no samples in the VCF
(#2260)
* bcftools csq
- Fix a rare bug where indel combined with a substitution ending at exon boundary is
incorrectly predicted to have 'inframe' rather than 'frameshift' consequence (#2212)
* bcftools gtcheck
- Fix a segfault with --no-HWE-prob. The bug was introduced with the output format change in
1.19 which replaced the DC section with DCv2 (#2180)
- The number of matching genotypes in the DCv2 output was not calculated correctly with
non-zero `-E, --error-probability`. Consequently, also the average HWE score was incorrect.
The main output, the discordance score, was not affected by the bug
* bcftools +mendelian2
- Include the number of good cases where at least one of the trio genotypes has an alternate
allele (#2204)
- Fix the error message which would report the wrong sample when non-existent sample is given.
Note that bug only affected the error message, the program otherwise assigns the family
members correctly (#2242)
* bcftools merge
- Fix a severe bug in merging of FORMAT fields with Number=R and Number=A values. For example,
rows with high-coverage FORMAT/AD values (bigger or equal to 128) could have been assigned
to incorrect samples. The bug was introduced in version 1.19. For details see #2244.
* bcftools mpileup
- Return non-zero error code when the input BAM/CRAM file is truncated (#2177)
- Add FORMAT/AD annotation by default, disable with `-a -AD`
* bcftools norm
- Support realignment of symbolic <DUP.*> alleles, similarly to <DEL.*> added previously
(#1919,#2145)
- Fix in reporting reference allele genotypes with `--multi-overlaps .` (#2160)
- Support of duplicate removal of symbolic alleles of the same type but different SVLEN (#2182)
- New `-S, --sort` switch to optionally sort output records by allele (#1484)
- Add the `-i/-e` filtering options to select records for normalization. Note duplicate
removal ignores this option.
- Fix a bug where `--atomize` would not fill GT alleles for atomized SNVs followed by
an indel (#2239)
* bcftools +remove-overlaps
- Revamp the program to allow greater flexibility, with the following new options:
    -M, --mark-tag TAG      Mark -m sites with INFO/TAG
    -m, --mark EXPR         Mark (if also -M is present) or remove sites [overlap]
                                dup       .. all overlapping sites
                                overlap   .. overlapping sites
                                min(QUAL) .. mark sites with lowest QUAL until overlaps are resolved
        --missing EXPR      Value to use for missing tags with -m 'min(QUAL)'
                                0  .. the default
                                DP .. heuristics, scale maximum QUAL value proportionally to INFO/DP
        --reverse           Apply the reverse logic, for example preserve duplicates instead of removing
    -O, --output-type t     t: plain list of sites (chr,pos), tz: compressed list
* bcftools +tag2tag
- The conversions --LXX-to-XX, --XX-to-LXX were working but specific cases such as --LAD-to-AD were not.
- Print a more informative error message when the source tag type violates the VCF specification
* bcftools +trio-dnm2
- Better handling of the --strictly-novel functionality, especially with respect to chrX inheritance
## Release 1.20 (15th April 2024)
Changes affecting the whole of bcftools, or multiple commands:
* Add short option -W for --write-index. The option now accepts an optional parameter
which allows to choose between TBI and CSI index format.
Changes affecting specific commands:
* bcftools consensus
- Add new --regions-overlap option which allows to take into account overlapping deletions
that start out of the fasta file target region.
* bcftools isec
- Add new option `-l, --file-list` to read the list of file names from a file
* bcftools merge
- Add new option `--force-single` to support single-file edge case (#2100)
* bcftools mpileup
- Add new option --indels-cns for an alternative indel calling model, which should increase
the speed on long read data (thanks to using edlib) and the precision (thanks to a number
of heuristics).
* bcftools norm
- Change the order of atomization and multiallelic splitting (when both -a,-m are given)
from "atomize first, then split" to "split first, then atomize". This usually results
in a simpler VCF representation. The previous behaviour can be achieved by explicitly
streaming the output of the --atomize command into the --multiallelics splitting command.
- Fix Type=String multiallelic splitting for Number=A,R,G tags with incorrect number
of values.
- Merging into multiallelic sites with `bcftools norm -m +indels` did not work. This is
now fixed and the merging is now more strict about variant types, for example complex
events, such as AC>TGA, are not considered as indels anymore (#2084)
* bcftools reheader
- Allow reading the input file from a stream with --fai (#2088)
* bcftools +setGT
- Support for custom genotypes based on the allele with higher depth, such
as `--new-gt c:0/X` custom genotypes (#2065)
* bcftools +split-vep
- When only one of the tags is present, automatically choose INFO/BCSQ (the default
tag name produced by `bcftools csq`) or INFO/CSQ (produced by VEP). When both
tags are present, use the default INFO/CSQ.
- Transcript selection by MANE, PICK, and user-defined transcripts, for example
--select CANONICAL=YES
--select MANE_SELECT!=""
--select PolyPhen~probably_damaging
- Select all matching transcripts via --select, not just one
- Change automatic type parsing of VEP fields cDNA_position, CDS_position, and Protein_position
from Integer to String, as it can be of the form "8586-8599/9231". The type Integer can be
still enforced with `-c cDNA_position:int,CDS_position:int,Protein_position:int`.
- Recognize `-c field:str`, not just `-c field:string`, as advertised in the usage page
- Fix a bug which made filtering expression containing missing values crash (#2098)
* bcftools stats
- When GT is missing but AD is present, the program determines the alternate allele from AD.
However, if the AD tag has incorrect number of values, the program would exit with an error
printing "Requested allele outside valid range". This is now fixed by taking into account
the actual number of ALT alleles.
* bcftools +tag2tag
- Support for conversion from tags using localized alleles (e.g. LPL, LAD) to the family of
standard tags (PL, AD)
* bcftools +trio-dnm2
- Extend --strictly-novel to exclude cases where the non-Mendelian allele
is the reference allele. The change is motivated by the observation that
this class of variants is enriched for errors (especially for indels),
and better corresponds with the option name.
## Release 1.19 (12th December 2023)
Changes affecting the whole of bcftools, or multiple commands:
* Filtering expressions can be given a file with list of strings to match, this
was previously possible only for the ID column. For example
ID=@file .. selects lines with ID present in the file
INFO/TAG=@file.txt .. selects lines where TAG has a string value listed in the file
INFO/TAG!=@file.txt .. TAG must not have a string value listed in the file
Allow to query REF,ALT columns directly, for example
-e 'REF="N"'
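The `TAG=@file` and `TAG!=@file` matching described above amounts to a set-membership test against the strings listed in the file. A minimal sketch, assuming one string per line and hypothetical helper names:

```python
def load_strings(lines):
    """Collect the non-empty, whitespace-stripped lines of a list file."""
    return {line.strip() for line in lines if line.strip()}


def matches(tag_values, listed, negate=False):
    """True if any tag value is listed; negate mirrors the '!=' form."""
    hit = any(value in listed for value in tag_values)
    return not hit if negate else hit
```

In practice `load_strings` would be called on an open file handle, for example `load_strings(open("file.txt"))`.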
Changes affecting specific commands:
* bcftools annotate
- Fix `bcftools annotate --mark-sites`, VCF sites overlapping regions in a BED file
were not annotated (#1989)
- Add flexibility to FILTER column transfers and allow transfers within the same file,
across files, and in combination. For examples see
http://samtools.github.io/bcftools/howtos/annotate.html#transfer_filter_to_info
* bcftools call
- Output MIN_DP rather than MinDP in gVCF mode
- New `-*, --keep-unseen-allele` option to output the unobserved allele <*>,
intended for gVCF.
* bcftools head
- New `-s, --samples` option to include the #CHROM header line with samples.
* bcftools gtcheck
- Add output options `-o, --output` and `-O, --output-type`
- Add filtering options `-i, --include` and `-e, --exclude`
- Rename the short option `-e, --error-probability` from lower case to upper
case `-E, --error-probability`
- Changes to the output format, replace the DC section with DCv2:
- adds a new column for the number of matching genotypes
- The --error-probability is newly interpreted as the probability of erroneous
allele rather than genotype. In other words, the calculation of the discordance
score now considers the probability of genotyping error to be different
for HOM and HET genotypes, i.e. P(0/1|dsg=0) > P(1/1|dsg=0).
- fixes in HWE score calculation plus output average HWE score rather
than absolute HWE score
- better description of fields
* bcftools merge
- Add `-m` modifiers to suppress the output of the unseen allele <*> or <NON_REF>
at variant sites (e.g. `-m both,*`) or all sites (e.g. `-m both,**`)
* bcftools mpileup
- Output MIN_DP rather than MinDP in gVCF mode
* bcftools norm
- Add the number of joined lines to the summary output, for example
Lines total/split/joined/realigned/skipped: 6/0/3/0/0
- Allow combining -m and -a with --old-rec-tag (#2020)
- Symbolic <DEL> alleles caused norm to expand REF to the full length of the deletion.
This was not intended and problematic for long deletions, the REF allele should list
one base only (#2029)
* bcftools query
- Add new `-N, --disable-automatic-newline` option for the pre-1.18 query formatting behavior,
in which a missing newline was not added
- Make the automatic addition of the newline character more predictable and,
when missing, always put it at the end of the expression. In version 1.18 it could
be added at the end of the expression (for per-site expressions) or inside the square
brackets (for per-sample expressions). The new behavior is:
- if the formatting expression contains a newline character, do nothing
- if there is no newline character and -N, --disable-automatic-newline is given, do nothing
- if there is no newline character and -N is not given, insert newline at the end of the expression
See #1969 for details
- Add new `-F, --print-filtered` option to output a default string for samples that would otherwise
be filtered by `-i/-e` expressions.
- Include sample name in the output header with `-H` whenever it makes sense (#1992)
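The three automatic-newline rules listed above for `query` can be written as a short function. The sketch assumes the formatting expression has already been unescaped, so that a newline appears as a real `"\n"` character; the function name is hypothetical.

```python
def add_newline(expr, disable_automatic_newline=False):
    """Apply the newline rules to an unescaped formatting expression.

    - expression already contains a newline: do nothing
    - no newline and -N given: do nothing
    - no newline and -N not given: append a newline at the end
    """
    if "\n" in expr or disable_automatic_newline:
        return expr
    return expr + "\n"
```

Note that the newline is always appended at the very end, never inside square brackets of a per-sample expression.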
* bcftools +split-vep
- Fix on the fly filtering involving numeric subfields, e.g. `-i 'MAX_AF<0.001'` (#2039)
- Interpret default column type names (--columns-types) as entire strings, rather than
substrings to avoid unexpected spurious matches (i.e. internally add ^ and $ to all
field names)
* bcftools +trio-dnm2
- Do not flag paternal genotyping errors as de novo mutations. Specifically, when father's
chrX genotype is 0/1 and mother's 0/0, 0/1 in the child will not be marked as DNM.
* bcftools view
- Add new `-A, --trim-unseen-allele` option to remove the unseen allele <*> or <NON_REF>
at variant sites (`-A`) or all sites (`-AA`)
## Release 1.18 (25th July 2023)
Changes affecting the whole of bcftools, or multiple commands:
* Support auto indexing during writing BCF and VCF.gz via new `--write-index` option
Changes affecting specific commands:
* bcftools annotate
- The `-m, --mark-sites` option can be now used to mark all sites without the
need to provide the `-a` file (#1861)
- Fix a bug where the `-m` function did not respect the `--min-overlap` option (#1869)
- Fix a bug when update of INFO/END results in assertion error (#1957)
* bcftools concat
- New option `--drop-genotypes`
* bcftools consensus
- Support higher-ploidy genotypes with `-H, --haplotype` (#1892)
- Allow `--mark-ins` and `--mark-snv` with a character, similarly to `--mark-del`
* bcftools convert
- Support for conversion from tab-delimited files (CHROM,POS,REF,ALT) to sites-only VCFs
* bcftools csq
- New `--unify-chr-names` option to automatically unify different chromosome
naming conventions in the input GFF, fasta and VCF files (e.g. "chrX" vs "X")
- More versatility in parsing various flavors of GFF
- A new `--dump-gff` option to help with debugging and investigating the internals
of GFF parsing
- When printing consequences in nonsense mediated decay transcripts, include 'NMD_transcript'
in the consequence part of the annotation. This is to make filtering easier and analogous to
VEP annotations. For example the consequence annotation
3_prime_utr|PCGF3|ENST00000430644|NMD
is newly printed as
3_prime_utr&NMD_transcript|PCGF3|ENST00000430644|NMD
* bcftools gtcheck
- Add stats for the number of sites matched in the GT-vs-GT, GT-vs-PL, etc modes. This
information is important for interpretation of the discordance score, as only the
GT-vs-GT matching can be interpreted as the number of mismatching genotypes.
* bcftools +mendelian2
- Fix in command line argument parsing, the `-p` and `-P` options were not
functioning (#1906)
* bcftools merge
- New `-M, --missing-rules` option to control the behavior of merging of vector tags
to prevent mixtures of known and missing values in tags when desired
- Use values pertaining to the unknown allele (<*> or <NON_REF>) when available
to prevent mixtures of known and missing values (#1888)
- Revamped line matching code to fix problems in gVCF merging where split gVCF blocks
would not update genotypes (#1891, #1164).
* bcftools mpileup
- Fix a bug in --indels-2.0 which caused an endless loop when CIGAR operator 'H' or 'P'
was encountered
* bcftools norm
- The `-m, --multiallelics +` mode now preserves phasing (#1893)
- Symbolic <DEL.*> alleles are now normalized too (#1919)
- New `-g, --gff-annot` option to right-align indels in forward transcripts to follow
HGVS 3'rule (#1929)
* bcftools query
- Force newline character in formatting expression when not given explicitly
- Fix `-H` header output in formatting expressions containing newlines
* bcftools reheader
- Make `-f, --fai` aware of long contigs not representable by 32-bit integer (#1959)
* bcftools +split-vep
- Prevent a segfault when `-i/-e` use a VEP subfield not included in `-f` or `-c` (#1877)
- New `-X, --keep-sites` option complementing the existing `-x, --drop-sites` options
- Force newline character in formatting expression when not given explicitly
- Fix a subtle ambiguity: identical rows must be returned when `-s` is applied regardless
of `-f` containing the `-a` VEP tag itself or not.
* bcftools stats
- Collect new VAF (variant allele frequency) statistics from FORMAT/AD field
- When counting transitions/transversions, consider also alternate het genotypes
* plot-vcfstats
- Add three new VAF plots
## Release 1.17 (21st February 2023)
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Error checks were added to prevent incorrect use of vector arithmetics. For example,
when evaluating the sum of two vectors A and B, the resulting vector could contain
nonsense values when the input vectors were not of the same length. The fix introduces
the following logic:
- evaluate to C_i = A_i + B_i when length(A)==length(B) and set length(C)=length(A)
- evaluate to C_i = A_i + B_0 when length(B)=1 and set length(C)=length(A)
- evaluate to C_i = A_0 + B_i when length(A)=1 and set length(C)=length(B)
- throw an error when length(A)!=length(B) AND length(A)!=1 AND length(B)!=1
- Arrays in Number=R tags can be now subscripted by alleles found in FORMAT/GT. For example,
FORMAT/AD[GT] > 10 .. require support of more than 10 reads for each allele
FORMAT/AD[0:GT] > 10 .. same as above, but in the first sample
sSUM(FORMAT/AD[GT]) > 20 .. require total sample depth bigger than 20
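The vector-length rules listed above for arithmetic in filtering expressions can be sketched directly. The function below mirrors the four cases for addition; it is an illustration of the stated logic with a hypothetical name, not the bcftools code.

```python
def add_vectors(a, b):
    """Element-wise sum following the length rules in the text.

    Equal lengths add pairwise, a length-1 vector is applied to every
    element of the other, and any other length mismatch is an error.
    """
    if len(a) == len(b):
        return [x + y for x, y in zip(a, b)]
    if len(b) == 1:
        return [x + b[0] for x in a]
    if len(a) == 1:
        return [a[0] + y for y in b]
    raise ValueError(
        f"cannot add vectors of lengths {len(a)} and {len(b)}"
    )
```

For instance, `add_vectors([1, 2, 3], [10])` yields `[11, 12, 13]`, while adding vectors of lengths 2 and 3 raises an error instead of producing nonsense values.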
* The commands `consensus -H` and `+split-vep -H`
- Drop unnecessary leading space in the first header column and newly print `#[1]columnName`
instead of the previous `# [1]columnName` (#1856)
Changes affecting specific commands:
* bcftools +allele-length
- Fix overflow for indels longer than 512bp and aggregate alleles equal or larger than
that in the same bin (#1837)
* bcftools annotate
- Support sample reordering of annotation file (#1785)
- Restore lost functionality of the --pair-logic option (#1808)
* bcftools call
- Fix a bug where too many alleles passed to `-C alleles` via `-T` caused memory
corruption (#1790)
- Fix a bug where indels constrained with `-C alleles -T` would sometimes be missed (#1706)
* bcftools consensus
- BREAKING CHANGE: the option `-I, --iupac-codes` newly outputs IUPAC codes based on FORMAT/GT
of all samples. The `-s, --samples` and `-S, --samples-file` options can be used to subset
samples. In order to ignore samples and consider only the REF and ALT columns (the original
behavior prior to 1.17), run with `-s -` (#1828)
* bcftools convert
- Make variantkey conversion work for sites without an ALT allele (#1806)
* bcftools csq
- Fix a bug where a MNV with multiple consequences (e.g. missense + stop_gained)
would report only the less severe one (#1810)
- GFF file parsing was made slightly more flexible, newly ids can be just 'XXX'
rather than, for example, 'gene:XXX'
- New gff2gff perl script to fix GFF formatting differences
* bcftools +fill-tags
- More of the available annotations are now added by the `-t all` option
* bcftools +fixref
- New INFO/FIXREF annotation
- New -m swap mode
* bcftools +mendelian
- The +mendelian plugin has been deprecated and replaced with +mendelian2. The
function of the plugin is the same but the command line options and the output
format have changed, and for this reason it was introduced as a new plugin.
* bcftools mpileup
- Most of the annotations generated by mpileup are now optional via the
`-a, --annotate` option, and several new (mostly experimental) annotations were added.
- New option `--indels-2.0` for an EXPERIMENTAL indel calling model. This model aims
to address some known deficiencies of the current indel calling algorithm, specifically,
it uses diploid reference consensus sequence. Note that in the current version it
has the potential to increase sensitivity but at the cost of decreased specificity.
- Make the FS annotation (Fisher exact test strand bias) functional and remove it
from the default annotations
* bcftools norm
- New --multi-overlaps option allows setting overlapping alleles either to the
ref allele (the current default) or to a missing allele (#1764 and #1802)
- Fixed a bug in `-m -` which did not split missing FORMAT values correctly and
could lead to empty FORMAT fields such as `::` instead of the correct `:.:` (#1818)
- The `--atomize` option previously would not split complex indels such as C>GGG.
Newly these will be split into two records C>G and C>CGG (#1832)
* bcftools query
- Fix a rare bug where the printing of SAMPLE field with `query` was incorrectly
suppressed when the `-e` option contained a sample expression while the formatting
query did not. See #1783 for details.
* bcftools +setGT
- Add new `--new-gt X` option (#1800)
- Add new `--target-gt r:FLOAT` option to randomly select a proportion of genotypes (#1850)
- Fix a bug where `-t ./x` mode was advertised as selecting both phased and unphased
half-missing genotypes, but was in fact selecting only unphased genotypes (#1844)
* bcftools +split-vep
- New options `-g, --gene-list` and `--gene-list-fields` which allow prioritizing
consequences from a list of genes, or restricting output to the listed genes
- New `-H, --print-header` option to print the header with `-f`
- Work around a bug in the LOFTEE VEP plugin used to annotate gnomAD VCFs. There the
LoF_info subfield contains commas which, in general, makes it impossible to parse the
VEP subfields. The +split-vep plugin can now work with such files, replacing the offending
commas with slash (/) characters. See also https://github.com/Ensembl/ensembl-vep/issues/1351
- The `-c, --columns` option can now be omitted when a subfield is used in an `-i/-e` filtering
expression. Note that `-c` may still have to be given when it is not possible to infer the
type of the subfield. This is an experimental feature.
* bcftools stats
- The per-sample stats (PSC) would not be computed when `-i/-e` filtering options and
the `-s -` option were given but the expression did not include sample columns (#1835)
* bcftools +tag2tag
- Revamp of the plugin to allow a wider range of tag conversions, specifically all combinations
from FORMAT/GL,PL,GP to FORMAT/GL,PL,GP,GT
* bcftools +trio-dnm2
- New `-n, --strictly-novel` option to downplay alleles which violate Mendelian
inheritance but are not novel
- Allow setting the `--pn` and `--pns` options separately for SNVs and indels, and make
the indel settings more strict by default
- Output missing FORMAT/VAF values in non-trio samples, rather than random nonsense values
* bcftools +variant-distance
- New option `-d, --direction` to choose the directionality: forward, reverse, nearest (the default)
or both (#1829)
## Release 1.16 (18th August 2022)
* New plugin `bcftools +variant-distance` to annotate records with distance to the
nearest variant (#1690)
Changes affecting the whole of bcftools, or multiple commands:
* The -i/-e filtering expressions
- Added support for querying of multiple filters, for example `-i 'FILTER="A;B"'`
can be used to select sites with two filters "A" and "B" set. See the documentation
for more examples.
- Added modulo arithmetic operator
Changes affecting specific commands:
* bcftools annotate
- Fixed a bug introduced in 1.14 which caused records with the INFO/END annotation to
incorrectly trigger the `-c ~INFO/END` mode of comparison even when not explicitly
requested, with the result that the annotation was not transferred from a tab-delimited
file (#1733)
* bcftools merge
- New `-m snp-ins-del` switch to merge SNVs, insertions and deletions separately (#1704)
* bcftools mpileup
- New NMBZ annotation for Mann-Whitney U-z test on number of mismatches within
supporting reads
- Suppress the output of MQSBZ and FS annotations in absence of alternate allele
* bcftools +scatter
- Fix erroneous addition of duplicate PG lines
* bcftools +setGT
- Custom genotypes (e.g. `-n c:1/1`) now correctly override ploidy
## Release 1.15.1 (7th April 2022)
* bcftools annotate
- New `-H, --header-line` convenience option to pass a header line on the command line.
This complements the existing `-h, --header-lines` option, which requires a file
with header lines
* bcftools csq
- A list of consequence types supported by `bcftools csq` has been added to
the manual page. (#1671)
* bcftools +fill-tags
- Extend generalized functions so that FORMAT tags can be filled as well, for example:
bcftools +fill-tags in.bcf -o out.bcf -- -t 'FORMAT/DP:1=int(smpl_sum(FORMAT/AD))'
- Allow multiple custom functions in a single run. Previously the program would silently
go with the last one, assigning the same values to all (#1684)
* bcftools norm
- Fix an assertion failure triggered when a faulty VCF file with a '-'
character in the REF allele was used with `bcftools norm --atomize`. This
option now checks that the REF allele only includes the allowed characters
A, C, G, T and N. (#1668)
- Fix the loss of phasing in half-missing genotypes in variant atomization (#1689)
* bcftools roh
- Fix a bug that could result in an endless loop or incorrect AF estimate when
missing genotypes are present and the `--estimate-AF -` option was used (#1687)
* bcftools +split-vep
- VEP fields with characters disallowed in VCF tag names by the specification (such as '-'
in 'M-CAP') couldn't be queried. This has been fixed; the program now sanitizes the field
names, replacing invalid characters with an underscore (#1686)
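The renaming described for `+split-vep` above can be pictured with a small shell
sketch. It is not the plugin's actual code: the set of characters treated as valid here
(letters, digits, underscore and dot) is an assumption made for illustration, and the
variable names are hypothetical.

```shell
# Hypothetical VEP field name containing a character that is not allowed in a VCF tag name
field='M-CAP'

# Assumed rule, for illustration only: replace anything that is not a letter,
# a digit, an underscore or a dot with an underscore
sanitized=$(printf '%s\n' "$field" | sed 's/[^A-Za-z0-9_.]/_/g')

printf '%s\n' "$sanitized"
```

With this rule, `M-CAP` becomes `M_CAP`.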
## Release 1.15 (21st February 2022)
* New `bcftools head` subcommand for conveniently displaying the headers
of a VCF or BCF file. Without any options, this is equivalent to
`bcftools view --header-only --no-version` but more succinct and memorable.
* The `-T, --targets-file` option had the following bug originating in HTSlib code:
when an uncompressed file with multiple columns CHR,POS,REF was provided, the
REF would be interpreted as 0 gigabases (#1598)
Changes affecting specific commands:
* bcftools annotate
- In addition to `--rename-annots`, which requires a file with name mappings,
it is now possible to do the same on the command line with `-c NEW_TAG:=OLD_TAG`
- Add new option --min-overlap to specify the minimum required
overlap of intersecting regions
- Allow transferring ALT from a VCF with or without replacement using
bcftools annotate -a annots.vcf.gz -c ALT file.vcf.gz
bcftools annotate -a annots.vcf.gz -c +ALT file.vcf.gz
* bcftools convert
- Revamp of `--gensample`, `--hapsample` and `--haplegendsample` family of options
which includes the following changes:
- New `--3N6` option to output/input the new version of the .gen file format,
see https://www.cog-genomics.org/plink/2.0/formats#gen
- Deprecate the `--chrom` option in favor of `--3N6`. A simple `cut` command
can be used to convert from the new 3*M+6 column format to the format printed
with `--chrom` (`cut -d' ' -f1,3-`).
- The CHROM:POS_REF_ALT IDs which are used to detect strand swaps are required
and must appear either in the "SNP ID" column or the "rsID" column. The column
is autodetected for `--gensample2vcf`, can be the first or the second for
`--hapsample2vcf` (depending on whether the `--vcf-ids` option is given), and must be
the first for `--haplegendsample2vcf`.
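The `cut` conversion mentioned under `bcftools convert` above can be tried on a made-up
record. The sketch below uses a single hypothetical variant with one sample. The column
meanings given in the comment are an assumption for illustration; only the column
positions matter to `cut`.

```shell
# Hypothetical record in the 3*M+6 column layout (M = 1 sample), assumed to be:
# CHROM, SNP ID, rsID, POS, allele 1, allele 2, then three values for the sample
line='1 1:111_A_G rs1 111 A G 1 0 0'

# Keep column 1 and columns 3 onwards, i.e. drop the second column
converted=$(printf '%s\n' "$line" | cut -d' ' -f1,3-)

printf '%s\n' "$converted"
```

For this input the result is `1 rs1 111 A G 1 0 0`, that is, the same record without its second column.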
* bcftools csq
- Allow GFF files with phase column unset
* bcftools filter
- New `--mask`, `--mask-file` and `--mask-overlap` options to soft filter
variants in regions (#1635)
* bcftools +fixref
- The `-m id` option now works also for non-dbSNP ids, i.e. not just `rsINT`