| Commit message | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes the case where multiple spatial layers in a video use film grain,
and the --all-layers option is used.
Also avoids reallocating the film grain buffer in a few cases where
it doesn't need to be reallocated.
BUG=aomedia:2002
Change-Id: I1126b47ee134a665881070aa2da83276e5c1a662
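A minimal sketch of the "reallocate only when needed" idea mentioned above, assuming a grow-only scratch buffer. The GrainBuffer type and both function names are hypothetical and are not taken from the libaom sources; the actual fix may track buffer dimensions differently.

```c
#include <stdlib.h>

/* Hypothetical scratch buffer; not libaom's actual structure. */
typedef struct {
  unsigned char *data;
  size_t capacity; /* bytes currently allocated */
} GrainBuffer;

/* Ensures the buffer holds at least `needed` bytes.
 * Returns 0 on success, -1 on allocation failure. */
static int grain_buffer_reserve(GrainBuffer *buf, size_t needed) {
  /* Large enough already (e.g. a smaller spatial layer): reuse it. */
  if (buf->capacity >= needed) return 0;
  unsigned char *p = (unsigned char *)realloc(buf->data, needed);
  if (p == NULL) return -1; /* the old buffer is still valid */
  buf->data = p;
  buf->capacity = needed;
  return 0;
}

/* Releases the buffer and resets it to the empty state. */
static void grain_buffer_free(GrainBuffer *buf) {
  free(buf->data);
  buf->data = NULL;
  buf->capacity = 0;
}
```

With this shape, decoding layers of different sizes in turn only triggers an allocation when a layer is larger than anything seen so far.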
|
|
|
|
|
|
| |
BUG=aomedia:1995
Change-Id: Ied317364eba92a4bb903a42f87b870b7d719d93b
|
|
|
|
|
|
|
|
|
| |
aom_highbd_lpf_horizontal_14_sse2 -1.15x perf
due to full sse2 register width usage for some ops
highbd_hev_filter_mask_x_sse2 added for code quality
if blocks in _4,_6,_8 and _14 eliminated
Change-Id: Ie28a70798833c95fb21cac238ffdebfcead5f0a7
|
|
|
|
|
|
| |
BUG=aomedia:1948,aomedia:1955
Change-Id: I0384e7ae9402f1117b97dae827097214e2907cbc
|
|
|
|
|
|
|
|
|
|
|
| |
Block size  c/neon
4x4         6.10x
8x8         9.27x
16x16       5.28x
32x32       5.02x
64x64       4.48x
Change-Id: I1fbd5527a9179a87159765cd35cfb6af63ea54b8
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Scaling w.r.t. C
Block Size  blend_vmask  blend_hmask
8x4         9.48x        9.81x
8x8         10.16x       10.56x
16x8        8.00x        7.45x
16x16       8.30x        7.74x
32x16       5.96x        5.06x
32x32       5.96x        4.90x
64x32       4.92x        4.30x
64x64       4.38x        3.95x
128x64      4.43x        3.73x
128x128     3.52x        3.26x
Change-Id: Ibfdaf151a5220c134bcccb79eafd1d06b1ce39b2
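For context on what the two benchmarked kernels compute, here is a scalar sketch of a 6-bit alpha blend with a per-row (vmask) and a per-column (hmask) mask. It assumes mask values in 0..64 and round-to-nearest; the function names and signatures are illustrative, not the library's.

```c
#include <stdint.h>

/* Blend two pixels with a 6-bit alpha m in [0, 64], rounding to nearest. */
static uint8_t blend_a64(int m, int a, int b) {
  return (uint8_t)((m * a + (64 - m) * b + 32) >> 6);
}

/* Vertical mask: one mask value per row. */
static void blend_vmask_ref(uint8_t *dst, int dst_stride,
                            const uint8_t *src0, int src0_stride,
                            const uint8_t *src1, int src1_stride,
                            const uint8_t *mask, int w, int h) {
  for (int i = 0; i < h; ++i) {
    const int m = mask[i];
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] =
          blend_a64(m, src0[i * src0_stride + j], src1[i * src1_stride + j]);
    }
  }
}

/* Horizontal mask: one mask value per column. */
static void blend_hmask_ref(uint8_t *dst, int dst_stride,
                            const uint8_t *src0, int src0_stride,
                            const uint8_t *src1, int src1_stride,
                            const uint8_t *mask, int w, int h) {
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      dst[i * dst_stride + j] = blend_a64(mask[j], src0[i * src0_stride + j],
                                          src1[i * src1_stride + j]);
    }
  }
}
```

A SIMD version processes several pixels of a row per instruction, which is consistent with the gain shrinking as blocks grow and memory traffic starts to dominate.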
|
|
|
|
|
|
| |
This fixes a number of MSVC compiler warnings.
Change-Id: I046afb92f9350a534e66220846bd32e1701f4e87
|
|
|
|
|
|
| |
BUG=aomedia:1963
Change-Id: If08601d556fbefbb680a2b9ecfd48115d32bef60
|
|
|
|
|
|
|
|
| |
Improvement over solution in 8a99b5f
BUG=aomedia:1945
Change-Id: I6c72494544919943dbce799f2fb046b1ef33abb0
|
|
|
|
|
|
|
| |
This also adds support for 2d fft (float) to be used
for both denoising and noise power spectral density estimation.
Change-Id: Ie95b44280bb301dfd3f0cf06d139e307d2f4e11b
|
|
|
|
|
|
|
| |
"Defined but not used" warnings that somehow appeared after
submission.
Change-Id: I7222dc02306d5759619b798b7a02407ffab8edd6
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This enables ARMv8/aarch64 optimisations of CDEF as well as a few
minor improvements to x86 and ARMv7. Several new intrinsics were also
added, which makes it possible to remove x86-specific code in the CDEF
code. Also, various sanitizer warnings have been addressed (mostly
related to intended two's-complement overflow/underflow). And there are
several AVX2 improvements.
New intrinsics: v64_sadd_s8, v64_sadd_u8, v64_pack_s32_u16,
v64_rdavg_u16, v128_sad_u16, v128_ssd_s16, v128_sadd_s8, v128_sadd_u8,
v128_add_64, v128_sub_64, v128_pack_s32_u16, v128_rdavg_u16,
v128_min_s32, v128_max_s32, v128_cmpgt_s32, v128_cmpeq_32,
v128_cmplt_s32, v128_padd_u8, v128_shl_n_64, v128_shr_n_u64,
v128_shr_n_s64, v128_shr_s64, v128_shr_u64, v128_shl_64,
v128_dotp_su8, v128_dotp_s32, v128_movemask_8, v128_dup_64,
v128_blend_8, v256_sad_u16, v256_ssd_s16, v256_low_u64, v256_dotp_su8,
v256_dotp_s32, v256_sadd_s8, v256_sadd_u8, v256_add_64, v256_sub_64,
v256_pack_s32_u16, v256_rdavg_u16, v256_min_s32, v256_max_s32,
v256_cmpgt_s32, v256_cmplt_s32, v256_cmpeq_32, v256_wideshuffle_8,
v256_padd_u8, v256_shl_n_64, v256_shr_n_u64, v256_shr_n_s64,
v256_shr_s64, v256_shr_u64, v256_shl_64, v256_movemask_8, v256_dup_64,
v256_blend_8, v256_unziplo_64, v256_unziphi_64
The unit tests have been updated.
Change-Id: If051e902f2095e3a02aaf13cf1230475392f051e
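As a scalar illustration of two ideas in this message, the sketch below assumes that "sadd" in the intrinsic names denotes saturating addition, and shows one common way to express intended two's-complement wraparound without signed-overflow undefined behaviour. The helper names are hypothetical and operate on single lanes, not on the vector types themselves.

```c
#include <stdint.h>

/* Saturating unsigned 8-bit add: clamps at 255 instead of wrapping. */
static uint8_t sadd_u8_lane(uint8_t a, uint8_t b) {
  const unsigned s = (unsigned)a + (unsigned)b;
  return (uint8_t)(s > 255u ? 255u : s);
}

/* Saturating signed 8-bit add: clamps to [-128, 127]. */
static int8_t sadd_s8_lane(int8_t a, int8_t b) {
  const int s = (int)a + (int)b;
  return (int8_t)(s > 127 ? 127 : (s < -128 ? -128 : s));
}

/* Intended wraparound: do the add in unsigned arithmetic, where overflow
 * is well defined, then convert back. The final conversion is
 * implementation-defined in C, but wraps on common two's-complement
 * targets. */
static int32_t wrap_add_s32(int32_t a, int32_t b) {
  return (int32_t)((uint32_t)a + (uint32_t)b);
}
```

Writing the wrapping case through unsigned types is the usual way to keep the behaviour while silencing signed-overflow sanitizer reports.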
|
|
|
|
|
|
|
|
|
|
| |
Intrinsic optimization and unit test changes of horizontal
filter 6 added.
Performance gain w.r.t. C,
lpf_horizontal_6 ~4.8x
Change-Id: Ib3f814f5ce1abe902124b5635d287b82f1ab4b1e
|
|
|
|
|
|
|
|
|
|
| |
c and sse2 functions for 5 tap dual filtering added
corresponding unit tests added
aom_lpf_vertical_6_sse2 -1.2x performance, no memcpy
aom_highbd_lpf_vertical_6_sse2 -less pixels involved
highbd sse2 loop filter minor code improvement
Change-Id: I2f01701a8a4d19aebcff13c4a5cd854c1dd21549
|
|
|
|
|
|
|
|
| |
dual 14 horizontal fn - 2x performance
dual 14 vertical fn - 3x performance
6,8 and 14 minor code quality improvement
Change-Id: Ifb86eae32d6e28d6f9653cdf792a8e3f9113e3c1
|
|
|
|
|
|
|
|
|
|
| |
Intrinsic optimization and unit test changes of vertical
filter 8 added.
Performance gain w.r.t. C,
lpf_vertical_8 ~3.1x
Change-Id: Icbb2b43867c8a14c39af1e24d08a20662ed39937
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Block    Gain w.r.t. C
8x4      4.58x
8x8      5.88x
16x8     4.42x
16x16    4.82x
32x16    3.99x
32x32    4.03x
64x32    3.19x
64x64    3.12x
128x64   2.59x
128x128  2.38x
Change-Id: I1318e40f27d55272e3c5dc3cb0d5c1a1a22ff8bb
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Block size  c/neon
16x8        3.33
16x16       3.32
16x32       3.35
32x16       3.59
32x32       3.74
32x64       3.68
64x32       3.78
64x64       3.76
Change-Id: I627212df6ac4b3127cc5a6064234d51c23a3075b
|
|
|
|
| |
Change-Id: Ic28e5fbde91fc31fd054b515cae65f0568a69b1e
|
|
|
|
|
|
|
|
|
|
|
| |
Loop filter optimization of functions vertical filter 14 and horizontal filter 8 added.
Unit test is updated for the functions.
Performance gain w.r.t. C,
lpf_vertical_14 ~3.5x
lpf_horizontal_8 ~5.0x
Change-Id: I5c460153598562bf2a719486b247279d8f524fca
|
|
|
|
|
|
| |
Followup from 59721
Change-Id: I272551ab78a0efdcdb8e7297e890f06693ebf3f7
|
|
|
|
|
|
|
|
|
|
|
|
| |
clang might fail to compile when immediate constant expressions
include arithmetic. This has been fixed for v256_shr_n_byte by using
different intrinsics which don't require arithmetic (and also
reduce the number of instructions), and for v256_shl_n_byte by
stating the range explicitly using the AND operation.
BUG=aomedia:1945
Change-Id: Ie3a614a0ede376e7b2d7329249289c089d98a69a
|
|
|
|
|
|
|
|
| |
This reverts commit 729cd5028ed48cd8bf7a697de038c559b953c7db.
Reason for revert: Visual Studio build failure (it seems the operator overloading on the intrinsics isn't working).
Change-Id: I94bac70d6e6e6c429c417cc1e45cc06a1cbe81b9
|
|
|
|
|
|
|
|
| |
Add the by 128 functions that were missing.
While we are at it, fill out rectangular avx2 functions.
Change-Id: If990ce92d4c23d6225cd11d3815d600e819a8e2c
|
|
|
|
|
|
|
| |
This also adds support for 2d fft (float) to be used
for both denoising and noise power spectral density estimation.
Change-Id: I525d0712235b566d1004aa8b6d0ad0d81eebca67
|
|
|
|
|
|
|
| |
Since EXT_PARTITION was fully adopted, these NEON intrinsics are
no longer included in the build configuration.
Change-Id: Ic3033abf80c71ed3589604f2d50a339095799a31
|
|
|
|
|
|
|
| |
All of the functions in this file are unused.
Also remove tests and SIMD specializations.
Change-Id: I17572c3d5739ebe63e392b0a8c73b097fc139df2
|
|
|
|
|
|
|
|
| |
Don't need to include <string.h>.
BUG=aomedia:1943
Change-Id: Ia46dacd1e7f31309da80f4c24c8c7a785c8ecd96
|
|
|
|
| |
Change-Id: I935fae9048c73de515e753dc6d9abad4f8f687a1
|
|
|
|
|
|
|
|
| |
dual 4 and 8 horizontal and vertical fn 2x performance
dual 4 and 8 tests added
single and dual 4,8 code quality improved
Change-Id: If3e09d6a07585cc8cf336a946cca8152f3998abb
|
|
|
|
|
|
|
| |
These functions are not inlined by some
compilers, such as MSVC 2015.
Change-Id: I11fa067bb50e20fd3ec6f2d6096b7eea8b2bb435
|
|
|
|
|
|
|
|
|
|
| |
loopfilter and convolve functions are not hooked up.
intrapred has intrinsic versions.
Remove infrastructure for supporting arm assembly.
Change-Id: Iba53a1a5433fe2ec39e28f886f26e2f479e22473
|
|
|
|
|
|
|
|
| |
More recent versions of gcc are more picky about implicit type
conversions.
BUG=aomedia:1313
Change-Id: I4cf56b6b5c298ac046a41c9cce0f3f8140076240
|
|
|
|
|
|
|
| |
Unconditionally enable all blocks it guarded, and remove
blocks active only when it was disabled.
Change-Id: Id39ac68829dfcee5f8d3766e5dc59de148f7c678
|
|
|
|
|
|
| |
the experiment using 2x2 blocks was abandoned
Change-Id: Iafc42a46a1c2fde0aee5121fd0c4058e712e0bd0
|
|
|
|
| |
Change-Id: I565e0dbf7fe5dcafd539e08bf27cb8634cb18d15
|
|
|
|
|
|
|
|
| |
aom_highbd_lpf_vertical_4_sse2 single and dual
now use only the necessary data
and have smaller transpose size
Change-Id: I42d32ea2f10c7e88ced9f8a60098ab440f41485d
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Remove includes of config.h wrapped by HAVE_CONFIG_H. This
is an obsolete vestige of our libvpx ancestry.
- Change (nearly) all remaining include sites; use proper path
to the include relative to project root instead of ./.
- Correct include order where appropriate.
- Note: This part of this patch is NOT exhaustive. In an
attempt to be conservative about the impact of this change
I limited ordering and grouping changes to places that
appear extremely unlikely to cause a problem. A more exhaustive
follow up might be appropriate, but this issue can likely be
handled organically from here on out.
Change-Id: I3b421ffd46c5da6ef78e43e7a6d3b9550cb30325
|
|
|
|
|
|
|
| |
Move rtcd header outputs to config subdir of config dir,
and update include sites.
Change-Id: I25c5f1808a091f3727934adc274637ebdcdcb1cf
|
|
|
|
|
|
|
|
|
|
| |
Does away with somewhat confusing usage of "./aom_config.h" in
include statements while keeping linters silent.
aom_config.asm, aom_config.c, and aom_config.h are now written to
the config sub dir.
Change-Id: I99e2422d6ca8b20b9cdf2feee83a866c273e47b0
|
|
|
|
|
|
|
|
|
| |
Use dual SIMD functions for loop filtering, including
luma 13-tap, 7-tap, 4-tap for vertical and horizontal direction
Chroma 5-tap does not have a dual SIMD function yet.
Change-Id: I3afdaab240613baffcd8c19d824bfb048ed64d8f
|
|
|
|
| |
Change-Id: Iecac7672a5002e2780f4506cfaa39678b3d70e0d
|
|
|
|
|
|
|
|
| |
Update the parameter list of this function:
pass a pointer to SubpelParams instead of passing
its four members individually.
Change-Id: I8bd1b29ab2befb23fcffc22539784ba50f32f4d2
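The shape of this refactor can be sketched as follows. The field names are an assumption based on the four members the message mentions, and both functions are hypothetical stand-ins for the function actually changed, which the message does not name.

```c
/* Assumed layout: four sub-pixel parameters grouped in one struct. */
typedef struct SubpelParams {
  int xs;       /* horizontal step */
  int ys;       /* vertical step */
  int subpel_x; /* horizontal sub-pixel offset */
  int subpel_y; /* vertical sub-pixel offset */
} SubpelParams;

/* Before: every member is passed as a separate argument. */
static int column_position_unpacked(int xs, int ys, int subpel_x,
                                    int subpel_y, int col) {
  (void)ys;
  (void)subpel_y;
  return subpel_x + col * xs;
}

/* After: a single pointer to const carries all four members. */
static int column_position(const SubpelParams *p, int col) {
  return p->subpel_x + col * p->xs;
}
```

Passing the struct keeps call sites short and means a later field addition does not change every signature along the call chain.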
|
|
|
|
|
|
| |
avoids a warning when creating a target with an empty source list
Change-Id: I0ff13998c8e2af8392474dca3ae49883389f8157
|
|
|
|
| |
Change-Id: Idd22da6bf5b34bad87193f3b360c9bd25842f5ae
|
|
|
|
|
|
|
| |
1. Remove redundant parameter in macro AOM_VAR_LOOP_SSE2
2. Move loop iterator into the loop's scope.
Change-Id: Ib5569368fb467ac4eb332bb8141c6838e6cbc489
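Item 2 refers to the C99 style of declaring the iterator in the for statement itself. A small sketch with a hypothetical helper (not the AOM_VAR_LOOP_SSE2 macro, whose body is not shown here):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squared differences over n pixels. */
static uint32_t sum_squared_diff(const uint8_t *a, const uint8_t *b,
                                 size_t n) {
  uint32_t sse = 0;
  /* The iterator lives only inside the loop, so it cannot be reused
   * or read by accident after the loop ends. */
  for (size_t i = 0; i < n; ++i) {
    const int d = a[i] - b[i];
    sse += (uint32_t)(d * d);
  }
  return sse;
}
```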
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1. Migrate optimization from libvpx
https://chromium-review.googlesource.com/c/webm/libvpx/+/1015844
78ba83bb9 Update variance avx2 functions
2. Add more avx2 functions (32x{8,64}, 16x{4,8,32,64}).
3. The encoder is about 1.2% faster when encoding
15 frames of city_cif with CL58321 and this CL.
333356 ms --> 329440 ms
a) gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
b) CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
c) Config cmd
cmake ../ -DENABLE_CCACHE=1 -DCONFIG_LOWBITDEPTH=1
d) Test cmd:
./aomenc --cpu-used=1 --end-usage=vbr \
--target-bitrate=600 --limit=15
Change-Id: I8b058944ad23353b77c0cdd5b4714b4413e31d73
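For readers unfamiliar with the kernel being optimized, block variance is conventionally computed as the sum of squared differences minus the squared sum of differences divided by the pixel count. The scalar sketch below follows that definition; the name and signature are illustrative and are not copied from the library.

```c
#include <stdint.h>

/* Reference variance of a w x h block of src against ref.
 * Writes the sum of squared differences to *sse and returns
 * sse - sum^2 / (w * h). */
static uint32_t variance_ref(const uint8_t *src, int src_stride,
                             const uint8_t *ref, int ref_stride,
                             int w, int h, uint32_t *sse) {
  int64_t sum = 0;
  uint64_t sq = 0;
  for (int i = 0; i < h; ++i) {
    for (int j = 0; j < w; ++j) {
      const int d = src[i * src_stride + j] - ref[i * ref_stride + j];
      sum += d;
      sq += (uint64_t)(d * d);
    }
  }
  *sse = (uint32_t)sq;
  return (uint32_t)(sq - (uint64_t)((sum * sum) / (w * h)));
}
```

The SIMD versions accumulate the same two quantities (sum and sum of squares) across several pixels per instruction, which is why adding more block sizes is mostly a matter of choosing loop bounds.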
|
|
|
|
|
|
|
|
|
|
| |
1. Migrate optimization from libvpx
https://chromium-review.googlesource.com/c/webm/libvpx/+/1014306
55ca875e6 Update variance sse2 functions
2. Add missing cases in unit test
Change-Id: Ifa009c85dbb8d41ef7c89cc6b309717b198a31e7
|
|
|
|
| |
Change-Id: Ic0f17f94ae793e7d5a6036e57000dd4d35a01999
|
|
|
|
|
|
|
|
|
|
|
|
| |
In this case, when bit_depth == 8, the PSNR-HVS was not being calculated
at all -- and was returning junk values (often negative).
Issue introduced in this patch:
https://aomedia-review.googlesource.com/c/aom/+/50901
BUG=aomedia:1882
Change-Id: Iee3ee8dc2e78126e1fb42bc96dc7c850cbd24961
|