path: root/aom_dsp
Commit message (Author, Age)
* [film-grain] Fix film grain + --all-layers (David Barker, 2018-07-10)
| Fixes the case where multiple spatial layers in a video use film
| grain, and the --all-layers option is used.
|
| Also avoids reallocating the film grain buffer in a few cases where
| it doesn't need to be reallocated.
|
| BUG=aomedia:2002
| Change-Id: I1126b47ee134a665881070aa2da83276e5c1a662
* [film grain] Fix clear of {cb,cr}_grain_block (Frederic Barbier, 2018-07-10)
| BUG=aomedia:1995
| Change-Id: Ied317364eba92a4bb903a42f87b870b7d719d93b
* hbd lpf sse2 perf and code quality improvement (Victoria Zhislina, 2018-06-23)
| aom_highbd_lpf_horizontal_14_sse2: 1.15x perf due to full sse2
| register width usage for some ops
| highbd_hev_filter_mask_x_sse2 added for code quality
| if blocks in _4, _6, _8 and _14 eliminated
|
| Change-Id: Ie28a70798833c95fb21cac238ffdebfcead5f0a7
* Clear {cb,cr}_grain_block when there is no scaling function (Neil Birkbeck, 2018-06-22)
| BUG=aomedia:1948,aomedia:1955
| Change-Id: I0384e7ae9402f1117b97dae827097214e2907cbc
* Add ARM Neon optimization of aom_highbd_dc_predictor (Sachin Kumar Garg, 2018-06-22)
| Block size   c/neon
| 4x4          6.10x
| 8x8          9.27x
| 16x16        5.28x
| 32x32        5.02x
| 64x64        4.48x
|
| Change-Id: I1fbd5527a9179a87159765cd35cfb6af63ea54b8
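For context on what the speedups above measure: DC prediction fills a block with the rounded average of the reconstructed pixels directly above it and to its left. The scalar sketch below illustrates that idea for square high-bit-depth blocks. The name `dc_predictor_sketch` and its signature are assumptions made for illustration; this is not libaom's `aom_highbd_dc_predictor` interface, and the Neon version in the commit is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for square blocks (bs x bs).
 * above: bs pixels from the row above the block.
 * left:  bs pixels from the column left of the block. */
static void dc_predictor_sketch(uint16_t *dst, ptrdiff_t stride, int bs,
                                const uint16_t *above, const uint16_t *left) {
  assert(bs > 0);
  uint32_t sum = 0;
  for (int i = 0; i < bs; ++i) sum += (uint32_t)above[i] + left[i];

  /* Rounded average over the 2 * bs neighbouring pixels. */
  const uint32_t count = 2u * (uint32_t)bs;
  const uint16_t dc = (uint16_t)((sum + (count >> 1)) / count);

  /* Every pixel of the predicted block takes the same DC value. */
  for (int r = 0; r < bs; ++r) {
    for (int c = 0; c < bs; ++c) dst[c] = dc;
    dst += stride;
  }
}
```

Because every output pixel is the same value, the work is dominated by the neighbour sum and the block fill, both of which map naturally onto vector loads, adds and stores.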
* Add ARM Neon optimization of blend_vmask and blend_hmask (Sachin Kumar Garg, 2018-06-20)
| Scaling w.r.t. C
|
| Block Size   blend_vmask   blend_hmask
| 8x4          9.48x         9.81x
| 8x8          10.16x        10.56x
| 16x8         8.00x         7.45x
| 16x16        8.30x         7.74x
| 32x16        5.96x         5.06x
| 32x32        5.96x         4.90x
| 64x32        4.92x         4.30x
| 64x64        4.38x         3.95x
| 128x64       4.43x         3.73x
| 128x128      3.52x         3.26x
|
| Change-Id: Ibfdaf151a5220c134bcccb79eafd1d06b1ce39b2
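As a rough illustration of the two operations benchmarked above: both blend two source blocks using 6-bit weights (0..64). A vertical mask supplies one weight per row, and a horizontal mask one weight per column. The sketch below is a plain C rendering of that idea under those assumptions. The `*_sketch` names and signatures are hypothetical and are not the libaom entry points.

```c
#include <assert.h>
#include <stdint.h>

/* Mask values are weights in 0..64; 64 selects src0 entirely. */
#define SKETCH_MAX_ALPHA 64

/* Weighted average of two pixels with rounding to nearest. */
static uint8_t blend_a64_sketch(int m, int v0, int v1) {
  return (uint8_t)((m * v0 + (SKETCH_MAX_ALPHA - m) * v1 + 32) >> 6);
}

/* Vertical mask: one weight per row (mask has h entries). */
static void blend_vmask_sketch(uint8_t *dst, int dst_stride,
                               const uint8_t *src0, int src0_stride,
                               const uint8_t *src1, int src1_stride,
                               const uint8_t *mask, int w, int h) {
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    const int m = mask[r];
    for (int c = 0; c < w; ++c) {
      dst[r * dst_stride + c] = blend_a64_sketch(
          m, src0[r * src0_stride + c], src1[r * src1_stride + c]);
    }
  }
}

/* Horizontal mask: one weight per column (mask has w entries). */
static void blend_hmask_sketch(uint8_t *dst, int dst_stride,
                               const uint8_t *src0, int src0_stride,
                               const uint8_t *src1, int src1_stride,
                               const uint8_t *mask, int w, int h) {
  assert(w > 0 && h > 0);
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      dst[r * dst_stride + c] = blend_a64_sketch(
          mask[c], src0[r * src0_stride + c], src1[r * src1_stride + c]);
    }
  }
}
```

In the vertical case the weight is constant across a row, so a vectorized version can broadcast it once per row; in the horizontal case the weights can be loaded as a vector and reused for every row.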
* Make type conversions explicit (Yaowu Xu, 2018-06-19)
| This fixes a number of MSVC compiler warnings.
|
| Change-Id: I046afb92f9350a534e66220846bd32e1701f4e87
* Fix use of uninitialized value in noise_util.c (Neil Birkbeck, 2018-06-18)
| BUG=aomedia:1963
| Change-Id: If08601d556fbefbb680a2b9ecfd48115d32bef60
* Constrain the range of immediate constants (Steinar Midtskogen, 2018-06-18)
| Improvement over solution in 8a99b5f
|
| BUG=aomedia:1945
| Change-Id: I6c72494544919943dbce799f2fb046b1ef33abb0
* Add 2d wiener denoiser for regrainer (Neil Birkbeck, 2018-06-15)
| This also adds support for 2d fft (float) to be used for both
| denoising and noise power spectral density estimation.
|
| Change-Id: Ie95b44280bb301dfd3f0cf06d139e307d2f4e11b
* Fix build issues in previous commit (Steinar Midtskogen, 2018-06-14)
| "Defined but not used" warnings that somehow appeared after
| submission.
|
| Change-Id: I7222dc02306d5759619b798b7a02407ffab8edd6
* Import SIMD intrinsics from more recent Thor code (Steinar Midtskogen, 2018-06-14)
| This enables ARMv8/aarch64 optimisations of CDEF as well as a few
| minor improvements to x86 and ARMv7. Several new intrinsics were
| also added, which makes it possible to remove x86-specific code in
| the CDEF code.
|
| Also, various sanitizer warnings have been addressed (mostly
| related to intended two's-complement overflow/underflow). And there
| are several AVX2 improvements.
|
| New intrinsics:
| v64_sadd_s8, v64_sadd_u8, v64_pack_s32_u16, v64_rdavg_u16,
| v128_sad_u16, v128_ssd_s16, v128_sadd_s8, v128_sadd_u8,
| v128_add_64, v128_sub_64, v128_pack_s32_u16, v128_rdavg_u16,
| v128_min_s32, v128_max_s32, v128_cmpgt_s32, v128_cmpeq_32,
| v128_cmplt_s32, v128_padd_u8, v128_shl_n_64, v128_shr_n_u64,
| v128_shr_n_s64, v128_shr_s64, v128_shr_u64, v128_shl_64,
| v128_dotp_su8, v128_dotp_s32, v128_movemask_8, v128_dup_64,
| v128_blend_8, v256_sad_u16, v256_ssd_s16, v256_low_u64,
| v256_dotp_su8, v256_dotp_s32, v256_sadd_s8, v256_sadd_u8,
| v256_add_64, v256_sub_64, v256_pack_s32_u16, v256_rdavg_u16,
| v256_min_s32, v256_max_s32, v256_cmpgt_s32, v256_cmplt_s32,
| v256_cmpeq_32, v256_wideshuffle_8, v256_padd_u8, v256_shl_n_64,
| v256_shr_n_u64, v256_shr_n_s64, v256_shr_s64, v256_shr_u64,
| v256_shl_64, v256_movemask_8, v256_dup_64, v256_blend_8,
| v256_unziplo_64, v256_unziphi_64
|
| The unit tests have been updated.
|
| Change-Id: If051e902f2095e3a02aaf13cf1230475392f051e
* Add NEON optimization of lpf_horizontal_6 (Cherma Rajan A, 2018-06-14)
| Intrinsic optimization and unit test changes of horizontal filter 6
| added.
|
| Performance gain w.r.t. C,
| lpf_horizontal_6 ~4.8x
|
| Change-Id: Ib3f814f5ce1abe902124b5635d287b82f1ab4b1e
* lpf 6 dual functions added, single sse2 improved (Victoria Zhislina, 2018-06-14)
| c and sse2 functions for 5 tap dual filtering added
| corresponding unit tests added
| aom_lpf_vertical_6_sse2: 1.2x performance, no memcpy
| aom_highbd_lpf_vertical_6_sse2: fewer pixels involved
| highbd sse2 loopfilter minor code improvement
|
| Change-Id: I2f01701a8a4d19aebcff13c4a5cd854c1dd21549
* lpf sse2 14 dual performance improvements (Victoria Zhislina, 2018-06-14)
| dual 14 horizontal fn - 2x performance
| dual 14 vertical fn - 3x performance
| 6, 8 and 14 minor code quality improvement
|
| Change-Id: Ifb86eae32d6e28d6f9653cdf792a8e3f9113e3c1
* Add NEON optimization of lpf_vertical_8 (Cherma Rajan A, 2018-06-13)
| Intrinsic optimization and unit test changes of vertical filter 8
| added.
|
| Performance gain w.r.t. C,
| lpf_vertical_8 ~3.1x
|
| Change-Id: Icbb2b43867c8a14c39af1e24d08a20662ed39937
* Add ARM Neon optimization of aom_lowbd_blend_a64_d16_mask (Remya, 2018-06-13)
| Block      Gain w.r.t. C
| 8x4        4.58x
| 8x8        5.88x
| 16x8       4.42x
| 16x16      4.82x
| 32x16      3.99x
| 32x32      4.03x
| 64x32      3.19x
| 64x64      3.12x
| 128x64     2.59x
| 128x128    2.38x
|
| Change-Id: I1318e40f27d55272e3c5dc3cb0d5c1a1a22ff8bb
* Add Arm Neon optimization of round_shift_array (Sachin Kumar Garg, 2018-06-12)
| Block size   c/neon
| 16x8         3.33
| 16x16        3.32
| 16x32        3.35
| 32x16        3.59
| 32x32        3.74
| 32x64        3.68
| 64x32        3.78
| 64x64        3.76
|
| Change-Id: I627212df6ac4b3127cc5a6064234d51c23a3075b
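The commit above gives only timings, so the following is an assumption about what the routine does, based on its name: each element of an array is shifted right by `bit` with rounding to nearest, and a negative `bit` scales the values up instead. The function `round_shift_array_sketch` is a hypothetical scalar illustration and is not copied from libaom.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch of a rounding shift over an array.
 * bit > 0: divide by 2^bit, rounding to nearest.
 * bit < 0: multiply by 2^(-bit); this sketch does not clamp overflow.
 * bit == 0: leave the array unchanged. */
static void round_shift_array_sketch(int32_t *arr, int size, int bit) {
  assert(bit > -31 && bit < 31);
  if (bit == 0) return;
  if (bit > 0) {
    const int64_t offset = (int64_t)1 << (bit - 1);
    for (int i = 0; i < size; ++i) {
      /* Right-shifting a negative value is implementation-defined in C;
       * common compilers use an arithmetic shift. */
      arr[i] = (int32_t)((arr[i] + offset) >> bit);
    }
  } else {
    const int64_t scale = (int64_t)1 << -bit;
    for (int i = 0; i < size; ++i) {
      arr[i] = (int32_t)(arr[i] * scale);
    }
  }
}
```

Each element is processed independently, which is why a loop of this shape is a natural candidate for vectorization.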
* Remove unused static function 'filter_hev_mask4' (David Michael Barr, 2018-06-12)
| Change-Id: Ic28e5fbde91fc31fd054b515cae65f0568a69b1e
* Add NEON optimization of lpf_vertical_14 and lpf_horizontal_8 (Cherma Rajan A, 2018-06-12)
| Loop filter optimization of functions vertical filter 14 and
| horizontal filter 8 added. Unit test is updated for the functions.
|
| Performance gain w.r.t. C,
| lpf_vertical_14  ~3.5x
| lpf_horizontal_8 ~5.0x
|
| Change-Id: I5c460153598562bf2a719486b247279d8f524fca
* Change blend function param order from h,w to w,h (Scott LaVarnway, 2018-06-12)
| Followup from 59721
|
| Change-Id: I272551ab78a0efdcdb8e7297e890f06693ebf3f7
* Constrain the range of immediate constants (Steinar Midtskogen, 2018-06-11)
| clang might fail to compile when immediate constant expressions
| include arithmetic. This has been fixed for v256_shr_n_byte by
| using different intrinsics which don't require arithmetic (and
| which also reduce the number of instructions), and for
| v256_shl_n_byte by stating the range explicitly using the AND
| operation.
|
| BUG=aomedia:1945
| Change-Id: Ie3a614a0ede376e7b2d7329249289c089d98a69a
* Revert "Add 2d wiener denoiser for regrainer" (Wan-Teh Chang, 2018-06-11)
| This reverts commit 729cd5028ed48cd8bf7a697de038c559b953c7db.
|
| Reason for revert: Visual Studio build failure (it seems the
| operator overloading on the intrinsics isn't working).
|
| Change-Id: I94bac70d6e6e6c429c417cc1e45cc06a1cbe81b9
* Add missing subpel variance functions for x86. (Kyle Siefring, 2018-06-09)
| Add the by 128 functions that were missing.
| While we are at it, fill out rectangular avx2 functions.
|
| Change-Id: If990ce92d4c23d6225cd11d3815d600e819a8e2c
* Add 2d wiener denoiser for regrainer (Neil Birkbeck, 2018-06-08)
| This also adds support for 2d fft (float) to be used for both
| denoising and noise power spectral density estimation.
|
| Change-Id: I525d0712235b566d1004aa8b6d0ad0d81eebca67
* Remove unused files of convolve8 and convolve_copy (David Michael Barr, 2018-06-08)
| Since EXT_PARTITION was fully adopted, these NEON intrinsics
| are no longer included in the build configuration.
|
| Change-Id: Ic3033abf80c71ed3589604f2d50a339095799a31
* Remove 'aom_dsp/avg.c' and related sources (David Michael Barr, 2018-06-08)
| All of the functions in this file are unused.
| Also remove tests and SIMD specializations.
|
| Change-Id: I17572c3d5739ebe63e392b0a8c73b097fc139df2
* psnrhvs: Remove unused M_PI macro definition. (Wan-Teh Chang, 2018-06-07)
| Don't need to include <string.h>.
|
| BUG=aomedia:1943
| Change-Id: Ia46dacd1e7f31309da80f4c24c8c7a785c8ecd96
* Remove unused static function 'highbd_flat_mask5' (David Michael Barr, 2018-06-05)
| Change-Id: I935fae9048c73de515e753dc6d9abad4f8f687a1
* lpf sse2 4,8 single and dual improvements (Victoria Zhislina, 2018-06-01)
| dual 4 and 8 horizontal and vertical fn 2x performance
| dual 4 and 8 tests added
| single and dual 4,8 code quality improved
|
| Change-Id: If3e09d6a07585cc8cf336a946cca8152f3998abb
* Use force inline for obmc_sad_w4n/w8n (Peng Bin, 2018-06-01)
| These functions are not inlined by some compilers, such as
| MSVC 2015.
|
| Change-Id: I11fa067bb50e20fd3ec6f2d6096b7eea8b2bb435
* arm: remove unused assembly (Johann, 2018-05-31)
| loopfilter and convolve functions are not hooked up.
|
| intrapred has intrinsic versions.
|
| Remove infrastructure for supporting arm assembly.
|
| Change-Id: Iba53a1a5433fe2ec39e28f886f26e2f479e22473
* Add explicit casting for ARM intrinsics. (Steinar Midtskogen, 2018-05-26)
| More recent versions of gcc are more picky about implicit type
| conversions.
|
| BUG:aomedia:1313
| Change-Id: I4cf56b6b5c298ac046a41c9cce0f3f8140076240
* Remove CONFIG_AV1 preproc symbol. (Tom Finegan, 2018-05-26)
| Unconditionally enable all blocks it guarded, and remove blocks
| active only when it was disabled.
|
| Change-Id: Id39ac68829dfcee5f8d3766e5dc59de148f7c678
* intrapred: remove 2x2 function defs (James Zern, 2018-05-26)
| the experiment using 2x2 blocks was abandoned
|
| Change-Id: Iafc42a46a1c2fde0aee5121fd0c4058e712e0bd0
* Add aom_lowbd_blend_a64_d16_mask_sse4_1 (Scott LaVarnway, 2018-05-25)
| Change-Id: I565e0dbf7fe5dcafd539e08bf27cb8634cb18d15
* highbd lpf 4 sse2 vertical improvement (Victoria Zhislina, 2018-05-24)
| aom_highbd_lpf_vertical_4_sse2 single and dual now use only the
| necessary data and have smaller transpose size
|
| Change-Id: I42d32ea2f10c7e88ced9f8a60098ab440f41485d
* Includes clean up. (Tom Finegan, 2018-05-23)
| - Remove includes of config.h wrapped by HAVE_CONFIG_H. This is
|   an obsolete vestige of our libvpx ancestry.
| - Change (nearly) all remaining include sites; use proper path
|   to the include relative to project root instead of ./.
| - Correct include order where appropriate.
|   - Note: This part of this patch is NOT exhaustive. In an attempt
|     to be conservative about the impact of this change I limited
|     ordering and grouping changes to places that appear extremely
|     unlikely to cause a problem. A more exhaustive follow up might
|     be appropriate, but this issue can likely be handled organically
|     from here on out.
|
| Change-Id: I3b421ffd46c5da6ef78e43e7a6d3b9550cb30325
* cmake: generate rtcd headers in config subdir. (Tom Finegan, 2018-05-23)
| Move rtcd header outputs to config subdir of config dir, and update
| include sites.
|
| Change-Id: I25c5f1808a091f3727934adc274637ebdcdcb1cf
* cmake: Output aom_config in config sub dir. (Tom Finegan, 2018-05-23)
| Does away with somewhat confusing usage of "./aom_config.h"
| in include statements while keeping linters silent.
|
| aom_config.asm, aom_config.c, and aom_config.h are now written
| to the config sub dir.
|
| Change-Id: I99e2422d6ca8b20b9cdf2feee83a866c273e47b0
* Use dual functions for loop filter (Cheng Chen, 2018-05-22)
| Use dual SIMD functions for loop filtering, including
| luma 13-tap, 7-tap, 4-tap for vertical and horizontal direction.
|
| Chroma 5-tap does not have a dual SIMD function yet.
|
| Change-Id: I3afdaab240613baffcd8c19d824bfb048ed64d8f
* Remove unused static function 'read64' (Urvang Joshi, 2018-05-22)
| Change-Id: Iecac7672a5002e2780f4506cfaa39678b3d70e0d
* Refactor: av1_make_inter_predictor (Peng Bin, 2018-05-21)
| Update the parameter list of this function: pass a pointer to
| SubpelParams instead of passing its four members.
|
| Change-Id: I8bd1b29ab2befb23fcffc22539784ba50f32f4d2
* cmake,asm targets: silently ignore empty lists (James Zern, 2018-05-17)
| avoids a warning when creating a target with an empty source list
|
| Change-Id: I0ff13998c8e2af8392474dca3ae49883389f8157
* Move iwht4x4 to av1_inv_txfm2d.c (Angie Chiang, 2018-05-16)
| Change-Id: Idd22da6bf5b34bad87193f3b360c9bd25842f5ae
* Cosmetic changes to variance_sse2/avx2 (Peng Bin, 2018-05-16)
| 1. Remove redundant parameter in macro AOM_VAR_LOOP_SSE2
| 2. Move loop iterator into the loop's scope.
|
| Change-Id: Ib5569368fb467ac4eb332bb8141c6838e6cbc489
* Refactor: Update variance avx2 functions (Peng Bin, 2018-05-16)
| 1. Migrate optimization from libvpx
|    https://chromium-review.googlesource.com/c/webm/libvpx/+/1015844
|    78ba83bb9 Update variance avx2 functions
| 2. Add more avx2 functions (32x{8,64}, 16x{4,8,32,64}).
| 3. The encoder is about 1.2% faster, as shown by encoding
|    15 frames of city_cif with CL58321 and this CL:
|    333356 ms --> 329440 ms
|    a) gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
|    b) CPU: Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz
|    c) Config cmd
|       cmake ../ -DENABLE_CCACHE=1 -DCONFIG_LOWBITDEPTH=1
|    d) Test cmd:
|       ./aomenc --cpu-used=1 --end-usage=vbr \
|       --target-bitrate=600 --limit=15
|
| Change-Id: I8b058944ad23353b77c0cdd5b4714b4413e31d73
* Refactor: Update variance sse2 functions (Peng Bin, 2018-05-16)
| 1. Migrate optimization from libvpx
|    https://chromium-review.googlesource.com/c/webm/libvpx/+/1014306
|    55ca875e6 Update variance sse2 functions
| 2. Add missing cases in unit test
|
| Change-Id: Ifa009c85dbb8d41ef7c89cc6b309717b198a31e7
* cmake-format: update aom_dsp.cmake (Johann, 2018-05-15)
| Change-Id: Ic0f17f94ae793e7d5a6036e57000dd4d35a01999
* psnrhvs.c: Bugfix for CONFIG_LOWBITDEPTH=0 (Urvang Joshi, 2018-05-15)
| In this case, when bit_depth == 8, the PSNR-HVS was not being
| calculated at all -- and was returning junk values (often negative).
|
| Issue introduced in this patch:
| https://aomedia-review.googlesource.com/c/aom/+/50901
|
| BUG=aomedia:1882
| Change-Id: Iee3ee8dc2e78126e1fb42bc96dc7c850cbd24961