We investigate the benefit of combining blind audio recordings with 3D scene information for acoustic synthesis of novel views. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of acoustic synthesis of novel views as sound source localization, separation, and dereverberation. While naïve training of an end-to-end network does not yield high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to tackle these tasks jointly. Our method outperforms existing methods designed for the individual tasks, demonstrating the effectiveness of using 3D visual information. In a simulated study on the Matterport3D-NVAS dataset, our model achieves near-perfect accuracy in source localization, a PSNR of 26.44 dB and an SDR of 14.23 dB for source separation and dereverberation, and a PSNR of 25.55 dB and an SDR of 14.20 dB in novel-view acoustic synthesis. We publish our code and model on our project website at https://github.com/apple/ml-nvas3d. Use headphones when listening to the results.
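To illustrate the rendering step described above, the sketch below shows one common way novel-view audio can be synthesized once sources have been separated and dereverberated: each estimated dry source signal is convolved with an RIR simulated from the reconstructed 3D geometry to the target listener position, and the results are summed. This is a minimal, hedged sketch, not the authors' implementation; the function and variable names (render_novel_view, dry_sources, rirs_to_listener) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's code): render audio at a novel
# viewpoint by convolving separated, dereverberated source signals with RIRs
# simulated from the reconstructed 3D scene, then mixing them at the listener.
import numpy as np
from scipy.signal import fftconvolve


def render_novel_view(dry_sources, rirs_to_listener):
    """Mix each dry source convolved with its RIR to the novel listener position.

    dry_sources:      list of 1D np.ndarray, one waveform per separated source
    rirs_to_listener: list of 1D np.ndarray, RIR from each source position to
                      the target listener position (hypothetical inputs)
    """
    # Output length of a full convolution: len(signal) + len(rir) - 1.
    length = max(len(s) + len(h) - 1 for s, h in zip(dry_sources, rirs_to_listener))
    out = np.zeros(length)
    for s, h in zip(dry_sources, rirs_to_listener):
        wet = fftconvolve(s, h)      # re-reverberate the source for the new view
        out[: len(wet)] += wet       # sum all sources at the listener
    return out
```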