Recent advances in deep learning have enabled end-to-end (E2E) automatic speech recognition (ASR) and raised its accuracy to a new level. E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven beneficial. However, LM fusion has drawbacks, such as its inability to address the domain mismatch of the internal AM. Inspired by the concept of LM fusion, we propose integrating an external AM into the E2E system to better address this domain mismatch. With this approach, we achieve a word error rate reduction of up to 14.3% across multiple test sets. We also find that AM fusion is particularly beneficial for improving named entity recognition.