This article presents AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e., large language models (LLMs), and exhibit similar scaling properties. Specifically, we highlight two key findings: (1) the performance of the visual features scales with both the model capacity and the quantity of data, and (2) the value of the objective function correlates with the performance of the model on downstream tasks. We illustrate the practical implication of these findings by pre-training a 7 billion parameter AIM on 2 billion images, which reaches 84.0% on ImageNet-1k with a frozen trunk. Interestingly, even at this scale, we observe no sign of saturation in performance, suggesting that AIM potentially represents a new frontier for training large-scale vision models. AIM pre-training is similar to LLM pre-training and does not require any image-specific strategy to stabilize training at scale.
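To make the autoregressive objective concrete, the sketch below shows one plausible form of next-patch prediction as a pixel-regression loss. It is illustrative only, not the authors' released code: the `model` argument (assumed to be a causally-masked vision transformer trunk), the raster-scan patch ordering, and the 14-pixel patch size are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def autoregressive_image_loss(model, images, patch_size=14):
    """Next-patch prediction: regress each patch from all earlier patches."""
    B, C, H, W = images.shape
    # Split each image into non-overlapping patches in raster order,
    # yielding a sequence of shape (B, num_patches, C * patch_size**2).
    patches = images.unfold(2, patch_size, patch_size)
    patches = patches.unfold(3, patch_size, patch_size)
    patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(B, -1, C * patch_size**2)

    # Shift by one patch: the model predicts patch t from patches < t.
    inputs, targets = patches[:, :-1], patches[:, 1:]
    preds = model(inputs)  # causal masking assumed inside the trunk
    return F.mse_loss(preds, targets)  # pixel-level l2 regression
```

Under these assumptions, the training loop has the same shape as next-token prediction in an LLM, with image patches standing in for tokens, which is consistent with the article's point that no image-specific stabilization tricks are needed at scale.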