Motivated by the problem of next-word prediction on user devices, we introduce and study the problem of personalized frequency histogram estimation in a federated setting. In this problem, each user observes a number of samples, over some domain, from a distribution that is specific to that user. The goal is to compute, for every user, a personalized estimate of that user's distribution, with error measured in KL divergence. We focus on two central challenges: statistical heterogeneity and the protection of user privacy. Our approach is based on discovering and exploiting similar subpopulations of users, which are often present but latent in real-world data, while minimizing user privacy leakage. We first present a non-private clustering-based algorithm for the problem and then provide a provably joint differentially private version with a data-dependent private initialization scheme. Next, we propose a simple data model, based on a combination of Dirichlet distributions, to formally motivate our non-private algorithm and demonstrate some properties of its components. Finally, we provide an extensive empirical evaluation of our private and non-private algorithms under varying levels of statistical and size heterogeneity on the Reddit, StackOverflow, and Amazon Reviews datasets. Our results demonstrate significant improvements over standard and clustering-based baselines and, in particular, show that it is possible to improve over direct personalization of a single global model.
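To make the problem statement concrete, the following is a minimal, non-private sketch of clustering-based personalized histogram estimation. It is an illustration under stated assumptions, not the algorithm studied in this work: the function names, the farthest-first initialization, the additive smoothing, and the interpolation weight `alpha` are all hypothetical choices made for the example, and no privacy mechanism is included.

```python
import numpy as np


def kl_matrix(p, q):
    """Pairwise KL(p_u || q_c). Rows of p may contain zeros; q must be positive."""
    # Entries with p == 0 contribute 0 to the sum (log(1) = 0 is substituted).
    plogp = (p * np.log(np.where(p > 0, p, 1.0))).sum(axis=1)
    return plogp[:, None] - p @ np.log(q).T


def smoothed_centroids(counts, assign, k, smoothing):
    """Pool the counts of each cluster and normalize with additive smoothing."""
    pooled = np.stack(
        [counts[assign == c].sum(axis=0) + smoothing for c in range(k)]
    )
    return pooled / pooled.sum(axis=1, keepdims=True)


def cluster_histograms(counts, k, n_iters=20, smoothing=1.0):
    """Lloyd-style clustering of per-user count vectors under KL divergence.

    counts: (n_users, d) array of per-user counts over a domain of size d.
    Returns (assignments, centroids); centroids are strictly positive
    distributions. Assumption: deterministic farthest-first seeding is used
    here purely for illustration.
    """
    counts = np.asarray(counts, dtype=float)
    totals = np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    p = counts / totals  # empirical per-user distributions

    # Farthest-first seeding: start from user 0, then repeatedly add the
    # user whose empirical distribution is farthest from all current seeds.
    chosen = [0]
    while len(chosen) < k:
        seeds = counts[chosen] + smoothing
        seeds = seeds / seeds.sum(axis=1, keepdims=True)
        chosen.append(int(kl_matrix(p, seeds).min(axis=1).argmax()))
    centroids = counts[chosen] + smoothing
    centroids = centroids / centroids.sum(axis=1, keepdims=True)

    assign = kl_matrix(p, centroids).argmin(axis=1)
    for _ in range(n_iters):
        centroids = smoothed_centroids(counts, assign, k, smoothing)
        new_assign = kl_matrix(p, centroids).argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    # Recompute so that the returned centroids match the final assignment.
    return assign, smoothed_centroids(counts, assign, k, smoothing)


def personalized_estimates(counts, assign, centroids, alpha=5.0):
    """Blend each user's counts with alpha pseudo-counts from its centroid."""
    counts = np.asarray(counts, dtype=float)
    est = counts + alpha * centroids[assign]
    return est / est.sum(axis=1, keepdims=True)
```

In this sketch a user with few samples is pulled toward the centroid of its cluster, while a user with many samples stays close to its own empirical histogram; because the centroids are smoothed, every estimate is strictly positive and the KL divergence from any true distribution to the estimate is finite.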