HoundDog.ai, a startup that helps developers ensure their code doesn't leak personally identifiable information (PII), came out of stealth on Wednesday and announced a $3.1 million seed round led by E14, Mozilla Ventures and ex/ante, as well as several angel investors. Unlike other scanning tools, HoundDog analyzes the code a developer writes, using both traditional pattern matching and large language models (LLMs) to find potential problems.
HoundDog was founded by Amjad Afanah, who previously co-founded DCHQ, which was acquired in 2016 by Gridstore (which, to complicate matters, later changed its name to HyperGrid). Afanah also co-founded apisec.ai, which is still in operation, and worked at autonomous driving startup Cruise. The inspiration for HoundDog came during his time at data security startup Cyral, where he spoke with the privacy teams, he told me.
“When I was at Cyral, we had a lot of data,” he said. “What Cyral does, like many others in the data security space, is focus on production systems. They help you discover, classify your structured data and databases, and then help you apply access controls. But the overwhelming feedback I kept hearing from security and privacy teams was, 'You know, it's too reactive and doesn't keep up with changes to the codebase.'”
HoundDog, then, shifts this process even further to the left. While it currently runs in the continuous integration pipeline and not yet in the development environment (although that may happen in the future), the idea here is to find potential data leaks before the code is merged. And most importantly, HoundDog does this by looking at the actual code, not the data stream it produces. “Our source of truth is the codebase,” Afanah said.
Thanks to this, if a development team starts collecting Social Security numbers, for example, HoundDog would raise a flag and warn the team before the code was merged. It would also alert the security team. After all, mishandling that kind of data could become a major and expensive problem.
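To make the idea of pre-merge scanning concrete, here is a minimal sketch of the pattern-matching half of such a check. It is not HoundDog's implementation: the keyword list, the function names and the choice to scan a unified diff are all assumptions made for illustration. It looks only at lines a change adds and flags identifiers that suggest PII is being handled.

```python
import re

# Matches typical identifiers in most mainstream languages.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Zero-width boundary between a lowercase letter or digit and an uppercase letter (camelCase).
CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")

# Hypothetical keyword list for illustration only; not HoundDog's rule set.
PII_KEYWORDS = {
    "ssn": "Social Security number",
    "social_security": "Social Security number",
    "dob": "date of birth",
    "date_of_birth": "date of birth",
    "email": "email address",
}


def pii_labels(identifier):
    """Return the PII labels suggested by one identifier, e.g. userSsn."""
    # Normalize camelCase and snake_case to the same _word_word_ form.
    normalized = "_" + CAMEL_BOUNDARY.sub("_", identifier).lower() + "_"
    return {label for kw, label in PII_KEYWORDS.items() if f"_{kw}_" in normalized}


def scan_added_lines(diff_text):
    """Return (line_number, label) pairs for added lines of a unified diff."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Only lines the change adds matter; skip the "+++ b/file" header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        labels = set()
        for identifier in IDENTIFIER.findall(line[1:]):
            labels |= pii_labels(identifier)
        findings.extend((number, label) for label in sorted(labels))
    return findings
```

A continuous integration job could run a check like this on each pull request and fail the build, or notify a security channel, whenever the result is non-empty. A real scanner would need far richer rules, plus data-flow analysis to see where a flagged value ends up.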
The service currently supports code written in Java, C#, JavaScript, and TypeScript, as well as SQL, GraphQL, and OpenAPI/Swagger queries. Support for Python is imminent, the company says.
Afanah noted that a tool like this is becoming especially important in this era of AI-generated code, something Replit CEO (and HoundDog angel investor) Amjad Masad also echoed.
“As an increasing number of companies turn to AI-generated code to accelerate development, incorporating security best practices and ensuring the security of the generated code becomes essential,” Masad said. “HoundDog.ai is leading the way in protecting PII data early in the development cycle, making it an indispensable component of any AI code generation workflow. This is the reason why I chose to invest in this company.”
HoundDog itself also uses AI, however. It currently relies on OpenAI models for this, but it is important to emphasize that this is optional. Users who worry about their code leaving their private repositories can choose to rely solely on the company's more traditional code scanner.
An important part of HoundDog's value proposition is that it can reduce compliance costs for startups thanks to its automated reporting capabilities. The service can automatically generate a Record of Processing Activities (RoPA). To do this, HoundDog uses generative AI and sends data to OpenAI. The team emphasizes that only tokens that the service has discovered through its regular scanner are shared with OpenAI and that no actual source code is shared.
The company offers a limited free plan, with paid plans starting at $200/month to scan up to two repositories.