DeepSeek-affiliated Hangzhou DeepSeek AI Fundamental Technology Research Co., Ltd. today filed a patent for a new web data collection system designed to improve efficiency and data quality. The patent outlines a method for discovering more webpage links while minimizing the traffic load placed on websites. It analyzes already-downloaded content to predict the quality of links that have not yet been crawled, prioritizing high-value data and reducing redundant downloads. Efficient web data collection is crucial for training large language models (LLMs), which power AI systems like ChatGPT. Existing techniques struggle with incomplete link discovery, excessive downloads that can crash websites, and poor filtering of low-quality data. DeepSeek's proposed system aims to solve these issues by optimizing how download resources are allocated and keeping metadata accurate. [iThome, in Chinese]