How-to: download linked images from a website
There was actually a question that got me thinking: how would I implement a program that downloads the pictures that a web page links to?
Here is a sample console application I came up with:
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.IO;
using System.Text.RegularExpressions;
using System.Drawing;

namespace ConsoleApplication
{
    class Program
    {
        static int totalFiles = 0;
        static int currentFiles = 0;

        static void Main(string[] args)
        {
            GetImages("http://www.textureking.com/index.php/category/all-textures");
        }

        static void GetImages(string url)
        {
            // Download the HTML of the page.
            string responseString;
            HttpWebRequest initialRequest = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse initialResponse = (HttpWebResponse)initialRequest.GetResponse())
            {
                using (StreamReader reader = new StreamReader(initialResponse.GetResponseStream()))
                {
                    responseString = reader.ReadToEnd();
                }
            }

            // Collect each distinct link that ends in an image extension.
            // The extensions are grouped so the alternation applies only to them.
            List<string> imageset = new List<string>();
            Regex regex = new Regex(@"f=""[^""]*\.(?:jpg|bmp|tif|gif|png)", RegexOptions.IgnoreCase);
            foreach (Match m in regex.Matches(responseString))
            {
                if (!imageset.Contains(m.Value))
                    imageset.Add(m.Value);
            }

            // Strip the leading f=" from every match.
            for (int i = 0; i < imageset.Count; i++)
                imageset[i] = imageset[i].Remove(0, 3);

            totalFiles = imageset.Count;
            currentFiles = totalFiles;
            Console.WriteLine(totalFiles.ToString() + " images will be downloaded.");

            foreach (string f in imageset)
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadImage), f);
            }

            // Keep the process alive while the pool threads work.
            Console.Read();
        }

        static void DownloadImage(object path)
        {
            // Several pool threads run this method at once, so decrement atomically.
            int remaining = Interlocked.Decrement(ref currentFiles);
            Console.WriteLine("Downloading " + Path.GetFileName(path.ToString()) + "... (" + (totalFiles - remaining).ToString() + "/" + totalFiles + ")");

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(path.ToString());
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Image image = Image.FromStream(response.GetResponseStream()))
            {
                image.Save(@"D:\Temporary\" + Path.GetFileName(path.ToString()));
            }

            Console.WriteLine(Path.GetFileName(path.ToString()) + " downloaded.");
        }
    }
}
The sample URL provided in the method call is used to download several textures linked on the webpage.
I am using a regex to find the URLs. The match ignores case because I am not sure whether the file extensions are written in caps or not. Since the same URL may appear more than once on the page, I check whether the List already contains a match before adding it, so there are no duplicates.
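As a rough illustration of the extraction step, here is a minimal sketch that pulls image links out of an HTML string and drops duplicates. The class name ImageLinkExtractor, the method Extract, and the example.com URLs are hypothetical and not part of the original sample. It also assumes that the extensions should be wrapped in a group, because alternation has the lowest precedence in a regex and an ungrouped `jpg|bmp|tif|gif|png` would let a bare "bmp" anywhere in the page count as a match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ImageLinkExtractor
{
    // Group 1 captures the attribute value; the inner group keeps the
    // alternation limited to the file extension.
    static readonly Regex LinkPattern = new Regex(
        @"f=""([^""]*\.(?:jpg|bmp|tif|gif|png))",
        RegexOptions.IgnoreCase);

    // Returns each distinct image URL in the order it first appears.
    public static List<string> Extract(string html)
    {
        var seen = new HashSet<string>();
        var result = new List<string>();
        foreach (Match m in LinkPattern.Matches(html))
        {
            string url = m.Groups[1].Value;
            if (seen.Add(url)) // Add returns false for a duplicate
                result.Add(url);
        }
        return result;
    }
}

static class ExtractDemo
{
    static void Main()
    {
        // Hypothetical page fragment with one repeated link.
        string html =
            "<a href=\"http://example.com/a.JPG\">A</a>" +
            "<a href=\"http://example.com/a.JPG\">A again</a>" +
            "<a href=\"http://example.com/b.png\">B</a>";

        foreach (string url in ImageLinkExtractor.Extract(html))
            Console.WriteLine(url);
    }
}
```

Capturing the URL in a group avoids the separate pass that removes the first three characters, and a HashSet makes the duplicate check cheap on pages with many links.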
The save path can also be changed; I left it hardcoded for testing purposes. To make the path dynamic, you can pass a generic collection or an array as the parameter to the DownloadImage method, then cast it back and read the needed values (identified by an index, for example).
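One way the array idea could look is sketched below. The names DownloadTarget and Resolve, the "downloads" folder, and the example.com URL are assumptions made for illustration; the sketch only shows how the state object is packed and unpacked, and leaves the actual download out.

```csharp
using System;
using System.IO;
using System.Threading;

static class DownloadTarget
{
    // Casts the state object back to an array and builds the local path.
    // Index 0 holds the image URL, index 1 the destination folder.
    public static string Resolve(object state)
    {
        object[] args = (object[])state;
        string url = (string)args[0];
        string folder = (string)args[1];
        return Path.Combine(folder, Path.GetFileName(url));
    }
}

static class TargetDemo
{
    static void Main()
    {
        using (var done = new ManualResetEvent(false))
        {
            // Both values travel in the single state argument.
            object state = new object[] { "http://example.com/img/a.jpg", "downloads" };

            ThreadPool.QueueUserWorkItem(s =>
            {
                Console.WriteLine(DownloadTarget.Resolve(s));
                done.Set();
            }, state);

            done.WaitOne(); // wait for the work item before exiting
        }
    }
}
```

A small class with named properties would be easier to read than positional indexes, but the array keeps the sketch close to the approach described above.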
NOTE: I am using ThreadPool here, and its threads are always background threads, so if the application is closed, any downloads still in progress are canceled. To avoid this and make the process wait for all downloads to complete (which is probably not a good idea but still a possibility), the Thread class should be used with IsBackground set to false.
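The foreground-thread alternative might be sketched as follows. ForegroundRunner and RunAll are hypothetical names, and the work delegate is a stand-in for the real download, so the sketch runs without network access. It starts one thread per item, which does not scale well to pages with many images.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ForegroundRunner
{
    // Runs work once per item, each on its own foreground thread,
    // waits for all of them, and returns how many finished.
    public static int RunAll(IList<string> items, Action<string> work)
    {
        int completed = 0;
        var workers = new List<Thread>();

        foreach (string item in items)
        {
            string current = item; // local copy captured by the lambda
            var worker = new Thread(() =>
            {
                work(current);
                Interlocked.Increment(ref completed);
            });
            worker.IsBackground = false; // the default, set here for clarity
            workers.Add(worker);
            worker.Start();
        }

        // Join blocks until each thread has finished its work.
        foreach (Thread thread in workers)
            thread.Join();

        return completed;
    }
}

static class RunnerDemo
{
    static void Main()
    {
        var files = new List<string> { "a.jpg", "b.png", "c.gif" };

        // Stand-in for the download: just report the file name.
        int finished = ForegroundRunner.RunAll(
            files, name => Console.WriteLine("Processing " + name));

        Console.WriteLine(finished + " of " + files.Count + " finished.");
    }
}
```

Because the threads are foreground threads, the process stays alive until they finish even without the Join calls; Join is used here only so the method can return an accurate count.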