
How-to: download linked images from a website


A question got me thinking: how would I implement a program that downloads the pictures a web page links to?

Here is a sample console application I came up with:

using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using System.Threading;

namespace ConsoleApplication
{
    class Program
    {
        static int totalFiles = 0;
        static int currentFiles = 0;

        static void Main(string[] args)
        {
            GetImages("http://www.textureking.com/index.php/category/all-textures");
        }

        static void GetImages(string url)
        {
            // Download the page source as a string.
            string responseString;
            HttpWebRequest initialRequest = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse initialResponse = (HttpWebResponse)initialRequest.GetResponse())
            {
                using (StreamReader reader = new StreamReader(initialResponse.GetResponseStream()))
                {
                    responseString = reader.ReadToEnd();
                }
            }

            // Collect every unique href value that ends in an image extension.
            // The extensions are grouped so the alternation applies only to them.
            List<string> imageset = new List<string>();
            Regex regex = new Regex(@"f=""[^""]*\.(jpg|bmp|tif|gif|png)", RegexOptions.IgnoreCase);
            foreach (Match m in regex.Matches(responseString))
            {
                if (!imageset.Contains(m.Value))
                    imageset.Add(m.Value);
            }

            // Strip the leading f=" from each match, leaving only the URL.
            for (int i = 0; i < imageset.Count; i++)
                imageset[i] = imageset[i].Remove(0, 3);

            totalFiles = imageset.Count;
            currentFiles = totalFiles;
            Console.WriteLine(totalFiles.ToString() + " images will be downloaded.");

            foreach (string f in imageset)
            {
                ThreadPool.QueueUserWorkItem(new WaitCallback(DownloadImage), f);
            }

            // Keep the process alive while the background threads work.
            Console.Read();
        }

        static void DownloadImage(object path)
        {
            string imageUrl = path.ToString();
            string fileName = Path.GetFileName(imageUrl);

            // Several pool threads run this method at once, so decrement atomically.
            int remaining = Interlocked.Decrement(ref currentFiles);
            Console.WriteLine("Downloading " + fileName + "... (" + (totalFiles - remaining).ToString() + "/" + totalFiles + ")");

            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(imageUrl);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Image image = Image.FromStream(response.GetResponseStream()))
            {
                image.Save(@"D:\Temporary\" + fileName);
            }
            Console.WriteLine(fileName + " downloaded.");
        }
    }
}

The sample URL passed to GetImages points to a page that links to several textures, and those linked textures are what the program downloads.

I am using a regex to find the URLs. Matching is case-insensitive, since I cannot be sure whether the file extensions are written in uppercase or lowercase. The same URL can appear more than once on a page, so before adding a regex match to the List, I check whether the List already contains it. That way there are no duplicates.
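The matching and duplicate check described above can also be sketched in isolation. This is a variation, not the program's exact pattern: it assumes links are written as href="..." with double quotes, uses a capture group to pull out the URL instead of trimming the match, and swaps the List.Contains check for a HashSet. The names LinkExtractor and ExtractImageLinks are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractor
{
    // Hypothetical helper, not part of the original program.
    // Group 1 captures the href value when it ends in one of the image extensions.
    static readonly Regex ImageLink = new Regex(
        @"href=""([^""]*\.(?:jpg|bmp|tif|gif|png))""",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractImageLinks(string html)
    {
        // The HashSet rejects duplicates; the List keeps first-seen order.
        var seen = new HashSet<string>();
        var links = new List<string>();
        foreach (Match m in ImageLink.Matches(html))
        {
            string link = m.Groups[1].Value;
            if (seen.Add(link))
                links.Add(link);
        }
        return links;
    }
}
```

Grouping the extensions matters: without the parentheses, the alternation would split the whole pattern, so a bare "png" anywhere in the page would count as a match.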

The save path can also be changed; I left it hardcoded for testing purposes. To make the path dynamic, you can pass a generic collection or an array as the parameter of the DownloadImage method, then cast it back explicitly and read the needed values (identified by index, for example).
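A minimal sketch of the array approach, assuming the state object is a string array with the image URL at index 0 and the target directory at index 1. That layout and the names DownloadArgs and ResolveTargetPath are assumptions made for illustration. The sketch also takes the file name from Uri.LocalPath, which differs from the program above in that a query string does not end up in the file name.

```csharp
using System;
using System.IO;

static class DownloadArgs
{
    // Hypothetical helper: 'state' is the object a thread pool callback receives.
    // Assumed layout: state[0] = image URL, state[1] = target directory.
    public static string ResolveTargetPath(object state)
    {
        // Explicit conversion back from object to the array that was passed in.
        string[] values = (string[])state;
        string url = values[0];
        string targetDirectory = values[1];

        // LocalPath is the path part of the URL, without any query string.
        string fileName = Path.GetFileName(new Uri(url).LocalPath);
        return Path.Combine(targetDirectory, fileName);
    }
}
```

The caller would then queue something like new[] { url, directory } as the second argument of ThreadPool.QueueUserWorkItem, and DownloadImage would save to the path this helper returns.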

NOTE: I am using ThreadPool here, and thread pool threads are always background threads, so if the application is closed, any downloads still in progress are canceled. To avoid this and wait for all downloads to complete (which is probably not a good idea but still a possibility), the Thread class should be used with IsBackground set to false, which is the default for threads created that way.
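The foreground-thread alternative might look like the sketch below. It assumes one thread per item, which is only reasonable for a small number of images, and it adds a Join so the caller can tell when everything is done. ForegroundRunner and RunAll are hypothetical names, and the work delegate stands in for the download logic.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

static class ForegroundRunner
{
    // Hypothetical helper: runs work(item) on one foreground thread per item
    // and returns only after every thread has finished.
    public static void RunAll(IEnumerable<string> items, Action<string> work)
    {
        var workers = new List<Thread>();
        foreach (string item in items)
        {
            string current = item; // per-iteration copy captured by the lambda
            var worker = new Thread(() => work(current));
            worker.IsBackground = false; // already the default; set here for clarity
            worker.Start();
            workers.Add(worker);
        }

        // Block until each worker completes.
        foreach (Thread t in workers)
            t.Join();
    }
}
```

Because the threads are foreground threads, the process stays alive until they finish even without the Join calls; the Join loop is there so code that follows RunAll can rely on all work being complete.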
