

Stoimen Popov

Senior Frontend Developer at MaYoMo

Sofia, BG

Joined Apr 2010

About

I've been working on web-based projects built mainly with PHP and JavaScript, where I mostly use Zend Framework and jQuery. I am interested in any web page optimization technique - for a faster web!

Stats

Reputation: 2
Pageviews: 1.5M
Articles: 13
Comments: 6

Articles

Algorithm of the Week: Shortest Path in a Directed Acyclic Graph
Introduction

We saw how to find the shortest path in a graph with positive edges using Dijkstra's algorithm. We also know how to find the shortest paths from a given source node to all other nodes, even when there are negative edges, using the Bellman-Ford algorithm. Now we'll see that there's a faster, linear-time algorithm that can find the shortest paths from a given source node to all other reachable vertices in a directed acyclic graph, also known as a DAG. Because a DAG is acyclic, we don't have to worry about negative cycles. As we already know, it's pointless to speak about shortest paths in the presence of a negative cycle, because we can "loop" over it and our path will become shorter and shorter; negative cycles make the attempt to find a shortest path pointless. Thus we had two problems to overcome with Dijkstra's and the Bellman-Ford algorithms: Dijkstra's requires non-negative weights, and negative cycles break both. This algorithm handles both cases.

Overview

The first thing we know about DAGs is that they can easily be topologically sorted. Topological sort can be used in many practical cases, but perhaps the most common one is scheduling dependent tasks. After a topological sort we end up with a list of the vertices of the DAG, and we're sure that if there's an edge (u, v), then u precedes v in the topologically sorted list. Note the more general consequence: even when there is no edge between B and D, B may still precede D in the list. This ordering is precious, and the only thing we need to do is pass through the sorted list and relax distances for the shortest paths, just as in Dijkstra's algorithm.

OK, so let's summarize this algorithm:

- First we topologically sort the DAG;
- Second, we set the distance to the source to 0 and to all other vertices to infinity;
- Then, for each vertex from the list, we pass through all its neighbors and check for a shorter path.

It's pretty much like Dijkstra's algorithm, with the main difference that Dijkstra's uses a priority queue, while this time we use the list from the topological sort.

Code

This time the code is actually pseudocode. Although all the examples so far were in PHP, pseudocode is easier to understand and doesn't bind you to a specific language implementation; also, if you don't feel comfortable with a given programming language, reading its code can be harder than reading pseudocode.

```
1. Topologically sort G into L;
2. Set the distance to the source to 0;
3. Set the distances to all other vertices to infinity;
4. For each vertex u in L
5.   For each neighbor v of u
6.     If dist(v) > dist(u) + w(u, v)
7.       Set dist(v) <- dist(u) + w(u, v);
```

Application

It's clear why and where we must use this algorithm. The only problem is that we must be sure that the graph doesn't have cycles. However, if we're aware of how the graph is created, we may know in advance whether it is acyclic - and then this linear-time algorithm is very applicable.
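The pseudocode above can be sketched in runnable form. Here is a minimal Python version; the graph representation (a dict mapping every vertex to a list of (neighbor, weight) pairs) and the function names are assumptions of this sketch, not from the article.

```python
# Sketch of shortest paths in a DAG via topological sort.
# Assumes every vertex appears as a key of `graph`.
from math import inf

def topological_order(graph):
    # Depth-first topological sort of a DAG.
    order, visited = [], set()
    def visit(u):
        visited.add(u)
        for v, _ in graph[u]:
            if v not in visited:
                visit(v)
        order.append(u)
    for u in graph:
        if u not in visited:
            visit(u)
    return list(reversed(order))

def dag_shortest_paths(graph, source):
    dist = {u: inf for u in graph}
    dist[source] = 0
    # Process vertices in topological order and relax each outgoing edge.
    for u in topological_order(graph):
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

Because every edge is relaxed exactly once, the whole pass after sorting is linear in the size of the graph.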
October 30, 2012
· 27,583 Views
Algorithm of the Week: Graphs and Their Representation
Although this post is supposed to be about algorithms, I'll focus more on graphs and their computer representation.
September 4, 2012
· 58,057 Views · 8 Likes
Algorithm of the Week: How to Determine the Day of the Week
Do you know what day of the week you were born on? Monday, or maybe Saturday? Well, perhaps you know that. Everybody knows the day he was born on, but do you know what day the 31st of January, 1883 was? No? Well, there must be some method to determine any day in any century.

We know that 2012 started on a Sunday. Knowing that, it's easy to determine what day the 2nd of January was: it must have been a Monday. But things get a little more complex if we try to guess a date distant from January 1st. Indeed the 1st of January was a Sunday, but what day is the 9th of May of the same year? That is far more difficult to say. Of course we could go with a brute force approach and count from January 1st to May 9th, but that is quite slow and error prone. So what would we do if we had to code a program that answers this question?

The easiest way is to use a library. Almost every major library has built-in functions that can answer what day a given date falls on, such as date() in PHP or getDate() in JavaScript. But the question remains: how do these library functions know the answer, and how could we code such a function if our library doesn't support it? There must be some algorithm to help us.

Overview

Because months have different numbers of days, and most of those counts aren't divisible by 7 without a remainder, months begin on different days of the week. Thus, if January begins on a Sunday, February of the same year will begin on a Wednesday (31 mod 7 = 3). In common years February has 28 days, which fortunately is divisible by 7, and thus February and March both begin on the same day - which is great, but isn't true in leap years.

What Do We Know About the Calendar

The first thing to know is that each week has exactly 7 days. We also know that a common year has 365 days, while a leap year has one day more: 366. Most of the months have 30 or 31 days, but February has only 28 days in common years and 29 in leap years.

Because 365 mod 7 = 1, each common year begins exactly one weekday after the preceding year. Thus if 2011 started on a Saturday, 2012 starts on a Sunday - because 2011 is not a leap year. What else do we know? Because a week has exactly seven days, only February (with its 28 days in a common year) is divisible by 7 (28 mod 7 = 0) and contains exactly four weeks. Thus in a common year February and March start on the same day. Unfortunately that is not true of the other months.

All these facts about the calendar let us draw some conclusions. Although eleven of the months have either 30 or 31 days, they don't all start on the same day; still, some months do start on the same day, because the number of days between them is divisible by 7 without a remainder. Let's look at some examples. September has 30 days, as does November, while October, which lies between them, has 31. Thus 30 + 31 + 30 = 91, and 91 mod 7 = 0. So in every year September and December start on the same day (being after February, they don't depend on leap years). The same holds for April and July, and in leap years even January starts on the same day as April and July (31 + 29 + 31 = 91).

So there are fixed relations between months. If we know somehow that the 13th of April is a Monday, we can be sure that the 13th of July is also a Monday. These correspondences can be summarized in a diagram or a table; for leap years the corresponding months differ. We also know that leap years occur once every four years. But given a common year like 2001, which is the next common year that starts on the same day as 2001?

Because of leap years, a year can start on any of the seven days of the week and be either leap or common - just 14 possible calendars. From these observations we can build a table of century codes, in which you can clearly see the repeating pattern "6 4 2 0". Then there is the month table, whose two columns differ only for January and February (the leap-year column is one less for those two months). Finally, the day table simply maps 0 = Sunday through 6 = Saturday.

Now let's go back to the algorithm. Using these tables and a simple formula, we can calculate the day of the week of any given date. Here are the steps:

1. Get the number for the corresponding century from the centuries table;
2. Get the last two digits of the year;
3. Divide the number from step 2 by 4, discarding the remainder;
4. Get the month number from the month table;
5. Sum the numbers from steps 1 to 4 together with the day of the month;
6. Divide the sum by 7 and take the remainder;
7. Look up the result of step 6 in the days table.

Implementation

First let's walk through a simple, practical example, then the code. Let's answer the question from the first paragraph of this post: what day was January 31st, 1883?

1. From the centuries table, 1800-1899 gives 2.
2. The last two digits of the year: 83.
3. Divide 83 by 4 without a remainder: 83 / 4 = 20.
4. From the month table: Jan = 0.
5. Sum the numbers together with the day: 2 + 83 + 20 + 0 + 31 = 136.
6. Divide by 7 and take the remainder: 136 mod 7 = 3.
7. In the days table: 3 = Wednesday.

The following PHP code implements the algorithm above.

```php
function get_century_code($year) {
    // The century codes repeat in the pattern 6, 4, 2, 0
    if (1700 <= $year && $year <= 1799) return 4; // XVIII
    if (1800 <= $year && $year <= 1899) return 2; // XIX
    if (1900 <= $year && $year <= 1999) return 0; // XX
    if (2000 <= $year && $year <= 2099) return 6; // XXI
    if (2100 <= $year && $year <= 2199) return 4; // XXII
    if (2200 <= $year && $year <= 2299) return 2; // XXIII
    if (2300 <= $year && $year <= 2399) return 0; // XXIV
    if (2400 <= $year && $year <= 2499) return 6; // XXV
    if (2500 <= $year && $year <= 2599) return 4; // XXVI
    if (2600 <= $year && $year <= 2699) return 2; // XXVII
}

/**
 * Get the day of the week of a given date in "d-m-Y" format
 *
 * @param $date
 */
function get_day_from_date($date)
{
    $months = array(
        1  => 0, // January
        2  => 3, // February
        3  => 3, // March
        4  => 6, // April
        5  => 1, // May
        6  => 4, // June
        7  => 6, // July
        8  => 2, // August
        9  => 5, // September
        10 => 0, // October
        11 => 3, // November
        12 => 5, // December
    );

    $days = array(
        0 => 'Sunday',
        1 => 'Monday',
        2 => 'Tuesday',
        3 => 'Wednesday',
        4 => 'Thursday',
        5 => 'Friday',
        6 => 'Saturday',
    );

    $dateParts = explode('-', $date);
    $day      = (int)$dateParts[0];
    $month    = (int)$dateParts[1];
    $fullYear = (int)$dateParts[2];
    $year     = (int)substr($dateParts[2], 2);

    // 1. Get the number for the corresponding century from the centuries table
    $a = get_century_code($fullYear);
    // 2. Get the last two digits of the year
    $b = $year;
    // 3. Divide the number from step 2 by 4, discarding the remainder
    $c = floor($year / 4);
    // 4. Get the month number from the month table
    $d = $months[$month];
    // In leap years the January and February codes are one less
    $isLeap = ($fullYear % 4 == 0 && $fullYear % 100 != 0) || $fullYear % 400 == 0;
    if ($isLeap && $month <= 2) {
        $d -= 1;
    }
    // 5. Sum the numbers from steps 1 to 4 together with the day of the month
    $e = $a + $b + $c + $d + $day;
    // 6. Divide it by 7 and take the remainder
    $f = $e % 7;
    // 7. Find the result of step 6 in the days table
    return $days[$f];
}

// Wednesday
echo get_day_from_date('31-1-1883');
```

Application

This algorithm can be applied in many different cases, although most libraries have built-in functions that do the same. The only other drawback is that there are much more efficient algorithms that don't need additional tables of data. However, this algorithm isn't difficult to implement, and it gives a good overview of some facts about the calendar.
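The table method can be cross-checked against a language's built-in calendar support, which is exactly what the article says library functions like date() and getDate() provide. Here is a minimal sketch using Python's standard library (the function name is illustrative):

```python
from datetime import date

def day_of_week(day, month, year):
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
    return names[date(year, month, day).weekday()]
```

This uses the proleptic Gregorian calendar, the same calendar the table method assumes.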
April 24, 2012
· 61,041 Views · 1 Like
Algorithm of the Week: Rabin-Karp String Searching
Brute force string matching is a very basic sub-string matching algorithm, but it has its virtues: for example, it doesn't require preprocessing of the text or the pattern. The problem is that it's very slow, which makes it useless in many cases. For pattern matching we need something faster, but to understand other sub-string matching algorithms let's take one more look at brute force matching.

In brute force sub-string matching we check every single character of the text against the first character of the pattern. Once we have a match between them, we shift to comparing the second character of the pattern with the next character of the text, as shown in the picture below. This algorithm is slow for two main reasons. First, we have to check every single character of the text. Second, even after we find a match between a text character and the first character of the pattern, we continue to check, step by step (character by character), every symbol of the pattern in order to find whether it occurs in the text.

So is there another approach to find whether the text contains the pattern? In fact there is a faster one. To avoid comparing the pattern and the text character by character, we'll try to compare them all at once, so we need a good hash function. With its help we can hash the pattern and check it against hashed sub-strings of the text. We must be sure that the hash function returns "small" hash codes even for larger sub-strings - for larger patterns we can't expect short hashes otherwise. With that, the approach is quite effective compared to brute force string matching. It is known as the Rabin-Karp algorithm.

Overview

Michael O. Rabin and Richard M. Karp came up with the idea of hashing the pattern and checking it against hashed sub-strings of the text in 1987.

In general the idea is quite simple; the only thing we need is a hash function that rarely gives the same hash for different sub-strings. Such a hash function may, for instance, use the ASCII codes of the characters, but we must be careful about multi-lingual support. The hash function may vary depending on many things - it may consist of ASCII character-to-number conversion, but it can also be anything else. The only requirement is to convert a string (the pattern) into a hash that is fast to compare. Let's say we search for the pattern "he" in the text "hello world": if hash('he') = 1, we compare this value against the hash of every two-character sub-string of the text, and wherever the hashes match, the pattern occurs at that position. So at every step we take from the text a sub-string with the length m, where m is the pattern length, hash it, and compare it directly with the hashed pattern, as in the picture above.

Implementation

So far we have seen some diagrams explaining the Rabin-Karp algorithm, but let's look at an implementation: a very basic example where a simple hash table is used to convert the characters into integers. The code is PHP and is used only to illustrate the principles of this algorithm - note that it recomputes the hash of every window from scratch, and its toy hash table only covers the characters of "hello world".

```php
function hash_string($str, $len) {
    $hash = '';
    $hash_table = array(
        'h' => 1,
        'e' => 2,
        'l' => 3,
        'o' => 4,
        'w' => 5,
        'r' => 6,
        'd' => 7,
        ' ' => 8,
    );
    for ($i = 0; $i < $len; $i++) {
        $hash .= $hash_table[$str[$i]];
    }
    return (int)$hash;
}

function rabin_karp($text, $pattern) {
    $n = strlen($text);
    $m = strlen($pattern);
    $text_hash = hash_string(substr($text, 0, $m), $m);
    $pattern_hash = hash_string($pattern, $m);
    for ($i = 0; $i < $n - $m + 1; $i++) {
        if ($text_hash == $pattern_hash) {
            return $i;
        }
        // hash the next window of the text
        $text_hash = hash_string(substr($text, $i + 1, $m), $m);
    }
    return -1;
}

// 1
echo rabin_karp('hello world', 'ello');
```

Multiple Pattern Match

The Rabin-Karp algorithm is great for multiple pattern matching. Indeed its very nature supports such functionality, which is its advantage over other string searching algorithms.

Complexity

The Rabin-Karp algorithm has a worst-case complexity of O(nm), where n is the length of the text and m is the length of the pattern. So how does it compare to brute-force matching? Brute force matching is also O(nm), so at first sight there isn't much of a gain in performance. However, Rabin-Karp's complexity in practice is considered to be O(n+m), and that makes it faster, as shown on the chart below. Note that the Rabin-Karp algorithm also needs O(m) preprocessing time to hash the pattern.

Application

As we saw, Rabin-Karp is not much faster than brute force matching in the worst case. So where should we use it?

3 Reasons Why Rabin-Karp is Cool

1. Good for plagiarism detection, because it can deal with multiple pattern matching!
2. Not faster than brute force matching in theory, but in practice its complexity is O(n+m)!
3. With a good hashing function it can be quite effective and it's easy to implement!

2 Reasons Why Rabin-Karp is Not Cool

1. There are lots of string matching algorithms that are faster than O(n+m).
2. In the worst case it's practically as slow as brute force matching, and it requires additional space.

Final Words

Rabin-Karp is a great algorithm for one simple reason: it can be used to match against multiple patterns at once. This makes it perfect for detecting plagiarism, even for larger phrases.
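The O(n+m) behavior mentioned above comes from a rolling hash: instead of rehashing each window from scratch, the next window's hash is derived from the previous one in constant time. Here is a minimal Python sketch; the base and modulus values are illustrative choices, not from the article.

```python
def rabin_karp(text, pattern, base=256, mod=101):
    # Rabin-Karp with a polynomial rolling hash.
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return -1
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    for i in range(n - m + 1):
        # Verify character by character on a hash match to rule out collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            return i
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return -1
```

Each roll is O(1), so the scan is O(n) plus O(m) to hash the pattern, with the O(nm) worst case only when many hash collisions force verification.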
April 3, 2012
· 36,217 Views
Algorithm of the Week: Brute Force String Matching
String matching is crucial for database development and text-processing software. Fortunately, every modern programming language and library is full of string-processing functions that help us in our everyday work; however, it's important to understand their principles. String algorithms can typically be divided into several categories, and string matching is one of them.

When it comes to string matching, the most basic approach is what is known as brute force, which simply means checking every single character of the text against the pattern. In general we have a text and a pattern (most commonly shorter than the text), and what we need to do is answer the question whether this pattern appears in the text.

Overview

The principles of brute force string matching are quite simple. We check for a match between the first character of the pattern and the first character of the text, as in the picture below. If they don't match, we move forward to the second character of the text and compare it with the first character of the pattern. If they don't match again, we keep moving forward until we get a match or until we reach the end of the text. When they do match, we move forward to the second character of the pattern, comparing it with the "next" character of the text, as shown in the picture below. Just because we have found a match between the first character of the pattern and some character of the text doesn't mean that the pattern appears in the text - we must move forward to see whether the full pattern is contained in the text.

Implementation

Implementing brute force string matching is easy, and here we can see a short PHP example. The bad news is that this algorithm is naturally quite slow.

```php
function sub_string($pattern, $subject) {
    $n = strlen($subject);
    $m = strlen($pattern);
    for ($i = 0; $i <= $n - $m; $i++) {
        $j = 0;
        while ($j < $m && $subject[$i + $j] == $pattern[$j]) {
            $j++;
        }
        if ($j == $m) return $i;
    }
    return -1;
}

// 4
echo sub_string('o wo', 'hello world!');
```

Complexity

As I said, this algorithm is slow. Actually, every algorithm that contains "brute force" in its name is slow, but to show just how slow string matching is, I can say that its complexity is O(n·m). Here n is the length of the text, while m is the length of the pattern. Even if we fix the length of the text and test against patterns of varying length, we again get a rapidly growing function.

Application

Brute force string matching can be very ineffective, but - just like sequential search - it can also be very handy in some cases.

It can be very useful…

- It doesn't require pre-processing of the text. Indeed, if we search the text only once, we don't need to pre-process it. Most string matching algorithms need to build an index of the text in order to search quickly. That is great when you have to search the text more than once, but if you search only once, brute force matching (for short texts) can be great!
- It doesn't require additional space. Because brute force matching doesn't need pre-processing, it also doesn't require extra space, which is one cool feature of this algorithm.
- It can be quite effective for short texts and patterns.

It can be ineffective…

- If we search the text more than once. As I said in the previous section, if you perform the search more than once, it's probably better to use a string matching algorithm that builds an index, which is faster.
- It's slow. In general brute force algorithms are slow, and brute force matching isn't an exception.

Final Words

String matching is something very special in software development and is used in all kinds of cases, so every developer should be familiar with this topic.
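To see where the O(n·m) bound bites, here is a small Python sketch (not from the article) that counts character comparisons on a pathological input, where the pattern almost matches at every position:

```python
def brute_force_count(text, pattern):
    # Brute force scan that counts character comparisons instead of
    # returning a match position.
    n, m = len(text), len(pattern)
    comparisons = 0
    for i in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
    return comparisons
```

Searching for 'aaab' in a long run of 'a's forces almost every window to be scanned to its last character, so the count grows roughly as n·m.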
March 27, 2012
· 61,096 Views · 3 Likes
Algorithm of the Week: Data Compression with Prefix Encoding
Prefix encoding, sometimes called front encoding, is yet another algorithm that tries to remove duplicated data in order to reduce its size. Its principles are simple; however, the algorithm tends to be difficult to implement. To understand why, first let's take a look at its nature. Have a look at the following dictionary.

use
used
useful
usefulness
useless
uselessly
uselessness

Instead of keeping all these words in plain text or transferring them all over a network, we can compress (encode) them with prefix encoding. It's clear that each of these words begins with the prefix "use", which is also the first word in the list. So we can easily compress them into the following array.

```php
$data = array(
    0 => 'use',
    1 => '0d',
    2 => '0ful',
    3 => '0fulness',
    4 => '0less',
    5 => '0lessly',
    6 => '0lessness',
);
```

Clearly this is not the best possible compression, and we can go even further by using not only the first word as a prefix.

```php
$data = array(
    0 => 'use',
    1 => '0d',
    2 => '0ful',
    3 => '2ness',
    4 => '0less',
    5 => '4ly',
    6 => '4ness',
);
```

Now the compression is better, and the good news is that decompression is a fairly simple process. The tricky part is the compression itself: the problem is that it is quite difficult to choose an appropriate prefix. In our first example this is simple, but in practice we will usually have more heterogeneous data. Indeed, the process of compression can be very difficult for randomly generated data, and the algorithm will be not only slow but also difficult to implement. The good thing is that this algorithm can be used in many cases where we know the data format in advance. So let's see three examples where this algorithm can be very handy.

Application

Here are three examples of prefix encoding. As stated above, compression can be very difficult for random data, so it is good practice to use it only if you know the format of the input data in advance.

Date and Time Prefixes

We humans often skip the first two digits of a year: instead of always writing 1995 or 1996, we use the shorter '95 and '96. Thus years can be encoded with shorter strings.

input:  (1991, 1992, 1993, 1994, 1995, 1996)
output: (91, 92, 93, 94, 95, 96)

The problem is that with small changes to the input stream we can confuse the decoder. If we add years from the 21st century we lose the uniqueness of the data.

input:  (1998, 1992, 1999, 2011, 2012)
output: (98, 92, 99, 11, 12)

Now the decoder may decode the last two values as (1911, 1912), since "19" is considered to be the prefix. So we must know in advance that the prefix is exactly the same for all of the values. If not, the encoding format must be different: for instance, we can also encode the prefix itself, with a special marker.

input:  (1998, 1992, 1932, 1924, 2001, 2012)
output: (#19, 98, 92, 32, 24, #20, 01, 12)

Once the decoder reads the # character, it knows to decode the following number as a prefix. In practice this can be used for date and time formats. Let's say we have some datetime values, but we know that all of them fall on the same day.

2012-01-31 15:33:45
2012-01-31 16:12:11
2012-01-31 17:32:35
2012-01-31 18:54:34

Obviously we can omit the date part of these strings and send (or keep) only the time. Once again, we must be absolutely sure that all these values are from the same day. If not, we can use the marker strategy from the previous example.

Phone Numbers

Phone numbers are the typical case for prefix encoding. Not only the international codes but also the mobile network operators use prefixes for their phone numbers. Thus, if we have to transfer phone numbers from, let's say, the UK, we can replace the leading "+44" with something shorter. If you happen to code a phone book for a mobile device, you can save some space by compressing the data using prefix encoding, and the user will be able to store more phone numbers on his or her device.

Phone number prefixes can also be used for database normalization: you can store the prefixes in a separate DB table and keep only the unique parts of the numbers in the phonebook.

Geo Coordinates

Using the same example from my previous post, we can send geo coordinates with a common prefix removed, for large zoom levels. Indeed, when you need to send lots of markers to your map application, you can expect all of these markers to be fairly close to each other at a large zoom level, so their coordinates can share a common prefix, like the example below with the Subway stations.

LatLon(40.762959, -73.985989)
LatLon(40.761886, -73.983629)
LatLon(40.762861, -73.981612)
LatLon(40.764616, -73.98056)

We can see that all of these geo points share the same prefix (40.76x, -73.98x), so we can send the prefix only once.

Prefix: (40.76, -73.98)
Data:
LatLon(2959, 5989)
LatLon(1886, 3629)
LatLon(2861, 1612)
LatLon(4616, 056)

These are only three examples of prefix encoding; this algorithm is very useful whenever we transfer homogeneous data.

Suffix Encoding

Suffix encoding is practically the same algorithm as prefix encoding, with the small difference that it encodes duplicated suffixes. As in the examples below, suffix encoding can be useful for replacing repeating last-name suffixes:

Johnson
Clark
Jackson

Or company names:

Apple Inc.
Google Inc.
Yahoo! Inc.

Here we can replace " Inc." with something else, but shorter.

Related posts:
Computer Algorithms: Data Compression with Relative Encoding
Computer Algorithms: Data Compression with Run-length Encoding
Computer Algorithms: Data Compression with Diagram Encoding and Pattern Substitution

Source: http://www.stoimen.com/blog/2012/02/06/computer-algorithms-data-compression-with-prefix-encoding/
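The dictionary example above can be sketched generically. Here is a minimal Python version; the (index, suffix) pair format and the function names are illustrative assumptions of this sketch (the article stores the index inline in the string):

```python
def prefix_encode(words):
    # Encode each word as (index of its longest previously-seen prefix, suffix),
    # or leave it as-is when no earlier word is a prefix of it.
    encoded = []
    for i, word in enumerate(words):
        best, best_len = None, 0
        for j in range(i):
            if word.startswith(words[j]) and len(words[j]) > best_len:
                best, best_len = j, len(words[j])
        encoded.append(word if best is None else (best, word[best_len:]))
    return encoded

def prefix_decode(encoded):
    # Decoding is the simple half: expand each (index, suffix) pair
    # using the already-decoded words.
    words = []
    for item in encoded:
        if isinstance(item, tuple):
            index, suffix = item
            words.append(words[index] + suffix)
        else:
            words.append(item)
    return words
```

On the "use…" dictionary this reproduces the second array from the article: "usefulness" becomes (2, 'ness'), i.e. "useful" plus "ness".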
February 7, 2012
· 18,476 Views
Algorithm of the Week: Data Compression with Relative Encoding
Overview Relative encoding is another data compression algorithm. While run-length encoding, bitmap encoding and diagram and pattern substitution were trying to reduce repeating data, with relative encoding the goal is a bit different. Indeed run-length encoding was searching for long runs of repeating elements, while pattern substitution and bitmap encoding were trying to “map” where the repetitions happen to occur. The only problem with these algorithms is that the input stream of data is not always constructed out of repeating elements. It is clear that if the input stream contains many repeating elements there must be some way of reducing them. However that doesn’t mean that we cannot compress data if there are no repetitions. It all depends on the data. Let’s say we have the following stream to compress. 1, 2, 3, 4, 5, 6, 7 It's hard to imagine how this stream of data can be compressed. The same problem may occur when trying to compress the alphabet. Indeed the letters of the alphabet are the very base of words so it is the minimal part for word construction and therefore hard to compress. Fortunately this isn’t true always. An algorithm that tries to deal with non-repeating data is relative encoding. Let’s see the following input stream – years from a given decade (the 90′s). 1991, 1991, 1999, 1998, 1991, 1993, 1992, 1992 Here we have 39 characters and we can reduce them. A natural approach is to remove the leading “19” as we humans often do. 91, 91, 99, 98, 91, 93, 92, 92 Now we have a shorter string, but we can go even further by keeping only the first year. All other years will as relative to this year. 91, 0, 8, 7, 0, 2, 1, 1 Now the volume of transferred data is reduced a lot (from 39 to 16 – more than 50%). However there are some questions we need to answer first, because the stream wont always be formatted in such a pretty way. How about the next character stream? 
91, 94, 95, 95, 98, 100, 101, 102, 105, 110 We see that the value 100 is somehow in the middle of the interval and it is handy to use it as a base value for the relative encoding. Thus the stream above will become: -9, -6, -5, -5, -2, 100, 1, 2, 5, 10 The problem is that we can’t always decide which value will be the base value so easily. What if the data was dispersed in a different way: 96, 97, 98, 99, 100, 101, 102, 103, 999, 1000, 1001, 1002 Now the value of “100” isn’t useful, because compressing the stream will get something like this: -4, -3, -2, -1, 100, 1, 2, 3, 899, 900, 901, 902 To group the relative values around “some” base values will be far more handy. (-4, -3, -2, -1, 100, 1, 2, 3) (-1, 1000, 1, 2) However, to decide which value will be the base value isn’t that easy. Also the encoding format is not so trivial. On the other hand, this type of encoding can be useful in some specific cases as we can see below. Implementation The implementation of this algorithm depends on the specific task and the format of the data stream. Assuming that we have to transfer the stream of years in JSON from a web server to a browser, here’s a short PHP snippet. // JSON: [1991,1991,1999,1998,1999,1998,1995,1997,1994,1993] $years = array(1991,1991,1999,1998,1999,1998,1995,1997,1994,1993); function relative_encoding($input) { $output = array(); $inputLength = count($input); $base = $input[0]; $output[] = $base; for ($i = 1; $i < $inputLength; $i++) { $output[] = $input[$i] - $base; } return $output; } // JSON: [1991,0,8,7,8,7,4,6,3,2] echo json_encode(relative_encoding($years)); Application This algorithm may be very useful in many cases, such as this one: there are plenty of map applications around the web. Some products such as Google Maps, Yahoo! Maps, Bing Maps are quite famous, while there are also very useful open source projects like OpenStreetMap. The web sites using these apps number in the thousands. 
A typical use case is transferring lots of geo coordinates from a web server to a browser using JSON. Any geo point on Earth is relative to the point (0, 0), which is located in the ocean near the west coast of Africa. On large zoom levels, however, when there are tons of markers, we can transfer the information with relative encoding. For instance, the following diagram shows San Francisco with some markers on it, whose coordinates are relative to the point (0, 0) on Earth. Being relative to (0, 0) can occasionally be useless; it may be far more useful to encode those markers relative to the center of the city, and thus save some space.

Relative encoding can be useful for map markers on a large zoom level. However, this type of compression can be tricky, for example when dragging the map and updating the marker array, and we must group markers if we have to load more than one city. That's why we must be careful when implementing it. But it can be very useful: for instance, on the initial load of the map we can reduce the data and speed up the load time. The thing is that with relative encoding we save only changes to a base value, much like version control systems do, thus reducing data transfer and load. Here's a graphical example. In the first case on the diagram below, each item is stored on its own; it doesn't depend on the adjacent items and can be completely independent of them. Alternatively, we can keep the full info only for the first item and make every other item relative to it, as on the second diagram.

Source: http://www.stoimen.com/blog/2012/01/30/computer-algorithms-data-compression-with-relative-encoding/
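To illustrate the city-relative markers idea described above, here is a small sketch (my own example, not from the original post; the function name, the field names and the sample coordinates are all made up for the illustration):

```php
<?php
// Sketch: encode a set of markers relative to a chosen base point
// (e.g. the city center), so only small offsets are transferred.
function encode_markers($base, $markers)
{
    $output = array('base' => $base, 'points' => array());
    foreach ($markers as $m) {
        // Keep the offsets in the same fixed precision as the input.
        $output['points'][] = array(
            round($m['lat'] - $base['lat'], 6),
            round($m['lng'] - $base['lng'], 6),
        );
    }
    return $output;
}

// Approximate San Francisco city center as a base.
$base = array('lat' => 37.774929, 'lng' => -122.419416);
$markers = array(
    array('lat' => 37.808673, 'lng' => -122.409821),
    array('lat' => 37.819929, 'lng' => -122.478255),
);
echo json_encode(encode_markers($base, $markers));
```

The client only needs the base point once and can restore every marker by adding the offsets back.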
January 31, 2012
· 17,052 Views
Algorithm of the Week: Data Compression with Diagram Encoding and Pattern Substitution
Two variants of run-length encoding are the diagram encoding and pattern substitution algorithms. Diagram encoding is actually a very simple algorithm. Unlike run-length encoding, where the input stream must consist of many repeating elements ("aaaaaaaa" for instance), which are very rare in a natural language, there are many so-called "diagrams" in almost any natural language. In plain English there are diagrams such as "the", "and", "ing" (in the word "waiting" for example), " a", " t", " e", and many doubled letters.

Actually, we can extend those diagrams by adding the surrounding spaces. Thus we can encode not only "the" but " the ", which is 5 characters (2 spaces and 3 letters), with something shorter. On the other hand, as I said, in plain English there are many doubled letters, which unfortunately aren't anything special for run-length encoding, so the compression ratio will be small. Even worse, the encoded text may turn out longer than the input message. Let's see some examples.

Say we have to encode the message "successfully accomplished", which contains four doubled letters. To compress it with run-length encoding we need at least 8 characters, which doesn't help us a lot.

// 8 chars replaced by 8 chars!?
input:  "successfully accomplished"
output: "su2ce2sfu2ly a2complished"

The problem is that if the input text contains numbers, "2" in particular, we have to choose an escape symbol ("@" for example) to mark where an encoded run begins. Thus if the input message is "2 successfully accomplished tasks", it will be encoded as "2 su@2ce@2sfu@2ly a@2complished tasks". Now the output message is longer than the input string!

// the compressed message is longer!
input:  "2 successfully accomplished tasks"
output: "2 su@2ce@2sfu@2ly a@2complished tasks"

Again, if the input stream contains the escape symbol, we have to find another one, and it is often too difficult to find a short escape symbol that doesn't appear in the input text without a full scan of the text. That is why run-length encoding isn't a good solution for compressing plain text, where long runs rarely appear.

Well, of course, there are exceptions. One such exception is lossy text compression with run-length encoding. It is intuitively clear that compressing text with loss is rarely useful, especially when you have to decompress exactly the same text. However, there are cases where lossy compression may be useful, such as removing runs of spaces. Indeed, the text "successfully      accomplished" carries exactly the same information as "successfully accomplished", so in this case we can simply remove the extra spaces. We could use a marker to indicate the long run of spaces, like "successfully@6 accomplished", in order to decompress the input string with absolutely no loss, but we can also throw those symbols away. This decision depends on the goal. With exactly the same goal in mind we can remove new lines and tabs, as long as we're sure the sense of the text is preserved. Yet again, the problem is that such long runs don't tend to occur in random texts. That is why it's better to use diagram encoding for plain text compression instead of run-length encoding.

A Few Questions

After understanding the principles of diagram encoding, let's see some examples. In the example above it is better to replace the doubled letters with something shorter. Let's say "#" for "cc", "@" for "ss" and "%" for "ll". Thus the input text will be compressed as "su#e@fu%y a#omplished", which is shorter. But yet again, what happens if the input message contains one of the substitutions?
Also, we can't say in advance whether there are many doubled letters and enough reasonable substitutions for them. A better approach is to replace patterns. Run-length encoding isn't a good approach for text compression, because long runs rarely appear in a natural language.

Pattern Substitution

The pattern substitution algorithm is a variant of diagram encoding. As I said above, in plain English a very commonly used pattern is " the ", which is five characters long. We can replace it with something like "$%". In this case the message "I send the message" becomes "I send$%message". However, there are some obstacles to overcome. The first problem is that we need to know the language and somehow define the commonly used patterns in a dictionary. What would happen with a message written in a language we know nothing about? Say, Latin, like the example below.

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Cras venenatis, sapien eget suscipit placerat, justo quam blandit mauris, quis tempor ante sapien sodales augue. Praesent ut mauris quam. Phasellus scelerisque, ante quis consequat tristique, metus turpis consectetur leo, vitae facilisis sapien mi eu sapien. Praesent vitae ligula elit, et faucibus augue. Sed rhoncus sodales dolor ut gravida. In quis augue ac nulla auctor mattis sed sed libero. Donec eget purus eget enim tempor porta vitae eget diam. Mauris aliquet malesuada ipsum, non pulvinar urna vestibulum ac. Donec feugiat velit vitae nunc cursus imperdiet. Donec accumsan faucibus dictum. Phasellus sed mauris sapien. Maecenas mi metus, tincidunt sed rhoncus nec, sodales non sapien.

Clearly, without knowing Latin it isn't easy to define which the commonly used patterns are. The thing is that it's better to use pattern substitution when you know the set of words and characters in advance. The second problem is related to decompression.
It is obvious that we need to define a dictionary, and this dictionary must also be used when decoding the message. It would be great, too, if we could find patterns longer than three characters; if not, the compression ratio will be low. Unfortunately such patterns aren't very common in any natural language. Diagram encoding and pattern substitution are far more suitable for text compression than run-length encoding. In fact, pattern substitution is very effective at compressing programming languages.

Application

It is interesting to ask how to use diagram encoding or pattern substitution to compress text in a natural language, especially when we don't know the language in detail. The answer hides in the question: we won't compress natural languages, but machine languages. Machine (programming) languages are limited to smaller sets of words and symbols. Isn't that true for any programming language? Like PHP, where words such as "function", "while", "for", "break", "switch", "foreach" happen to be often in use, or HTML with its defined set of tags. Perhaps the best example is CSS, where only the values of the properties can vary. CSS files also tend to have multiple new lines, tabs and spaces, which only humans read.

The question here is why we should compress those file types at all. It's clear that after the compression they will be completely useless, both for humans and machines. Yes, that is true, but what if we have to store versions of those files in a DB, as a kind of backup? Imagine you're working for a web hosting company that has to store daily versions of the sites it hosts. The volume of stored information, even for a small company hosting only a few sites, can be enormous. The problem is that compressing those files with a conventional compression tool isn't a good idea: we'd still have to save a copy of the entire site every day, even though, as we know, the difference between daily versions of a site can be small.
A version control system is another solution, but then you have to store the plain text of the files. Perhaps a better approach is to compress the text using pattern substitution and then save only the differences, a kind of version control, which can be done with "relative encoding". Using this method we can save lots of disk space and at the same time compress/decompress easily. Another good thing is that you can save only the changes to the initial files, like version control, and those changes can also be compressed.

Implementation

The implementation of this algorithm is again in PHP and tries only to describe the main principles of the compression. In this case I tried to compress a CSS file using the scheme above. Although this example is quite primitive, we can see some interesting facts. First of all, you only need encoding and decoding dictionaries. Practically, the encoding and decoding processes are identical, so you don't need to implement two different functions. A native PHP function, str_replace, is used here, because the purpose of this example is not to describe string replacement techniques, but pattern substitution; it assumes that today's programming languages have string manipulation functions for the purposes of this task.

$str = file_get_contents('large_style_file.css');

$encoding_dict = array(
    "\n"      => '$0',
    'text'    => '$1',
    'color'   => '$2',
    'display' => '$3',
    'font'    => '$4',
    'width'   => '$5',
    'height'  => '$6',
    ' '       => '', // strip spaces (lossy)
);

function replace_patterns($input, $dict)
{
    foreach ($dict as $pattern => $replace) {
        $input = str_replace($pattern, $replace, $input);
    }
    return $input;
}

$result = replace_patterns($str, $encoding_dict);

By replacing only a few CSS properties I achieved almost 40% compression (as shown in the diagram below): the initial file is 202 KB, while compressed it's only 131 KB. Of course, it all depends on the CSS file, but how about replacing all property names with shorter ones?
Perhaps then the compression will be even better. Source: http://www.stoimen.com/blog/2012/01/23/computer-algorithms-data-compression-with-diagram-encoding-and-pattern-substitution/
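As a closing sketch (my addition, not in the original post): decoding really is the same substitution run with the dictionary reversed, as the Implementation section notes. The space-stripping entry is lossy and has no inverse, so it is omitted here:

```php
<?php
// Decoding dictionary: the encoding dictionary with keys and values swapped.
// The lossy ' ' => '' entry from the encoder cannot be reversed and is left out.
$decoding_dict = array(
    '$0' => "\n",
    '$1' => 'text',
    '$2' => 'color',
    '$3' => 'display',
    '$4' => 'font',
    '$5' => 'width',
    '$6' => 'height',
);

// Same shape as replace_patterns() in the article: one str_replace per entry.
function restore_patterns($input, $dict)
{
    foreach ($dict as $pattern => $replace) {
        $input = str_replace($pattern, $replace, $input);
    }
    return $input;
}

// '$1-align:center;$2:#fff;' -> 'text-align:center;color:#fff;'
echo restore_patterns('$1-align:center;$2:#fff;', $decoding_dict);
```

Note that the substitution tokens must never collide with each other or with the original text, otherwise decoding becomes ambiguous.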
January 24, 2012
· 23,132 Views
Algorithm of the Week: Data Compression with Bitmaps
In my previous post we saw how to compress data consisting of very long runs of repeating elements. That type of compression is known as "run-length encoding" and can be very handy when transferring data with no loss. The problem is that the data must follow a specific format. Thus the string "aaaaaaaabbbbbbbb" can be compressed as "a8b8": a string of length 16 becomes a string of length 4, which is 25% of its initial length, without losing any information. There would be a problem, though, if the characters (elements) were dispersed in a different way. What happens if the characters are the same but they don't form long runs? What if the string were "abababababababab"? The same length, the same characters, but we cannot use run-length encoding! Indeed, using that algorithm we'd get at best the same string.

In this case, however, we can notice another fact: the string consists of many repeating elements, although not arranged one after another. We can compress such a string with a bitmap. This means that we can save the positions of the occurrences of a given element in a sequence of bits, which can be easily converted into a decimal value. In the example above, the string "abababababababab" can be compressed as "1010101010101010", which is 43690 in decimal, and even better, AAAA in hexadecimal. Thus the long string can be compressed. When decompressing (decoding) the message, we convert back from decimal/hexadecimal into binary and match the occurrences of the characters.

Well, the example above is too simple, so let's say only one of the characters is repeating and the rest of the string consists of different characters, like this: "abacadaeafagahai". Then we can use a bitmap only for the character "a" ("1010101010101010") and compress the string as "AAAA bcdefghi". As you can see, all the example strings are exactly 16 characters, and that is a limitation.
Using bitmaps with variable-length data is a bit tricky, and it is not always easy (if possible at all) to decompress it. Basically, bitmap compression saves the positions of an element that is repeated very often in the message! On the other hand, bitmap compression is not only applicable to strings. We can also compress arrays, objects, or any kind of data. The example from my previous post is very suitable. There we had to transfer a large array from a server to the client (browser) using JSON, and the data was very suitable for run-length encoding. Now let's assume we have the same data, a set of different years, which this time are dispersed in a different way.

$data = array(
    0  => 1991,
    1  => 1992,
    2  => 1993,
    3  => 1994,
    4  => 1991,
    5  => 1992,
    6  => 1993,
    7  => 1992,
    8  => 1991,
    9  => 1991,
    10 => 1991,
    11 => 1992,
    12 => 1992,
    13 => 1991,
    14 => 1991,
    15 => 1992,
    ...
);

The JSON-encoded message will be the following (a simple but very large JavaScript array):

[1991,1992,1993,1994,1991,1992,1993,1992,1991,1991,1991,1992,1992,1991,1991,1992, ...]

However, if we use bitmap compression we'll get a "shorter" array:

$data = array(
    0 => array(1991, '1000100011100110'),
    1 => array(1992, '0100010100011001'),
    2 => array(1993, '0010001000000000'),
    3 => array(1994, '0001000000000000'),
);

Now the JSON is:

[[1991,"1000100011100110"],[1992,"0100010100011001"],[1993,"0010001000000000"],[1994,"0001000000000000"]]

It is obvious that the compression ratio gets better and better as the uncompressed data grows. In fact, most of us know bitmap compression from images, because this algorithm is widely used for image compression. We can imagine how successful it can be when compressing black and white images (as black and white can be represented as 0s and 1s). Actually, it is used for more than two colors (256, for instance) and again the level of compression is very high.
Implementation

The following PHP implementation aims only to illustrate the bitmap compression algorithm. As we know, this algorithm is applicable to any kind of data structure.

// too many repeating "a" characters
$msg = 'aazahalavaatalawacamaahakafaaaqaaaiauaacaaxaauaxaaaaaapaayatagaaoafaawayazavaaaazaaabararaaaaakakaaqaarazacajaazavanazaaaeanaaoajauaaaaaxalaraaapabataaavaaab';

function bitmap($message)
{
    $bits = $rest = '';
    $length = strlen($message);
    for ($i = 0; $i < $length; $i++) {
        if ($message[$i] == 'a') {
            $bits .= '1';
        } else {
            $bits .= '0';
            $rest .= $message[$i];
        }
    }
    // note: bindec() returns a float for long bit strings,
    // so the precision of the numeric part is limited
    return number_format(bindec($bits), 0, '.', '') . $rest;
}

echo bitmap($msg);
// compressed: 152299251941730035874325065523548237677352452096zhlvtlwcmhkfqiucxuxpytgofwyzvzbrrkkqrzcjzvnzenojuxlrpbtvb

Application

This algorithm is very useful when there is an element in your data that repeats very often, so you need to investigate the nature of the data you want to compress. It is because of this fact that the algorithm is used for image compression, in formats such as PNG8 or GIF.

Source: http://www.stoimen.com/blog/2012/01/16/computer-algorithms-data-compression-with-bitmaps/
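One caveat worth a sketch (my addition, not in the original post): the numeric form produced above is hard to reverse, since leading zero bits are lost and the counter's digits run into the remaining characters. Keeping the bitmap as an explicit bit string, as in the earlier "AAAA bcdefghi" example but without the hex step, makes decoding trivial:

```php
<?php
// Sketch: decode a bitmap-compressed string where '1' marks an 'a'
// and '0' marks the next character taken from the remainder, in order.
function bitmap_decode($bits, $rest)
{
    $output = '';
    $j = 0;
    $len = strlen($bits);
    for ($i = 0; $i < $len; $i++) {
        $output .= ($bits[$i] === '1') ? 'a' : $rest[$j++];
    }
    return $output;
}

echo bitmap_decode('1010101010101010', 'bcdefghi');
// abacadaeafagahai
```

This restores exactly the "abacadaeafagahai" example from the article; a production format would also have to store which character the 1-bits stand for.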
January 17, 2012
· 19,550 Views
How to Check if a Date is More or Less Than a Month Ago with PHP
Let's say we have the following problem: we have to check whether a date is more than a month ago or less than a month ago. Many developers go in the wrong direction by calculating the current month and then subtracting a number of months from it. This approach is slow and full of opportunities for bugs: after all, two months before January, the first month of the year, is November, the eleventh month. Because of these pitfalls, this approach is entirely wrong. strtotime() is a lot more powerful than you think!

The question is whether PHP can help us with built-in functions that perform these calculations for us. From version 5.3.0 on there is an OOP date API, which is great, but unfortunately this version is still not deployed everywhere. So, how do we accomplish the task?

The Wrong Approach

As I said, there are many ways to go in the wrong direction. One of them is to subtract 30 days from the current date. This is completely wrong, because not every month has 30 days. At this point some developers will begin to predefine arrays holding the number of days in each month, which they then use in complicated calculations. Here is an example of the wrong approach:

echo date('Y-m-d', strtotime(date('Y-m-d')) - 60*60*24*30);

This line is full of mistakes. First of all, strtotime(date('Y-m-d')) can be replaced by the more elegant strtotime('now'), but more on this later. Another problem is that 60*60*24*30, the number of seconds in 30 days, should be predefined as a constant. And ultimately the result is wrong, because not every month has 30 days.

The Correct Approach

A small study of the problem and of the functions available in PHP versions prior to 5.3.0 is needed. A typical case of date formatting occurs when working with dates from a database. The following code is a classic example:
// 2008 05 23, 2008-05-23 is stored in the DB
echo date('Y m d', strtotime('2008-05-23'));

// 2008 May 23
echo date('Y F d', strtotime('2008-05-23'));

The problem, perhaps, is that strtotime() is too often used like this, with exactly this type of string. However, strtotime() can do much more.

strtotime() Can Do Much More

Let us first look at the documentation of this function. What parameters does it accept?

int strtotime ( string $time [, int $now = time() ] )

The function expects to be given a string containing an English date format and will try to parse that format into a Unix timestamp (the number of seconds since January 1 1970 00:00:00 UTC), relative to the timestamp given in now, or the current time if now is not supplied. In particular we are interested in the first parameter, time:

time - A date/time string. Valid formats are explained in Date and Time Formats.

It is especially important to note the valid date and time formats, and the most interesting among them are the relative ones. Exactly these formats are very convenient in our case, because they give us the ability to work with human-readable strings. Here are some examples from the documentation of strtotime():

echo strtotime("now"), "\n";
echo strtotime("10 September 2000"), "\n";
echo strtotime("+1 day"), "\n";
echo strtotime("+1 week"), "\n";
echo strtotime("+1 week 2 days 4 hours 2 seconds"), "\n";
echo strtotime("next Thursday"), "\n";
echo strtotime("last Monday"), "\n";

Thus, a valid string would be "1 month ago":

// if current date is 2011-11-04, this will return 2011-10-04
echo date('Y-m-d', strtotime('1 month ago'));

Or "-1 month":

// the same as the example above
echo date('Y-m-d', strtotime('-1 month'));

It's interesting that "+1 -1 month" is also a valid string.
// 2011-10-04, if today is 2011-11-04
echo date('Y-m-d', strtotime('+1 -1 month'));

In fact strtotime() can do a lot more than most developers have ever imagined. Maybe its frequent use with string-formatted dates (2010-01-13) keeps its other abilities unknown. Here are some interesting use cases:

// 1970-01-01 - calculations in braces are bad!
echo date('Y-m-d', strtotime('(60*60) minute'));

// 2 months into the future
echo date('Y-m-d', strtotime('-2 months ago'));

For instance, do you know how to get the date of the day before yesterday? Yes, 2 days before today, but here's yet another solution:

// 1 day before yesterday
echo date('Y-m-d', strtotime('yesterday -1 day'));

Another example is fully human-readable:

// get the first monday of the current month
echo date('Y-m-d', strtotime('first monday this month'));

The Solution of the Task

Finally, what is the solution to the original task? We just have to check whether a date is more or less than a month ago:

// a random date
$my_date = '2011-09-23';

// true if $my_date is more than a month ago
strtotime($my_date) < strtotime('1 month ago');

Related posts: Thing to Know About PHP Arrays · javascript get locale month with full name · PHP Strings: How to Get the Extension of a File

Source: http://www.stoimen.com/blog/2011/11/04/how-to-check-if-a-date-is-more-or-less-than-a-month-ago-with-php/
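As a side note (my addition, not from the original post): on PHP 5.3+, the OOP date API mentioned at the beginning allows the same check, since DateTime objects are comparable and the constructor accepts the same relative formats as strtotime():

```php
<?php
// PHP 5.3+ alternative using the DateTime API.
$my_date = '2011-09-23';

$date     = new DateTime($my_date);
$monthAgo = new DateTime('1 month ago');

// true if $my_date lies more than a month in the past
var_dump($date < $monthAgo);
```

The comparison operators work on DateTime objects directly, so no timestamp arithmetic is needed.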
December 18, 2011
· 37,610 Views
PHP: Don’t Call the Destructor Explicitly
"PHP 5 introduces a destructor concept similar to that of other object-oriented languages, such as C++"[1], says the documentation for destructors. But let's see the following class:

class A
{
    public function __construct()
    {
        echo 'building an object';
    }

    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$obj = new A();

Just as you should not call the constructor explicitly:

$obj->__construct();

you should not call the destructor explicitly:

$obj->__destruct();

The problem is that I've seen this many times, and the pity is that it won't destroy the object, even though it is still valid PHP code. Calling a PHP destructor explicitly does not destroy the object! Constructors and destructors in PHP are part of the so-called magic methods. Here's what the doc page says about them:

The function names __construct(), __destruct(), __call(), __callStatic(), __get(), __set(), __isset(), __unset(), __sleep(), __wakeup(), __toString(), __invoke(), __set_state() and __clone() are magical in PHP classes. You cannot have functions with these names in any of your classes unless you want the magic functionality associated with them.

To be more precise, let's look at the definition of destructors:

PHP 5 introduces a destructor concept similar to that of other object-oriented languages, such as C++. The destructor method will be called as soon as there are no other references to a particular object, or in any order during the shutdown sequence.

What if I Call the Destructor Explicitly?

Let's see some examples!

class A
{
    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$obj = new A();

// prints hello world
echo 'hello world';

// at the end of the script the PHP interpreter prints
// 'destroying the object'

This is actually normal behavior: at the end of the script the interpreter frees the memory. Every object has a built-in destructor, just as it has a built-in constructor, so even if we don't define it explicitly, the object has its destructor.
Usually this destructor is executed at the end of the script, or whenever the object isn't needed anymore; this can happen, for instance, at the end of a function body. Now, if we call the destructor explicitly, which as I said I've seen many times, here's what happens:

class A
{
    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$obj = new A();

// this is valid and it prints 'destroying the object'
// BUT IT DOES NOT DESTROY THE OBJECT
$obj->__destruct();

// prints hello world
echo 'hello world';

// HERE PHP ACTUALLY DESTROYS THE $obj OBJECT
// ... and again prints 'destroying the object'

As you can see, calling the destructor explicitly doesn't destroy the object. So the question is ...

How to Destroy an Object Before the Script Stops?

To destroy an object you can assign NULL to it:

class A
{
    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$obj = new A();

// prints 'destroying the object'
$obj = null;

// prints 'hello world'
echo 'hello world';

Caution

Be aware that if you don't clone the object $obj but simply assign it to another variable, then $obj = null will be pointless. Let's see the following example:

class A
{
    public function printMsg()
    {
        echo 'I still exist';
    }

    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$obj = new A();

// $newObj points to the same object as $obj
$newObj = $obj;

// this doesn't destroy the object:
// both $obj and $newObj point to the same memory,
// so PHP doesn't free it
$obj = null;

// prints 'I still exist'
$newObj->printMsg();

// prints 'hello world'
echo 'hello world';

// now the script destroys the object, which is still
// reachable through $newObj, and prints 'destroying the object'

This example shows that assigning NULL to a variable doesn't destroy the object if other variables still point to the same memory.
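For completeness (my addition, not in the original post), unset() behaves just like the null assignment: the destructor fires only when the last variable pointing to the object is removed:

```php
<?php
class A
{
    public function __destruct()
    {
        echo "destroying the object\n";
    }
}

$obj  = new A();
$copy = $obj;

unset($obj);   // prints nothing: $copy still points to the object
unset($copy);  // prints 'destroying the object'

echo "hello world\n";
```

So whether you assign null or call unset(), it is the reference count reaching zero, not the statement itself, that triggers the destructor.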
class A
{
    public function printMsg()
    {
        echo 'I still exist';
    }

    public function __destruct()
    {
        echo 'destroying the object';
    }
}

$b = new A();
$d = $c = $b;
$b = null;
$c->printMsg();
$d->printMsg();
// prints 'destroying the object' ONLY ONCE

In this last example there are two interesting things to note. First, $b = null doesn't call the destructor; and at the end of the script there's only one implicit call of the destructor, although two variables ($c and $d) still point to the object.

Conclusion

The important thing to note is that you shouldn't call the destructor of an object explicitly. Not because it will throw a fatal error, but simply because it won't destroy the object.

[1] PHP: Constructors and Destructors

Related posts: Some Notes on the Object-oriented Model of PHP · Construct a Sorted PHP Linked List · Object Cloning and Passing by Reference in PHP

Source: http://www.stoimen.com/blog/2011/11/14/php-dont-call-the-destructor-explicitly/
November 15, 2011
· 17,332 Views · 1 Like
Reading GPS Latitude and Longitude from Image and Video Files
The State of GPS Data from Mobile Devices

Most mobile devices today support GPS geotagging. In fact, most of them come bundled with navigation software that uses GPS, and therefore all the pictures and (maybe) videos can be geotagged. But, as expected, different vendors come with different support and formats. iPhone OS geotags both video and image files, while the latest Android and Symbian (Nokia's main OS for smartphones) can geotag only images. Even more, until recently Symbian didn't support any geotagging without the installation of additional software, such as Location Tagger. So, in general, things are quite simple:

iPhone OS geotags both video and image files;
Android geotags only images;
Symbian geotags only images, and on some devices only after installing extra software.

That is, in brief, the state of mobile device geotagging!

Why Use GPS Data?

Perhaps one of the main reasons geotagging isn't supported, especially on video files, is the question of how those geotags are used. First of all, what does a geotag mean? You may know that saying Android doesn't "geotag" videos is not quite true: in the gallery you can see where your videos were shot. This is fantastic, but the real information about where a video was taken is not in the video file itself; it is kept in an additional log file. The video file itself doesn't know the geo coordinates. Here comes the problem with video formats, because you cannot be sure that every format supports tags that can hold geo coordinates. QuickTime's MOV can store them, while Symbian's 3GP cannot; in fact, Symbian cannot store any geo information about video files! So we have at least three different formats for the three vendors:

QuickTime for iPhone
MPEG-4 for Android
3GP for Symbian

For now I can only say that the iPhone can keep the geotag in its video files. But let me return to the question: why do we need these geotags?
As long as the video file stays on the mobile device, there's no problem. But once you try to upload it, whether to Flickr, YouTube, Picasa, etc., you'll lose any geo information that isn't in the file's tags (and, of course, that those sites can't read). The general reason to store this data in the file is to move it along with the file: once you move the file from your mobile device to a web platform, you'll see where it was created.

EXIF, Exiftool and PHP's exif_read_data

There are several tools for reading geotags. For images, and here we talk only about JPEGs, this is the EXIF information. You can download the exif command line program and try to read data with it:
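On the PHP side, exif_read_data() returns the GPS tags as "numerator/denominator" rational strings. Here is a hedged sketch (the helper and file name are my own, not from the article) of converting them to decimal degrees:

```php
<?php
// Convert one EXIF GPS coordinate (an array of "deg/min/sec" rationals
// plus a hemisphere reference such as 'N', 'S', 'E' or 'W') to decimal degrees.
function gps_to_decimal($coordinate, $hemisphere)
{
    $parts = array();
    foreach ($coordinate as $rational) {
        list($num, $den) = explode('/', $rational);
        $parts[] = $den == 0 ? 0 : $num / $den;
    }
    list($deg, $min, $sec) = $parts;
    $sign = ($hemisphere === 'S' || $hemisphere === 'W') ? -1 : 1;
    return $sign * ($deg + $min / 60 + $sec / 3600);
}

// Usage sketch (requires a JPEG with GPS EXIF data; the file name is hypothetical):
// $exif = exif_read_data('photo.jpg');
// $lat  = gps_to_decimal($exif['GPSLatitude'],  $exif['GPSLatitudeRef']);
// $lng  = gps_to_decimal($exif['GPSLongitude'], $exif['GPSLongitudeRef']);

echo gps_to_decimal(array('37/1', '46/1', '2998/100'), 'N'); // about 37.775
```

Southern and western hemispheres come out negative, which is the convention mapping APIs expect.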
June 22, 2011
· 15,037 Views
How to Collect the Images and Meta Tags from a Webpage with PHP
Meta Tags and the Facebook Example

You've definitely seen the "share a link" screen in Facebook. When you paste a link into the box (fig. 1) and press the "Attach" button, you'll get the pasted site parsed with a title, description and possibly a thumb (fig. 2). This functionality is well known from Facebook, but it appears in various other social services as well; LinkedIn, Reddit and other bookmarklets use it.

fig. 1 - Facebook Attach a Link Prompt Screen

The first thing to notice is that this information, prompted by Facebook, is the same as the meta tag information. However, there is a slight difference.

fig. 2 - Facebook Attached Link Screen

For the thumb, Facebook prefers the image set in the page's head. In the case above this tag appears to be:

And the image pointed to by its SRC attribute is exactly the same as the one prompted by Facebook (fig. 3).

fig. 3 - Vimeo Thumb

The first thing to note is that the real thumb is bigger than the thumb shown by Facebook, so Facebook resizes it; the second thing to note is that there are more meta tags of the og:... format.

Meta Tags and The Open Graph Protocol

By default, meta tags contain various information about the web page. They are not visible in the page, but carry some info about it. The most common meta tags are the title, description and keywords tags. They, of course, contain the title of the page; note that this can be different from the
February 25, 2011
· 24,033 Views · 1 Like

Comments

Propel Criteria Builder

Feb 22, 2012 · Stefan Koopmanschap

Actually both bubble sort and insertion sort have the same complexity: O(n*n) in the average case and O(n) in the best case. Which of the two is easier to implement depends on the developer, but for sure both are quite easy.
How to Collect the Images and Meta Tags from a Webpage with PHP

Feb 28, 2011 · Katie Mckinsey

I don't say these are the exact regexes that can parse HTML. I agree that using regex is shaky, but this is only a possible approach, not a how-to tutorial.

Yet again if regex is bad approach, what is the solution?

PHP: What is More Powerful Than list()

Aug 27, 2010 · Stoimen Popov

Yeah, I have never thought about that, but you're completely right!
How Windows Shuts Down?

Jul 08, 2010 · Tony Thomas

Hi, nobody says you should stop writing code and start writing comments. However when you write code it's obviously the most important thing to do, but does that mean that you should stop writing comments, just because the code is "super understandable"?
