Programming languages allow us to communicate with computers, and they operate like sets of instructions. There are numerous types of languages, including procedural, functional, object-oriented, and more. Whether you’re looking to learn a new language or trying to find some tips or tricks, the resources in the Languages Zone will give you all the information you need and more.
In today’s interconnected world, communication is key, and what better way to enhance your application’s communication capabilities than by integrating Twilio with the Ballerina programming language? Ballerina, known for its simplicity and power in building cloud-native integrations, combines with Twilio’s versatile communication APIs to help you send SMS, make voice calls, send WhatsApp messages, and more. In this blog, we’ll explore how the ballerinax/twilio package can empower you to build robust communication features effortlessly.

Prerequisites

Install Ballerina Swan Lake and the Ballerina VS Code plugin.
Create a Twilio account.
Obtain a Twilio phone number.
Obtain the Twilio auth token.
Obtain the Twilio WhatsApp number from the Console’s WhatsApp Sandbox.

Sample 1: Send/Receive Calls and Messages With Ballerina

Create a new Ballerina package using the command below.

bal new twilio-samples

This creates a new Ballerina package in the default module with the Ballerina.toml file, which identifies a directory as a package, and a sample source file (i.e., main.bal) with a main function. To provide the required configuration, create a new file named Config.toml and add the send/receive phone numbers, SID, and auth token received from Twilio. The file structure within the package will look like below.

Ballerina package structure

Add the following code to the main.bal file.

Ballerina

import ballerina/log;
import ballerinax/twilio;

configurable string accountSId = ?;
configurable string authToken = ?;
configurable string fromNumber = ?;
configurable string fromWhatsAppNumber = ?;
configurable string toNumber = ?;
configurable string message = "This is a test message from Ballerina";

//Create Twilio client
final twilio:Client twilio = check new ({twilioAuth: {accountSId, authToken}});

public function main() returns error? {
    //Send SMS
    twilio:SmsResponse smsResponse = check twilio->sendSms(fromNumber, toNumber, message);
    log:printInfo(string `SMS Response: ${smsResponse.toString()}`);

    //Get the details of the SMS sent above
    twilio:MessageResourceResponse details = check twilio->getMessage(smsResponse.sid);
    log:printInfo("Message Detail: " + details.toString());

    //Make a voice call
    twilio:VoiceCallResponse voiceResponse = check twilio->makeVoiceCall(fromNumber, toNumber, {
        userInput: message,
        userInputType: twilio:MESSAGE_IN_TEXT
    });
    log:printInfo(string `Voice Call Response: ${voiceResponse.toString()}`);

    //Send WhatsApp message
    twilio:WhatsAppResponse whatsappResponse = check twilio->sendWhatsAppMessage(fromWhatsAppNumber, toNumber, message);
    log:printInfo(string `WhatsApp Response: ${whatsappResponse.toString()}`);

    //Get account details
    twilio:Account accountDetails = check twilio->getAccountDetails();
    log:printInfo(string `Account Details: ${accountDetails.toString()}`);
}

Add the configuration values to the Config.toml file. It will look like below.

accountSId="xxxxxxxxxxxxxxxxxxxxxxx"
authToken="xxxxxxxxxxxxxxxxxxxxxxx"
fromNumber="+1xxxxxxxxxx"
fromWhatsAppNumber="+1xxxxxxxxxx"
toNumber="+1xxxxxxxxxx"

Then, run the program using the bal run command, and you will see the following logs.
C++ time = 2023-08-29T16:54:47.536-05:00 level = INFO module = anupama/twilio_samples message = "SMS Response: {\"sid\":\"SM12099885cce2c78bf5f50903ca83d3ac\",\"dateCreated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"dateUpdated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"dateSent\":\"\",\"accountSid\":\"xxxxxxxxxxxxx\",\"toNumber\":\"+1xxxxxxxxxx\",\"fromNumber\":\"+1xxxxxxxxxx\",\"body\":\"Sent from your Twilio trial account - This is a test message from Ballerina\",\"status\":\"queued\",\"direction\":\"outbound-api\",\"apiVersion\":\"2010-04-01\",\"price\":\"\",\"priceUnit\":\"USD\",\"uri\":\"/2010-04-01/Accounts/xxxxxxxxxxxxx/Messages/SM12099885cce2c78bf5f50903ca83d3ac.json\",\"numSegments\":\"1\"}" time = 2023-08-29T16:54:47.694-05:00 level = INFO module = anupama/twilio_samples message = "Message Detail: {\"body\":\"Sent from your Twilio trial account - This is a test message from Ballerina\",\"numSegments\":\"1\",\"direction\":\"outbound-api\",\"fromNumber\":\"outbound-api\",\"toNumber\":\"+1xxxxxxxxxx\",\"dateUpdated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"price\":\"\",\"errorMessage\":\"\",\"uri\":\"/2010-04-01/Accounts/xxxxxxxxxxxxx/Messages/SM12099885cce2c78bf5f50903ca83d3ac.json\",\"accountSid\":\"xxxxxxxxxxxxx\",\"numMedia\":\"0\",\"status\":\"sent\",\"messagingServiceSid\":\"\",\"sid\":\"SM12099885cce2c78bf5f50903ca83d3ac\",\"dateSent\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"dateCreated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"errorCode\":\"\",\"priceUnit\":\"USD\",\"apiVersion\":\"2010-04-01\",\"subresourceUris\":\"{\"media\":\"/2010-04-01/Accounts/xxxxxxxxxxxxx/Messages/SM12099885cce2c78bf5f50903ca83d3ac/Media.json\",\"feedback\":\"/2010-04-01/Accounts/xxxxxxxxxxxxx/Messages/SM12099885cce2c78bf5f50903ca83d3ac/Feedback.json\"}\"}" time = 2023-08-29T16:54:47.828-05:00 level = INFO module = anupama/twilio_samples message = "Voice Call Response: {\"sid\":\"CAaa2e5a5c7591928f7e28c79da97e615a\",\"status\":\"queued\",\"price\":\"\",\"priceUnit\":\"USD\"}" time = 2023-08-29T16:54:47.993-05:00 level = INFO module = anupama/twilio_samples message = "WhatsApp Response: {\"sid\":\"SM3c272753409bd4814a60c7fd06d97232\",\"dateCreated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"dateUpdated\":\"Tue, 29 Aug 2023 21:54:47 +0000\",\"dateSent\":\"\",\"accountSid\":\"xxxxxxxxxxxxx\",\"toNumber\":\"whatsapp:+1xxxxxxxxxx\",\"fromNumber\":\"whatsapp:+1xxxxxxxxxx\",\"messageServiceSid\":\"\",\"body\":\"This is a test message from Ballerina\",\"status\":\"queued\",\"numSegments\":\"1\",\"numMedia\":\"0\",\"direction\":\"outbound-api\",\"apiVersion\":\"2010-04-01\",\"price\":\"\",\"priceUnit\":\"\",\"errorCode\":\"\",\"errorMessage\":\"\",\"uri\":\"\",\"subresourceUris\":\"{\"media\":\"/2010-04-01/Accounts/xxxxxxxxxxxxx/Messages/SM3c272753409bd4814a60c7fd06d97232/Media.json\"}\"}" time = 2023-08-29T16:54:48.076-05:00 level = INFO module = anupama/twilio_samples message = "Account Details: {\"sid\":\"xxxxxxxxxxxxx\",\"name\":\"My first Twilio account\",\"status\":\"active\",\"type\":\"Trial\",\"createdDate\":\"Fri, 18 Aug 2023 21:14:20 +0000\",\"updatedDate\":\"Fri, 18 Aug 2023 21:14:54 +0000\"}" Also, you will get an SMS, a voice call, and a WhatsApp message to the specified number. Are you interested in seeing this sample's sequence diagram generated by the Ballerina VS Code plugin? You can see the interactions with Twilio clearly in this diagram without reading the code. 
The Sequence diagram can capture how the logic of your program flows, how the concurrent execution flow works, which remote endpoints are involved, and how those endpoints interact with the different workers in the program. See the Sequence diagram view for more details.

Sequence diagram view of the sample

Sample 2: Use TwiML for Programmable Messaging With Ballerina

TwiML (Twilio Markup Language) is a set of instructions you can use to tell Twilio what to do when you receive an incoming call, SMS, MMS, or WhatsApp message. Let’s see how to write a Ballerina program that makes a voice call with the instructions of a given TwiML URL. Here, we need a URL that returns TwiML voice instructions for the call. If you don’t have such a URL, you can write a simple Ballerina HTTP service to return it as follows and run it. The instructions returned by this service play a bell sound, looped ten times.

Ballerina

import ballerina/http;

service /twilio on new http:Listener(9090) {
    resource function post voice() returns xml {
        xml response = xml `<?xml version="1.0" encoding="UTF-8"?>
            <Response>
                <Play loop="10">https://api.twilio.com/cowbell.mp3</Play>
            </Response>`;
        return response;
    }
}

If you are running the above service locally, you need to expose it externally so that Twilio can access it when making the call. You can use ngrok for that, which is a cross-platform application that enables developers to expose a local development server to the Internet with minimal effort. Expose the above service with the following ngrok command.

./ngrok http 9090

This will return a URL similar to the following: https://a624-2600-1700-1bd0-1390-9587-3a61-a470-879b.ngrok.io

Then, append the service path and resource path from your Ballerina service above to get the URL to use with the Twilio voice call example: https://a624-2600-1700-1bd0-1390-9587-3a61-a470-879b.ngrok.io/twilio/voice

Next, write your Ballerina code as follows. Use the complete ngrok URL above as the voiceTwimUrl configurable.

Ballerina

import ballerina/log;
import ballerinax/twilio;

configurable string accountSId = ?;
configurable string authToken = ?;
configurable string fromNumber = ?;
configurable string toNumber = ?;
configurable string voiceTwimUrl = ?;

//Create Twilio client
final twilio:Client twilio = check new ({twilioAuth: {accountSId, authToken}});

public function main() returns error? {
    //Make a voice call
    twilio:VoiceCallResponse voiceResponse = check twilio->makeVoiceCall(fromNumber, toNumber, {
        userInput: voiceTwimUrl,
        userInputType: twilio:TWIML_URL
    });
    log:printInfo(string `Voice Call Response: ${voiceResponse.toString()}`);
}

When running the program, you will receive a phone call to the specified number that plays the bell sound ten times. You will see the log below in your Ballerina application.

time = 2023-08-29T17:03:13.804-05:00 level = INFO module = anupama/twilio_samples message = "Voice Call Response: {\"sid\":\"CA3d8f5cd381a4eaae1028728f00770f00\",\"status\":\"queued\",\"price\":\"\",\"priceUnit\":\"USD\"}"

In conclusion, Twilio’s synergy with Ballerina through the ballerinax/twilio package presents a powerful tool for elevating your application’s communication prowess. The showcased sample code highlights its ease and adaptability, setting it apart from connectors in other languages. Hope you’ve enjoyed making calls and sending messages using this seamless integration.
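For comparison with connectors in other ecosystems, the equivalent SMS send using Twilio’s official Python helper library is sketched below. This is a minimal, illustrative example; the credentials and numbers are placeholders mirroring the configurables above.

Python

from twilio.rest import Client

# Placeholders mirroring the Ballerina configurables used earlier
account_sid = "xxxxxxxxxxxxxxxxxxxxxxx"
auth_token = "xxxxxxxxxxxxxxxxxxxxxxx"

client = Client(account_sid, auth_token)

# Send a single SMS and print its SID, analogous to sendSms() above
message = client.messages.create(
    body="This is a test message",
    from_="+1xxxxxxxxxx",
    to="+1xxxxxxxxxx",
)
print(message.sid)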
Do you ever wonder about a solution that you know or you wrote is the best solution, and nothing can beat it in the years to come? Well, that’s not quite how it works in the ever-evolving IT industry, especially when it comes to big data processing. From the days of Apache Spark and the evolution of Cassandra 3 to 4, the landscape has witnessed rapid changes. However, a new player has entered the scene that promises to dominate the arena with its unprecedented performance and benchmark results. Enter ScyllaDB, a rising star that has redefined the standards of big data processing.

The Evolution of Big Data Processing

To appreciate the significance of ScyllaDB, it’s essential to delve into the origins of big data processing. The journey began with the need to handle vast amounts of data efficiently. Over time, various solutions emerged, each addressing specific challenges. From the pioneering days of Hadoop to the distributed architecture of Apache Cassandra, the industry witnessed a remarkable evolution. Yet, each solution presented its own set of trade-offs, highlighting the continuous quest for the perfect balance between performance, consistency, and scalability. You can check the official ScyllaDB website for benchmarks and comparisons with Cassandra and DynamoDB.

Understanding Big Data Processing and NoSQL Consistency

Big data processing brought about a paradigm shift in data management, giving rise to NoSQL databases. One of the pivotal concepts in this realm is eventual consistency, a principle that allows distributed systems to achieve availability and partition tolerance while sacrificing strict consistency. This is closely tied to the CAP theorem, which asserts that a distributed system can achieve only two out of three guarantees: consistency, availability, and partition tolerance. As organizations grapple with the complexities of this theorem, ScyllaDB has emerged as a formidable contender that aims to strike an optimal balance between these factors. You can learn more about the CAP theorem in this video.

Fine-Tuning Performance and Consistency With ScyllaDB

ScyllaDB enters the arena with a promise to shatter the conventional limits of big data processing. It achieves this by focusing on two critical aspects: performance and consistency. Leveraging its unique architecture, ScyllaDB optimizes data distribution and replication to ensure minimal latency and high throughput. Moreover, it provides tunable consistency levels, enabling organizations to tailor their database behavior according to their specific needs. This flexibility empowers users to strike a harmonious equilibrium between data consistency and system availability, a feat that was often considered challenging in the world of big data.

The Rust Advantage

ScyllaDB provides support for various programming languages, and each has its strengths. However, one language that stands out is Rust, with its focus on memory safety, zero-cost abstractions, and concurrency. Its robustness significantly reduces common programming errors, bolstering the overall stability of your application. When evaluating the choice of programming language for your project, it’s essential to consider the unique advantages that Rust brings to the table, alongside other supported languages like Java, Scala, Node.js, and more. Each language offers its own merits, allowing you to tailor your solution to your specific development needs.
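As a concrete illustration of those tunable consistency levels, the drivers let you pick the consistency per statement. Below is a minimal sketch using the Python driver (ScyllaDB is wire-compatible with Cassandra drivers); the address, keyspace, and table names are placeholders, and the demo later in this article uses the Rust driver instead.

Python

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Connect to a locally running ScyllaDB node (address and port are assumptions)
cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect()

# Ask for QUORUM on this read; ONE would favor latency and availability instead
statement = SimpleStatement(
    "SELECT id, name FROM my_keyspace.my_table",  # placeholder keyspace/table
    consistency_level=ConsistencyLevel.QUORUM,
)
for row in session.execute(statement):
    print(row.id, row.name)

Lower levels such as ONE favor availability and latency, while higher levels such as QUORUM or ALL favor stronger consistency; this is exactly the trade-off described in the CAP discussion above.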
One Last Word About “ScyllaDB and Rust Combination”

Scylla with Rust brings together the performance benefits of the Scylla NoSQL database with the power and capabilities of the Rust programming language. Just as Apache Spark and Scala or Cassandra and Java offer synergistic combinations, Scylla’s integration with Rust offers a similar pairing of a high-performance database with a programming language known for its memory safety, concurrency, and low-level system control. Rust’s safety guarantees make it a strong choice for building system-level software with fewer risks of memory-related errors. Combining Rust with Scylla allows developers to create efficient, safe, and reliable applications that can harness Scylla’s performance advantages for handling large-scale, high-throughput workloads. This pairing aligns well with the philosophy of leveraging specialized tools to optimize specific aspects of application development, akin to how Scala complements Apache Spark or Java complements Cassandra. Ultimately, the Scylla and Rust combination empowers developers to build resilient and high-performance applications for modern data-intensive environments.

Demo Time

Imagine handling a lot of information smoothly. I’ve set up a way to do this using three main parts. First, we keep making new users. Then, we watch this data using a Processor, which can change it if needed. Lastly, we collect useful insights from the data using Analyzers. This setup is similar to how popular pairs like Apache Spark and Scala or Cassandra and Java work together. We’re exploring how Scylla, a special database, and Rust, a clever programming language, can team up to make this process efficient and safe.

Set Up ScyllaDB

YAML

services:
  scylla:
    image: scylladb/scylla
    ports:
      - "9042:9042"
    environment:
      - SCYLLA_CLUSTER_NAME=scylladb-bigdata-demo
      - SCYLLA_DC=dc1
      - SCYLLA_LISTEN_ADDRESS=0.0.0.0
      - SCYLLA_RPC_ADDRESS=0.0.0.0

Create a Data Generator, Processor, and Analyzer

Shell

mkdir producer processor analyzer
cargo new producer
cargo new processor
cargo new analyzer

Producer

Rust

// Crate imports assumed by this listing (scylla, tokio, uuid, rand)
use rand::Rng;
use scylla::transport::Compression;
use scylla::SessionBuilder;
use std::error::Error;
use std::time::Duration;
use tokio::time::sleep;
use uuid::Uuid;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let uri = std::env::var("SCYLLA_CONTACT_POINTS")
        .unwrap_or_else(|_| "127.0.0.1:9042".to_string());
    let session = SessionBuilder::new()
        .known_node(uri)
        .compression(Some(Compression::Snappy))
        .build()
        .await?;

    // Create the keyspace if it doesn't exist
    session
        .query(
            "CREATE KEYSPACE IF NOT EXISTS ks WITH REPLICATION = \
             {'class' : 'SimpleStrategy', 'replication_factor' : 1}",
            &[],
        )
        .await?;

    // Use the keyspace
    session.query("USE ks", &[]).await?;

    // Create the table if it doesn't exist
    session
        .query(
            "CREATE TABLE IF NOT EXISTS ks.big_data_demo_table \
             (ID UUID PRIMARY KEY, NAME TEXT, created_at TIMESTAMP)",
            &[],
        )
        .await?;

    loop {
        let id = Uuid::new_v4();
        let name = format!("User{}", id);
        let name_clone = name.clone();
        session
            .query(
                "INSERT INTO ks.big_data_demo_table (id, name, created_at) \
                 VALUES (?, ?, toTimestamp(now()))",
                (id, name_clone),
            )
            .await?;
        println!("Inserted: ID {}, Name {}", id, name);
        let delay = rand::thread_rng().gen_range(1000..5000); // Simulate data generation time
        sleep(Duration::from_millis(delay)).await;
    }
}

Processor

Rust

// Crate imports assumed by this listing (scylla, tokio)
use scylla::transport::Compression;
use scylla::SessionBuilder;
use std::error::Error;
use std::time::{Duration, SystemTime};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let uri = std::env::var("SCYLLA_CONTACT_POINTS")
        .unwrap_or_else(|_| "127.0.0.1:9042".to_string());
    let session = SessionBuilder::new()
        .known_node(uri)
        .compression(Some(Compression::Snappy))
        .build()
        .await?;
    let mut last_processed_time = SystemTime::now();
    loop {
        // Calculate the last processed timestamp
        // (CQL timestamps are milliseconds since the epoch, hence as_millis)
        let last_processed_str = last_processed_time
            .duration_since(SystemTime::UNIX_EPOCH)
            .expect("Time went backwards")
            .as_millis() as i64; // Convert to i64
        let query = format!(
            "SELECT id, name FROM ks.big_data_demo_table WHERE created_at > {} ALLOW FILTERING",
            last_processed_str
        );
        // Query data
        if let Some(rows) = session.query(query, &[]).await?.rows {
            for row in rows {
                println!("ID:");
                if let Some(id_column) = row.columns.get(0) {
                    if let Some(id) = id_column.as_ref().and_then(|col| col.as_uuid()) {
                        println!("{}", id);
                    } else {
                        println!("(NULL)");
                    }
                } else {
                    println!("Column not found");
                }
                println!("Name:");
                if let Some(name_column) = row.columns.get(1) {
                    if let Some(name) = name_column.as_ref().and_then(|col| col.as_text()) {
                        println!("{}", name);
                    } else {
                        println!("(NULL)");
                    }
                } else {
                    println!("Column not found");
                }
                // Update the last processed timestamp
                last_processed_time = SystemTime::now();
                // Perform your data processing logic here
            }
        };
        // Add a delay between iterations
        tokio::time::sleep(Duration::from_secs(10)).await; // Adjust the delay as needed
    }
}

Analyzer

Rust

// Crate imports assumed by this listing (scylla, tokio)
use scylla::transport::Compression;
use scylla::SessionBuilder;
use std::error::Error;
use std::time::{Duration, SystemTime};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let uri = std::env::var("SCYLLA_CONTACT_POINTS")
        .unwrap_or_else(|_| "127.0.0.1:9042".to_string());
    let session = SessionBuilder::new()
        .known_node(uri)
        .compression(Some(Compression::Snappy))
        .build()
        .await?;
    let mut total_users = 0;
    let mut last_processed_time = SystemTime::now();
    loop {
        // Calculate the last processed timestamp
        // (CQL timestamps are milliseconds since the epoch, hence as_millis)
        let last_processed_str = last_processed_time
            .duration_since(SystemTime::UNIX_EPOCH)
            .expect("Time went backwards")
            .as_millis() as i64; // Convert to i64
        let query = format!(
            "SELECT id, name, created_at FROM ks.big_data_demo_table WHERE created_at > {} ALLOW FILTERING",
            last_processed_str
        );
        // Query data
        if let Some(rows) = session.query(query, &[]).await?.rows {
            for row in rows {
                println!("ID:");
                if let Some(id_column) = row.columns.get(0) {
                    if let Some(id) = id_column.as_ref().and_then(|col| col.as_uuid()) {
                        total_users += 1;
                        if total_users > 0 {
                            println!("Active Users {}, after adding recent user {}", total_users, id);
                        }
                    } else {
                        println!("(NULL)");
                    }
                } else {
                    println!("Column not found");
                }
                println!("Name:");
                if let Some(name_column) = row.columns.get(1) {
                    if let Some(name) = name_column.as_ref().and_then(|col| col.as_text()) {
                        println!("{}", name);
                    } else {
                        println!("(NULL)");
                    }
                } else {
                    println!("Column not found");
                }
                // Update the last processed timestamp
                last_processed_time = SystemTime::now();
                // Perform your data processing logic here
            }
        };
        // Add a delay between iterations
        tokio::time::sleep(Duration::from_secs(10)).await; // Adjust the delay as needed
    }
}

Now, Let's Run the Docker Compose

Shell

docker compose up -d

Validate the container state directly in VS Code with the Docker plugin. Let us attach the logs in VS Code. You should see the output below.

Producer

Processor

Analyzer

Summary: A New Chapter in Big Data Processing

In the relentless pursuit of an ideal solution for big data processing, ScyllaDB emerges as a trailblazer that combines the lessons learned from past solutions with a forward-thinking approach. By reimagining the possibilities of performance, consistency, and language choice, ScyllaDB showcases how innovation can lead to a new era in the realm of big data.
As technology continues to advance, ScyllaDB stands as a testament to the industry’s unwavering commitment to elevating the standards of data processing and setting the stage for a future where excellence is constantly redefined. That’s all for now, Happy Learning!
Over 100,000 organizations use Apache Kafka for data streaming. However, there is a problem: The broad ecosystem lacks a mature client framework and managed cloud service for Python data engineers. Quix Streams is a new technology on the market trying to close this gap. This blog post discusses this Python library, its place in the Kafka ecosystem, and when to use it instead of Apache Flink or other Python- or SQL-based substitutes. Why Python and Apache Kafka Together? Python is a high-level, general-purpose programming language. It has many use cases for scripting and development. But there is one fundamental purpose for its success: Data engineers and data scientists use Python. Period. Yes, there is R as another excellent programming language for statistical computing. And many low-code/no-code visual coding platforms for machine learning (ML). SQL usage is ubiquitous amongst data engineers and data scientists, but it’s a declarative formalism that isn’t expressive enough to specify all necessary business logic. When data transformation or non-trivial processing is required, data engineers and data scientists use Python. Hence, data engineers and data scientists use Python. If you don’t give them Python, you will find either shadow IT or Python scripts embedded into the coding box of a low-code tool. Apache Kafka is the de facto standard for data streaming. It combines real-time messaging, storage for true decoupling and replayability of historical data, data integration with connectors, and stream processing for data correlation. All in a single platform. At scale for transactions and analytics. Python and Apache Kafka for Data Engineering and Machine Learning In 2017, I wrote a blog post about “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka.” The article is still accurate and explores how data streaming and AI/ML are complementary: Machine Learning requires a lot of infrastructure for data collection, data engineering, model training, model scoring, monitoring, and so on. Data streaming with the Kafka ecosystem enables these capabilities in real-time, reliable, and at scale. DevOps, microservices, and other modern deployment concepts merged the job roles of software developers and data engineers/data scientists. The focus is much more on data products solving a business problem, operated by the team that develops it. Therefore, the Python code needs to be production-ready and scalable. As mentioned above, the data engineering and ML tasks are usually realized with Python APIs and frameworks. Here is the problem: The Kafka ecosystem is built around Java and the JVM. Therefore, it lacks good Python support. Let’s explore the options and why Quix Streams is a brilliant opportunity for data engineering teams for machine learning and similar tasks. What Options Exist for Python and Kafka? Many alternatives exist for data engineers and data scientists to leverage Python with Kafka. Python Integration for Kafka Here are a few common alternatives for integrating Python with Kafka and their trade-offs: Python Kafka client libraries: Produce and consume via Python. This is solid but insufficient for advanced data engineering as it lacks processing primitives, such as filtering and joining operations found in Kafka Streams and other stream processing libraries. Kafka REST APIs: Confluent REST Proxy and similar components enable producing and consuming to/from Kafka. 
It works well for gluing interfaces together but is not ideal for ML workloads with low latency and critical SLAs. SQL: Stream processing engines like ksqlDB or FlinkSQL allow querying of data in SQL. KsqlDB and Flink are other systems that need to be operated. And SQL isn’t expressive enough for all use cases. Instead of just integrating Python and Kafka via APIs, native stream processing provides the best of both worlds: The simplicity and flexibility of dynamic Python code for rapid prototyping with Jupyter notebooks and serious data engineering AND stream processing for stateful data correlation at scale either for data ingestion and model scoring. Stream Processing With Python and Kafka In the past, we had two suboptimal open-source options for stream processing with Kafka and Python: Faust: A stream processing library, porting the ideas from Kafka Streams (a Java library and part of Apache Kafka) to Python. The feature set is much more limited compared to Kafka Streams. Robinhood open-sourced Faust. But it lacks maturity and community adoption. I saw several customers evaluating it but then moving to other options. Apache Flink’s Python API: Flink’s adoption grows significantly yearly in the stream processing community. This API is a Python version of DataStream API, which allows Python users to write Python DataStream API jobs. Developers can also use the Table API, including SQL, directly in there. It is an excellent option if you have a Flink cluster and some folks want to run Python instead of Java or SQL against it for data engineering. The Kafka-Flink integration is very mature and battle-tested. As you see, all the alternatives for combining Kafka and Python have trade-offs. They work for some use cases but are imperfect for most data engineering and data science projects. A new open-source framework to the rescue? Introducing a brand new stream processing library for Python: Quix Streams… What Is Quix Streams? Quix Streams is a stream processing library focused on ease of use for Python data engineers. The library is open-source under Apache 2.0 license and available on GitHub. Instead of a database, Quix Streams uses a data streaming platform such as Apache Kafka. You can process data with high performance and save resources without introducing a delay. Some of the Quix Streams differentiators are defined as being lightweight and powerful, with no JVM and no need for separate clusters of orchestrators. It sounds like the pitch for why to use Kafka Streams in the Java ecosystem minus the JVM — this is a positive comment! :-) Quix Streams does not use any domain-specific language or embedded framework. It’s a library that you can use in your code base. This means that with Quix Streams, you can use any external library for your chosen language. For example, data engineers can leverage Pandas, NumPy, PyTorch, TensorFlow, Transformers, and OpenCV in Python. So far, so good. This was more or less the copy and paste of Quix Streams marketing (it makes sense to me)… Now, let’s dig deeper into the technology. The Quix Streams API and Developer Experience The following is the first feedback after playing around, doing code analysis, and speaking with some Confluent colleagues and the Quix Streams team. The Good The Quix API and tooling persona is the data engineer (that’s at least my understanding). Hence, it does not directly compete with other offerings, say a Java developer using Kafka Streams. 
Again, the beauty of microservices and data mesh is the focus of an application or data product per use case. Choose the right tool for the job! The API is mostly sleek, with some weirdness / unintuitive parts. But it is still in beta, so hopefully, it will get more refined in the subsequent releases. No worries at this early stage of the project. The integration with other data engineering and machine learning Python frameworks is excellent. If you can combine stream processing with Pandas, NumPy, and similar libraries is a massive benefit for the developer experience. The Quix library and SaaS platform are compatible with open-source Kafka and commercial offerings and cloud services like Cloudera, Confluent Cloud, or Amazon MSK. Quix’s commercial UI provides out-of-the-box integration with self-managed Kafka and Confluent Cloud. The cloud platform also provides a managed Kafka for testing purposes (for a few cents per Kafka topic and not meant for production). The Improvable The stream processing capabilities (like powerful sliding windows) are still pretty limited and not comparable to advanced engines like Kafka Streams or Apache Flink. The roadmap includes enhanced features. The architecture is complex since executing the Python API jumps through three languages: Python -> C# -> C++. Does it matter to the end user? It depends on the use case, security requirements, and more. The reasoning for this architecture is Quix’s background coming from the McLaren F1 team and ultra-low latency use cases and building a polyglot platform for different programming environments. It would be interesting to see a benchmark for throughput and latency versus Faust, which is Python top to bottom. There is a trade-off between inter-language marshaling/unmarshalling versus the performance boost of lower-level compiled languages. This should be fine if we trust Quix’s marketing and business model. I expect they will provide some public content soon, as this question will arise regularly. The Quix Streams Data Pipeline Low Code GUI The commercial product provides a user interface for building data pipelines and code, MLOps, and a production infrastructure for operating and monitoring the built applications. Here is an example: Tiles are K8’s containers. Each purple (transformation) and orange (destination) node is backed by a Git project containing the application code. The three blue (source) nodes on the left are replay services used to test the pipeline by replaying specific streams of data. Arrows are individual Kafka topics in Confluent Cloud (green = live data). The first visible pipeline node (bottom left) is joining data from different physical sites (see the three input topics; one was receiving data when I took the image). There are three modular transformations in the visible pipeline (two rolling windows and one interpolation). There are two real-time apps (one real-time Streamlit dashboard and the other is an integration with a Twilio SMS service). Quix Streams vs. Apache Flink for Stream Processing With Python The Quix team wrote a detailed comparison of Apache Flink and Quix Streams. I don’t think it’s an entirely fair comparison as it compares open-source Apache Flink to a Quix SaaS offering. Nevertheless, for the most part, it is a good comparison. Flink was always Java-first and has added support for Python for its DataStream and Table APIs at a later stage. On the contrary, Quix Streams is brand new. Hence, it lacks maturity and customer case studies. 
Having said all this, I think Quix Streams is a great choice for some stream processing projects in the Python ecosystem! Should You Use Quix Streams or Apache Flink? TL;DR: There is a place for both! Choose the right tool… Modern enterprise architectures built with concepts like data mesh, microservices, and domain-driven design allow this flexibility per use case and problem. I recommend using Flink if the use case makes sense with SQL or Java. And if the team is willing to operate its own Flink cluster or has a platform team or a cloud service taking over the operational burden and complexity. On the contrary, I would use Quix Streams for Python projects if I want to go to production with a more microservice-like architecture building Python applications. However, beware that Quix currently only has a few built-in stateful functions or JOINs. More advanced stream processing use cases cannot be done with Quix (yet). This is likely to change in the next months by adding more capabilities. Hence, make sure to read Quix’s comparison with Flink. But keep in mind if you want to evaluate the open-source Quix Streams library or the Quix SaaS platform. If you are in the public cloud, you might combine Quick Streams SaaS with other fully managed cloud services like Confluent Cloud for Kafka. On the other side, in your own private VPC or on-premise, you need to build your own platform with technologies like the Quix Streams library, Kafka or Confluent Platform, and so on. The Current State and Future of Quix Streams If you build a new framework or product for data streaming, you need to make sure that it does not overlap with existing established offerings. You need differentiators and/or innovation in a new domain that does not exist today. Quix Streams accomplishes this essential requirement to be successful: The target audience is data engineers with Python backgrounds. No severe and mature tool or vendor exists in this space today. And the demand for Python will grow more and more with the focus on leveraging data for solving business problems in every company. Maturity: Making the Right (Marketing) Choices in the Early Stage Quix Streams is in the early maturity stage. Hence, a lot of decisions can still be strengthened or revamped. The following buzzwords come into my mind when I think about Quix Streams: Python, data streaming, stream processing, Python, data engineering, Machine Learning, open source, cloud, Python, .NET, C#, Apache Kafka, Apache Flink, Confluent, MSK, DevOps, Python, governance, UI, time series, IoT, Python, and a few more. TL;DR: I see a massive opportunity for Quix Streams to become a great data engineering framework (and SaaS offering) for Python users. I am not a fan of polyglot platforms. It requires finding the lowest common denominator. I was never a fan of Apache Beam for that reason. The Kafka Streams community did not choose to implement the Beam API because of too many limitations. Similarly, most people do not care about the underlying technology. Yes, Quix Streams’ core is C++. But is the goal to roll out stream processing for various programming languages, only starting with Python, then going to .NET, and then to another one? I am skeptical. Hence, I would like to see a change in the marketing strategy already: Quix Streams started with the pitch of being designed for high-frequency telemetry services when you must process high volumes of time-series data with up to nanosecond precision. 
It is now being revamped to focus mainly on Python and data engineering. Competition: Friends or Enemies? Getting market adoption is still hard. Intuitive use of the product, building a broad community, and the right integrations and partnerships (can) make a new product such as Quix Streams successful. Quix Streams is on a good way here. For instance, integrating serverless Confluent Cloud and other Kafka deployments works well: This is a native integration, not a connector. Everything in the pipeline image runs as a direct Kafka protocol connection using raw TCP/IP packets to produce and consume data to topics in Confluent Cloud. Quix platform is orchestrating the management of the Confluent Cloud Kafka Cluster (create/delete topics, topic sizing, topic monitoring, etc.) using Confluent APIs. However, one challenge of these kinds of startups is the decision to complement versus compete with existing solutions, cloud services, and vendors. For instance, how much time and money do you invest in data governance? Should you build this or use the complementing streaming platform or a separate independent tool (like Collibra)? We will see where Quix Streams will go here —building its cloud platform for addressing Python engineers or overlapping with other streaming platforms. My advice is the proper integration with partners that lead in their space. Working with Confluent for over six years, I know what I am talking about: We do one thing: data streaming, but we are the best in that one. We don’t even try to compete with other categories. Yes, a few overlaps always exist, but instead of competing, we strategically partner and integrate with other vendors like Snowflake (data warehouse), MongoDB (transactional database), HiveMQ (IoT with MQTT), Collibra (enterprise-wide data governance), and many more. Additionally, we extend our offering with more data streaming capabilities, i.e., improving our core functionality and business model. The latest example is our integration of Apache Flink into the fully managed cloud offering. Kafka for Python? Look At Quix Streams! In the end, a data engineer or developer has several options for stream processing deeply integrated into the Kafka ecosystem: Kafka Streams: Java client library ksqlDB: SQL service Apache Flink: Java, SQL, Python service Faust: Python client library Quix Streams: Python client library All have their pros and cons. The persona of the data engineer or developer is a crucial criterion. Quix Streams is a nice new open-source framework for the broader data streaming community. If you cannot or do not want to use just SQL but native Python, then watch the project (and the company/cloud service behind it). bytewax is another open-source stream processing library for Python integrating with Kafka. It is implemented in Rust under the hood. I never saw it in the field yet. But a few comments mentioned it after I shared this blog post on social networks. I think it is worth a mention. Let’s see if it gets more traction in the following months.
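For reference, the most basic building block mentioned earlier, a plain Python Kafka client library, is only a few lines. Here is a minimal sketch using the confluent-kafka package; the broker address and topic name are placeholders.

Python

from confluent_kafka import Consumer, Producer

conf = {"bootstrap.servers": "localhost:9092"}  # placeholder broker address

# Produce a single event
producer = Producer(conf)
producer.produce("demo-events", key="sensor-1", value='{"temperature": 21.3}')
producer.flush()

# Consume it back
consumer = Consumer({**conf, "group.id": "demo", "auto.offset.reset": "earliest"})
consumer.subscribe(["demo-events"])
msg = consumer.poll(10.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()

This is exactly the "produce and consume" level described above: fine for integration, but it offers no stateful processing primitives, which is the gap that Faust, Flink's Python API, and Quix Streams aim to fill.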
What Is the ESP32? The ESP32 is an incredible microcontroller developed by Espressif Systems. Based on its predecessor's legacy, the ESP8266, the ESP32 boasts dual-core processing capabilities, integrated Wi-Fi, and Bluetooth functionalities. Its rich features and cost-effectiveness make it a go-to choice for creating Internet of Things (IoT) projects, home automation devices, wearables, and more. What Is Xedge32? Xedge32, built upon the Barracuda App Server C Code Library, offers a comprehensive range of IoT protocols known as the "north bridge." Xedge32 extends the Barracuda App Server's Lua APIs and interfaces seamlessly with the ESP32's GPIOs, termed the "south bridge." While not everyone is an embedded C or C++ expert, with Xedge32, programming embedded systems becomes accessible to all. The beauty of Xedge32 lies in its simplicity: you don't need any C code experience. All you need is Lua, which is refreshingly straightforward to pick up. It's a friendly environment that invites everyone, whether you're a seasoned developer or just someone curious about microcontroller-based IoT projects. With Lua's friendly nature, diving into microcontroller programming becomes super easy. The Xedge IDE is built on the foundation of the Visual Studio Code editor. Installing the Xedge32 Lua IDE To get started with Xedge32, an ESP32 development board is your starting point. However, Xedge32 has specific requirements regarding the type of ESP32 you can employ. If you're new to the world of ESP32, it's recommended to opt for the newer ESP32-S3. Ensure it comes with 8MB RAM, although most ESP32-S3 models feature this. If your project scope involves camera integrations, consider the ESP32-S3 CAM board. The CAM board will allow you to explore exciting functionalities like streaming images to a browser via WebSockets or using the MQTT CAM example, which publishes images to an MQTT broker. While Amazon offers the ESP32-S3, marketplaces like AliExpress often present more budget-friendly options. Once you have your ESP32, proceed to the Xedge32 installation page for step-by-step firmware installation instructions. The following video shows how to install Xedge32 using Windows. Your First Xedge32 Program: Blinking an LED Many ESP32 development boards come equipped with a built-in LED. This LED can be great for simple tests, allowing you to verify if your code is running correctly quickly. However, if you wish to dive deeper into understanding the wiring and interfacing capabilities of the ESP32, we recommend getting your hands on a breadboard. The figure below provides a visual guide. A breadboard is a handy tool that lets you prototype circuits without soldering, making it perfect for beginners and experiments. Using a breadboard, you can connect multiple components, test out different configurations, and quickly see the results of your efforts. An external LED can be a great starting component for your initial projects. LEDs are simple to use, and they offer immediate visual feedback. While many DIY electronics kits come with an assortment of LEDs, you can also purchase them separately. Places like Amazon offer various LED packs suitable for breadboard projects. Remember, when working with an LED, ensure you have the appropriate resistor to prevent too much current from flowing through the LED, which could damage it. As you become more comfortable with the ESP32 and the breadboard, you can expand your component collection and experiment with more complex circuits. 
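On the resistor point: as a rough, illustrative calculation (assuming the ESP32's 3.3 V GPIO level, a typical red LED forward voltage of about 2 V, and a target current of 10 mA), Ohm's law gives R = (3.3 V - 2.0 V) / 0.010 A = 130 Ω, so a common 150 Ω or 220 Ω resistor is a safe choice; check your LED's datasheet for its actual forward voltage and current rating.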
The Lua Blink LED Script The following Lua script shows how to create a blinking LED pattern using Lua. Here's a step-by-step breakdown: Local blink Function:The script defines a local function named blink. This function is designated to handle the LED's blinking behavior. Utilizing Coroutines for Timing:Within the blink function, an infinite loop (while true do) uses the Lua Coroutines concept for its timing mechanism. Specifically, coroutine.yield(true) is employed to make the function "sleep" for a specified duration. In this context, it pauses the loop between LED state changes for one second. LED State Control:The loop inside the blink function manages the LED's state. It first turns the LED on with pin:value(true), sleeps for a second, turns it off with pin:value(false), and then sleeps for another second. This on-off cycle continues indefinitely, creating the blink effect. GPIO Port 21 Initialization:Before the blinking starts, the GPIO port 21 is set up as an output using esp32.gpio(21,"OUT") and is referenced by the pin variable. You must modify this number if your LED is connected to a different GPIO port. If you're unfamiliar with how GPIO works, check out the GPIO tutorial to understand this concept better. Finally, the two last code lines outside the function initialize the blinking pattern, setting the timer to trigger the blink function every 1,000 milliseconds (every second). Lua local function blink() local pin = esp32.gpio(21,"OUT") while true do trace"blink" pin:value(true) coroutine.yield(true) -- Sleep for one timer tick pin:value(false) coroutine.yield(true) -- Sleep end end timer=ba.timer(blink) -- Create timer timer:set(1000) -- Timer tick = one second How To Create an Xedge Blink LED App When the Xedge32-powered ESP is running, use a browser, navigate to the ESP32's IP address, and click the Xedge IDE link to start the Xedge IDE. Create a new Xedge app as follows: Right-click Disk and click New Folder on the context menu. Enter blink as the new folder name and click Enter. Expand Disk, right-click the blink directory, and click New App in the context menu. Click the Running button and click Save. Expand the blink app now visible in the left pane tree view. The blink app should be green, indicating the app is running. Right-click the blink app and click New File on the context menu. Type blinkled.xlua and click Enter. Click the new blinkled.xlua file to open the file In Xedge. Select all and delete the template code. Copy the above blink LED Lua code and paste the content into the Xedge editor. Click the Save & Run button to save and start the blink LED example. See the Xedge IDE Video for more information on creating Xedge applications. If everything is correct, the LED should start blinking. References Online Interactive Lua Tutorials Xedge32 introduction How to upload the Xedge32 firmware to your ESP32 CAM board Lua timer API Lua GPIO API What's Next? If you're eager to explore further, there are numerous Xedge32 examples available on GitHub to kickstart your learning. However, it's worth noting that Xedge32 is still a budding tool undergoing active development. As a result, while examples are available on GitHub, comprehensive tutorials accompanying them might be sparse at the moment.
Recap: Server-Side Web Pages With Kotlin In the first article, server-side web pages with Kotlin part 1, a brief history of web development was outlined: namely, the four main stages being static HTML page delivery; server-side programmatic creation of web pages; HTML templating engines, again server-side; and finally, client-side programmatic creation of web pages. While contemporary web development is mostly focused on the last of the four stages (i.e., creating web pages on the client side), there still exist good cases for rendering web pages on the server side of the web application; furthermore, new technologies like kotlinx.html – a library by the authors of Kotlin for generating HTML code via a domain-specific language (DSL) – provide additional options for server-side web development. To give an example, the following two approaches produce the same homepage for the Spring Boot-powered website of a hypothetical bookstore: Templating Engine (Thymeleaf) The basic workflow for rendering a webpage with a template engine like Thymeleaf is to create an HTML template page in the resources/templates folder of the project, in this case home.html: HTML <!DOCTYPE html> <html xmlns:th="http://www.thymeleaf.org" lang="en"> <head> <title>Bookstore - Home</title> <link th:href="@{/css/bootstrap.min.css}" rel="stylesheet"> <script th:src="@{/js/bootstrap.min.js}"></script> </head> <body> <main class="flex-shrink-0"> <div th:insert="~{fragments/general.html :: header(pageName = 'Home')}"></div> <div> <h2>Welcome to the Test Bookstore</h2> <h3>Our Pages:</h3> <ul> <li><a th:href="@{/authors}">Authors</a></li> <li><a th:href="@{/books}">Books</a></li> </ul> </div> </main> <div th:insert="~{fragments/general.html :: footer}"></div> </body> </html> Next, a web controller endpoint function needs to be created and return a string value that corresponds to the name of the template file in the resources folder (only without the .html file extension): Kotlin @Controller class HomeController { @GetMapping(value= ["/", "/home"]) fun home() = "home" } kotlinx.html While there are again two basic steps for rendering a web page using the kotlinx.html library, the difference here is that the HTML generation code can be placed directly within the class structure of the web application. First, create a class that generates the HTML code in string form: Kotlin @Service class HomepageRenderer { fun render(): String { return writePage { head { title("Bookstore - Home") link(href = "/css/bootstrap.min.css", rel = "stylesheet") script(src = "/js/bootstrap.min.js") {} } body { main(classes = "flex-shrink-0") { header("Home") div { h2 { +"Welcome to the Test Bookstore" } h3 { +"Our Pages:" } ul { li { a(href = "/authors") { +"Authors" } } li { a(href = "/books") { +"Books" } } } } } footer() } } } } Next, the web controller will return the string generated by the renderer class as the response body of the relevant web endpoint function: Kotlin @Controller class HomeController(private val homepageRenderer: HomepageRenderer) { @GetMapping(value = ["/", "/home"], produces = [MediaType.TEXT_HTML_VALUE]) @ResponseBody fun home() = homepageRenderer.render() } Comparison As mentioned in the previous article, this approach brings some appreciable benefits compared to the traditional approach of templating engines, such as having all relevant code in one location within the project and leveraging Kotlin’s typing system as well as its other features. 
However, there are also downsides to this approach, one of which is that the very embedding of the code within the (compiled) class structure means that no “hot reloading” is possible: any changes to the HTML-rendering code will require the server to restart compared to simply being able to refresh the target webpage when using a templating engine like Thymeleaf. As this concluding article will examine, this issue has a potential solution in the form of the Kotlin Scripting library. Kotlin Scripting: An Introduction As the name suggests, Kotlin Scripting is a library that allows a developer to write Kotlin code and have it execute not after having been compiled for the targeted platform, but rather as a script that has been read in and interpreted by the executing host. The potential upsides of this are obvious: leveraging the power of Kotlin’s typing system and other features while being able to execute the Kotlin code without having to re-compile the files after any refactorings. While the functionality is officially still in an “experimental” state, it already has a very prominent early adopter: Gradle’s Kotlin DSL, with which one can write Gradle build scripts in place of using the traditional Groovy-based script files. Moreover, a third-party library is available for enhancing the experience of developing and executing Kotlin script files, especially for command-line applications. Adaptation Steps For this exercise – which will again be based on the hypothetical bookstore website as used in the first article – it will be sufficient to apply a relatively simple “scripting host” that will load Kotlin scripts and execute them within a Spring Boot web application. Note that a tutorial for how to set up a basic script-executing application is available on Kotlin’s website – and it is upon this tutorial that the code setup in this article is based – but it contains some parts that are unnecessary (e.g., dynamically loading dependencies from Maven) as well as parts that are missing (e.g., passing arguments to a script), so a further explanation in the steps below will be provided. Step 1: Dependencies First, the following dependencies will need to be added to the project: org.jetbrains.kotlin:kotlin-scripting-common org.jetbrains.kotlin:kotlin-scripting-jvm org.jetbrains.kotlin:kotlin-scripting-jvm-host Note that these dependencies all follow the same naming conventions as the “mainstream” Kotlin dependencies – and are bundled in the same version releases as well – so one can use the kotlin() dependency-handler helper function in Gradle (e.g., implementation(kotlin(“scripting-common”))) as well as omit the package version if one uses the Kotlin Gradle plugin. Step 2: Script Compilation Configuration Next, an object needs to be defined that contains the configuration for how the Kotlin scripts will be loaded: Kotlin object HtmlScriptCompilationConfiguration : ScriptCompilationConfiguration( { jvm { jvmTarget("19") dependenciesFromCurrentContext("main", "kotlinx-html-jvm-0.8.0") } } ) As seen above, the object needs to contain two configuration declarations: The version of the JVM that will be used to compile the Kotlin scripts – this needs to match the version of the JVM that executes the script host (i.e., the Spring Boot web application). Any dependencies that are to be passed into the context that loads and executes the Kotlin scripts. 
“main” is obligatory for importing the core Kotlin libraries; “kotlinx-html-jvm-0.8.0” is for the kotlinx.html code that was introduced in the previous article.

Step 3: Script Placeholder Definition

With the script compilation configuration object defined, we can now define the abstract class that will serve as a placeholder for the scripts to be loaded and executed:

Kotlin

@KotlinScript(
    fileExtension = "html.kts",
    compilationConfiguration = HtmlScriptCompilationConfiguration::class
)
abstract class HtmlScript(@Suppress("unused") val model: Map<String, Any?>)

As the code demonstrates, it is necessary to pass in the file extension that will identify the script files that will use the previously defined compilation configuration. Furthermore, the abstract class’s constructor serves as the entry point for any variables that need to be passed into the script during execution; in this case, the parameter model has been defined to serve in a similar manner to how the similarly-named model object works for Thymeleaf’s HTML template files.

Step 4: Script Executor

After defining the script placeholder, it is now possible to define the code that will load and execute the scripts:

Kotlin

@Service
class ScriptExecutor {
    private val logger = LoggerFactory.getLogger(ScriptExecutor::class.java)
    private val compilationConfiguration = createJvmCompilationConfigurationFromTemplate<HtmlScript>()
    private val scriptingHost = BasicJvmScriptingHost()

    fun executeScript(scriptName: String, arguments: Map<String, Any?> = emptyMap()): String {
        val file = File(Thread.currentThread().contextClassLoader.getResource(scriptName)!!.toURI())
        val evaluationConfiguration = ScriptEvaluationConfiguration {
            constructorArgs(arguments)
        }
        val response = scriptingHost.eval(file.toScriptSource(), compilationConfiguration, evaluationConfiguration)
        response.reports.asSequence()
            .filter { it.severity == ScriptDiagnostic.Severity.ERROR }
            .forEach { logger.error("An error occurred while rendering {}: {}", scriptName, it.message) }
        return (response.valueOrNull()?.returnValue as? ResultValue.Value)?.value as? String ?: ""
    }
}

A couple of things to note here:

Evaluating a script in Kotlin requires two configurations: one for compilation and one for evaluation. The former was defined in a previous step, whereas the latter needs to be generated for every script execution, as it is here that any arguments get passed into the script via the constructorArgs() function call (in our case, onto the model parameter defined in the previous step).

The script execution host will not throw any exceptions encountered when executing a script (e.g., syntax errors). Instead, it will aggregate all “reports” gathered during the script evaluation and return them in a parameter named as such in the response object. Thus, it is necessary to create a reporting mechanism (in this case, the logger object) after the fact to inform the developer and/or user if an exception has occurred.

There is no type definition for what a “successful” script execution should return; as such, the return value needs to be cast to the appropriate type before being returned out of the function.

Step 5: Script Definition

Now, the actual HTML-rendering script can be defined.
Following on the examples from the previous article, we will be creating the script that renders the “view all authors” page of the website: Kotlin val authors = model[AUTHORS] as List<Author> writePage { head { title("Bookstore - View Authors") link(href = "/css/bootstrap.min.css", rel = "stylesheet") script(src = "/js/bootstrap.min.js") {} script(src = "/js/util.js") {} } body { header("Authors") div { id = "content" h2 { +"Our Books' Authors" } ul { authors.forEach { author -> li { form(method = FormMethod.post, action = "/authors/${author.id}/delete") { style = "margin-block-end: 1em;" onSubmit = "return confirmDelete('author', \"${author.name}\")" a(href = "/authors/${author.id}") { +author.name style = "margin-right: 0.25em;" } button(type = ButtonType.submit, classes = "btn btn-danger") { +"Delete" } } } } } a(classes = "btn btn-primary", href = "/authors/add") { +"Add Author" } } footer() } } The items to note: IDE support for Kotlin scripting is limited, and it will currently (e.g., with IDEA 2022.3.2) not recognize arguments passed into the script like the model object. As a consequence, the IDE will probably mark the file as erroneous, even though this is not actually the case. It is recommended to simulate the package structure within which the scripts should be placed. In this case, the above file is located in resources/com/severett/thymeleafcomparison/kotlinscripting/scripting and thus is marked as residing in the package com.severett.thymeleafcomparison.kotlinscripting.scripting. This allows it to access the functions in the common.kt file like the header() and footer() functions that generate the boilerplate header and footer sections of the HTML code, respectively. Step 6: Web Controller The final step is to create the web controller that dictates which scripts should be executed to generate the HTML code for the web requests. 
The end result is similar to the approach for kotlinx.html – i.e., returning a response body of text/html – with the difference being what mechanism is actually called to generate the response body, in this case, the script executor defined above: Kotlin @Controller @RequestMapping("/authors") class AuthorController(private val authorService: AuthorService, private val scriptExecutor: ScriptExecutor) { @GetMapping(produces = [TEXT_HTML]) @ResponseBody fun getAll(): String { return scriptExecutor.executeScript( "$SCRIPT_LOCATION/viewauthors.html.kts", mapOf(AUTHORS to authorService.getAll()) ) } @GetMapping(value = ["/{id}"], produces = [TEXT_HTML]) @ResponseBody fun get(@PathVariable id: Int): String { return scriptExecutor.executeScript( "$SCRIPT_LOCATION/viewauthor.html.kts", mapOf(AUTHOR to authorService.get(id)) ) } @GetMapping(value = ["/add"], produces = [TEXT_HTML]) @ResponseBody fun add() = scriptExecutor.executeScript("$SCRIPT_LOCATION/addauthor.html.kts") @PostMapping(value = ["/save"], produces = [TEXT_HTML]) @ResponseBody fun save( @Valid authorForm: AuthorForm, bindingResult: BindingResult, httpServletResponse: HttpServletResponse ): String { return if (!bindingResult.hasErrors()) { authorService.save(authorForm) httpServletResponse.sendRedirect("/authors") "" } else { scriptExecutor.executeScript( "$SCRIPT_LOCATION/addauthor.html.kts", mapOf(ERRORS to bindingResult.allErrors.toFieldErrorsMap()) ) } } @PostMapping(value = ["/{id}/delete"]) fun delete(@PathVariable id: Int, httpServletResponse: HttpServletResponse) { authorService.delete(id) httpServletResponse.sendRedirect("/authors") } } Note that SCRIPT_LOCATION is “com/severett/thymeleafcomparison/kotlinscripting/scripting” and is used as a common prefix for all script paths in the application. Analysis As a result, we now have a web application that combines the language features of Kotlin and the ability for quick reloading of web page-generating code as is available for templating engines like Thymeleaf. Mission accomplished, right? Unfortunately, no. Configuring the scripting host requires a lot of dependencies, far more than either the Thymeleaf or kotlinx.html applications. Running the bootJar Gradle task for the Kotlin Scripting application produces a JAR file that is 87.24 megabytes in size – quite larger than either the Thymeleaf (26.94 megabytes) or kotlinx.html (26.26 megabytes) applications. Moreover, this approach re-introduces one of the drawbacks of using a templating engine compared to kotlinx.html: the website code has gone back to being split between two different locations, and having to track between the two will increase the cognitive load of the developer, especially given the incomplete support that Kotlin Scripting enjoys in IDEs compared to more mature technologies like Thymeleaf. Finally, the execution time is quite bad compared to the two approaches from the previous article: As in, over a second to load one relatively small webpage – up to 26 times slower than either the Thymeleaf- or kotlinx.html-based applications! Any potential gains of hot reloading are going to be eaten up almost immediately by this approach. Conclusion In the end, this is an interesting exercise in exploring the capabilities of Kotlin Scripting and how it could be integrated into a Spring Boot web application, but the current technical limitations and relative lack of documentation do not make it an attractive option for web development, at least in the Spring Boot ecosystem at this point. 
Still, knowing more of the tools that are available is always a good thing: even if this use case never proves viable for Kotlin Scripting, one may well come across a different scenario in the future where it does come in handy. Furthermore, the authors of Kotlin have a strong incentive to invest in improving the library's performance, as any speed-up in script execution would also make the Kotlin DSL for Gradle a more attractive tool for developers. As with most (relatively) new and experimental technology, time will tell whether Kotlin Scripting becomes a better option; for now, it remains an interesting experiment rather than a practical choice for rendering web pages.
Open-source Cloud Foundry Korifi is designed to provide developers with an efficient approach to delivering and managing cloud-native applications on Kubernetes with automated networking, security, availability, and more. With Korifi, the simplicity of the cf push command is now available on Kubernetes. In this tutorial, I will walk you through the installation of Korifi on kind using a locally deployed container registry. The installation process happens in two steps: Installation of prerequisites Installation of Korifi and dependencies Then, we will deploy two applications developed in two very different programming languages: Java and Python. This tutorial has been tested on Ubuntu Server 22.04.2 LTS. Let's dive in! Installing Prerequisites There are several prerequisites needed to install Korifi. There is a high chance that Kubernetes users will already have most of them installed. Here is the list of prerequisites: Cf8 cli Docker Go Helm Kbld Kind Kubectl Make To save time, I wrote a Bash script that installs the correct version of prerequisites for you. You can download it and run it by running the two commands below. Shell git clone https://github.com/sylvainkalache/korifi-prerequisites-installation cd korifi-prerequisites-installation && ./install-korifi-prerequisites.sh Installing Korifi The Korifi development team maintains an installation script to install Korifi on a kind cluster. It installs the required dependencies and a local container registry. This method is especially recommended if you are trying Korifi for the first time. Shell git clone https://github.com/cloudfoundry/korifi cd korifi/scripts && ./deploy-on-kind.sh korifi-cluster The install script does the following: Creates a kind cluster with the correct port mappings for Korifi Deploys a local Docker registry using the twuni helm chart Creates an admin user for Cloud Foundry Installs cert-manager to create and manage internal certificates within the cluster Installs kpack, which is used to build runnable applications from source code using Cloud Native Buildpacks Installs contour, which is the ingress controller for Korifi Installs the service binding runtime, which is an implementation of the service binding spec Installs the metrics server Installs Korifi Similar to installing prerequisites, you can always do this manually by following the installation instructions. Setting up Your Korifi Instance Before deploying our application to Kubernetes, we must sign into our Cloud Foundry instance. This will set up a tenant, known as a target, to which our apps can be deployed. Authenticate with the Cloud FoundryAPI: Shell cf api https://localhost --skip-ssl-validation cf auth cf-admin Create an Org and a Space. Shell cf create-org tutorial-org cf create-space -o tutorial-org tutorial-space Target the Org and Space you created. Shell cf target -o tutorial-org -s tutorial-space Everything is ready; let’s deploy two applications to Kubernetes. Single-Command Deployment to Kubernetes Deploying a Java Application For the sake of the tutorial, I am using a sample Java app, but you feel free to try it out on your own. Shell git clone https://github.com/sylvainkalache/sample-web-apps cd sample-web-apps/java Once you are inside your application repository, run the following command. Note that the first run of this command will take a while as it needs to install language dependencies from the requirements.txt and create a runnable container image. 
But all subsequent updates will be much faster: Shell cf push my-java-app That’s it! The application has been deployed to Kubernetes. To check the application status, you can simply use the following command: Shell cf app my-java-app Which will return an output similar to this: Showing health and status for app my-java-app in org tutorial-org / space tutorial-space as cf-admin... Shell Showing health and status for app my-java-app in org tutorial-org / space tutorial-space as cf-admin... name: my-java-app requested state: started routes: my-java-app.apps-127-0-0-1.nip.io last uploaded: Tue 25 Jul 19:14:34 UTC 2023 stack: io.buildpacks.stacks.jammy buildpacks: type: web sidecars: instances: 1/1 memory usage: 1024M state since cpu memory disk logging details #0 running 2023-07-25T20:46:32Z 0.1% 16.1M of 1G 0 of 1G 0/s of 0/s type: executable-jar sidecars: instances: 0/0 memory usage: 1024M There are no running instances of this process. type: task sidecars: instances: 0/0 memory usage: 1024M There are no running instances of this process. Within this helpful information, we can see the app URL of our app and the fact that it is properly running. You can double-check that the application is properly responding using curl: Shell curl -I --insecure https://my-java-app.apps-127-0-0-1.nip.io/ And you should get an HTTP 200 back. Shell HTTP/2 200 date: Tue, 25 Jul 2023 20:47:07 GMT x-envoy-upstream-service-time: 134 vary: Accept-Encoding server: envoy Deploying a Python Application Next, we will deploy a simple Python Flask application. While we could deploy a Java application directly, there is an additional step for a Python one. Indeed, we need to provide a Buildpack that Korifi can use for Python applications – a more detailed explanation is available in the documentation. Korifi uses Buildpacks to transform your application source code into images that are eventually pushed to Kubernetes. The Paketo open-source project provides base production-ready Buildpacks for the most popular languages and frameworks. In this example, I will use the Python Paketo Buildpacks as the base Buildpacks. Let’s start by adding the Buildpacks source to our ClusterStore by running the following command: Shell kubectl edit clusterstore cf-default-buildpacks -n tutorial-space Then add the line - image: gcr.io/paketo-buildpacks/python, your file should look like this: Shell spec: sources: - image: gcr.io/paketo-buildpacks/java - image: gcr.io/paketo-buildpacks/nodejs - image: gcr.io/paketo-buildpacks/ruby - image: gcr.io/paketo-buildpacks/procfile - image: gcr.io/paketo-buildpacks/go - image: gcr.io/paketo-buildpacks/python Then we need to specify when to use these Buildbacks by editing our ClusterBuilder. Execute the following command: Shell kubectl edit clusterbuilder cf-kpack-cluster-builder -n tutorial-space Add the line - id: paketo-buildpacks/python at the top of the spec order list. your file should look like this: Shell spec: order: - group: - id: paketo-buildpacks/python - group: - id: paketo-buildpacks/java - group: - id: paketo-buildpacks/go - group: - id: paketo-buildpacks/nodejs - group: - id: paketo-buildpacks/ruby - group: - id: paketo-buildpacks/procfile That’s it! 
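Before pushing the Python app, you can optionally confirm that both edits were picked up. The resource names below are the ones used by the deploy-on-kind.sh script above; this is only a sanity check and can be skipped. Shell
# List the buildpack images registered in the ClusterStore
kubectl get clusterstore cf-default-buildpacks -o jsonpath='{.spec.sources[*].image}'
# List the buildpack order configured in the ClusterBuilder
kubectl get clusterbuilder cf-kpack-cluster-builder -o jsonpath='{.spec.order[*].group[*].id}'
Both outputs should now include the paketo-buildpacks/python entries added above.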
Now you can either bring your own Python app or use this sample one by running the following commands: Shell git clone https://github.com/sylvainkalache/sample-web-apps cd sample-web-apps/python And deploy it: Shell cf push my-python-app Run curl to make sure the app is responding: Shell curl --insecure https://my-python-app.apps-127-0-0-1.nip.io/dzone Curl should return the following output: Shell Hello world! Python version: 3.10.12 Video Conclusion As you can see, Korifi makes deploying applications to Kubernetes very easy. While Java and Python are two very different stacks, the shipping experience remains the same. Korifi supports many other languages like Go, Ruby, PHP, Node.js, and more.
Electron has revolutionized cross-platform desktop application development by allowing developers to leverage the web technologies they already know. This approach has powered popular applications like Atom, VSCode, and Postman. However, Electron apps are frequently criticized for their higher memory usage compared to applications written in lower-level languages such as C, C++, Rust, or Go. In this article, we will introduce Tauri, an emerging framework that addresses some of the limitations of Electron. Tauri enables developers to build binaries for all major desktop platforms while offering improved performance and a reduced memory footprint. The tutorial section of the article will guide you through the process of creating basic commands, adding a window menu, and building an application using Tauri. Let's dive in and explore this exciting new framework! What Are We Planning To Build? Tauri is a cutting-edge framework that enables developers to build desktop applications by combining any frontend framework with a powerful Rust core. The architecture of a Tauri app involves two essential components: Rust binary: This component is responsible for creating the application windows and providing access to native functionality within those windows. Frontend: Developers have the freedom to choose their preferred frontend framework to design the user interface that resides within the application windows. Further in the blog, we will guide you through setting up the frontend scaffold, configuring your Rust project, and demonstrating effective communication between these two components. So, stay tuned for an exciting journey into the world of Tauri! Prerequisites I installed the prerequisites for my Mac using this official setup, which essentially walks you through installing Clang, macOS build dependencies, and the Rust development environment. Create the Front End SvelteKit, a powerful Svelte frontend framework, is primarily built for Server-Side Rendering (SSR). However, in order to integrate SvelteKit with Tauri, we will disable SSR and instead leverage Static-Site Generation (SSG) using @sveltejs/adapter-static. To facilitate the setup process, SvelteKit provides a convenient scaffolding utility, similar to create-tauri-app, that swiftly establishes a new project with various customization options. The command for creating a Svelte project using npm: Rust npm create svelte@latest Project name: The project name refers to the name of your JavaScript project. It determines the name of the folder that will be created by this utility, but it does not affect the workings of your application in any other way, so you can freely select any name for your project. App template: To create a minimalistic template, we will choose the Skeleton project, which provides a basic, essential structure. If you're keen to explore SvelteKit further and get a broader view of its capabilities, consider checking out their demo app, which gives a more comprehensive display of SvelteKit's abilities. Type checking: You can choose between type checking via JSDoc or TypeScript for your project. In this guide, we will assume that you have opted for TypeScript.
Code linting and formatting: You have the option to include ESLint for code linting and Prettier for code formatting in your project. Although there won't be further mentions about these options in this guide, we highly recommend enabling both for better code quality and consistent formatting. Browser testing: SvelteKit provides built-in support for browser testing through Playwright. However, since Tauri APIs are not compatible with Playwright, it is advisable not to include it in your project. Instead, you can refer to our WebDriver documentation for alternative options using Selenium or WebdriverIO, which can be utilized in place of Playwright for testing purposes. Svelte Kit in SSG Mode: First, we need to install @sveltejs/adapter-static: Rust npm install --save-dev @sveltejs/adapter-static@next In the file svelte.config.js, paste this code: Rust import adapter from '@sveltejs/adapter-static' // This was changed from adapter-auto import preprocess from 'svelte-preprocess' /** @type {import('@sveltejs/kit').Config} */ const config = { // Consult https://github.com/sveltejs/svelte-preprocess // for more information about preprocessors preprocess: preprocess(), kit: { adapter: adapter(), }, } export default config Lastly, we need to disable SSR and enable prerendering by adding a root +layout.ts file (or +layout.js if you are not using TypeScript) with these contents:Folder location: src/routes/+layout. Rust export const prerender = true export const ssr = false Create Rust Project The core of every Tauri application is a Rust binary that handles various functionalities such as window management, webview integration, and operating system interactions. These tasks are accomplished through the utilization of a Rust crate called "Tauri." The project, along with its dependencies, is managed by Cargo, the official package manager and versatile build tool for Rust. Cargo simplifies the process of managing dependencies and building the Tauri application. The Tauri CLI leverages Cargo internally, reducing the need for direct interaction with it in most cases. However, Cargo offers a wide range of additional features that are not exposed through our CLI. These features include testing, linting, and code formatting capabilities. For more detailed information and guidance on utilizing these features, I recommend referring to the official documentation of Cargo. It provides comprehensive documentation on the various functionalities and options available. For Installing Tauri to Your System: Rust cargo install tauri-cli To scaffold a minimal Rust project that is pre-configured to use Tauri: Run Webfront End and Desktop App: Open a terminal and run the following command in the src-tauri directory: Rust cargo tauri dev And run this command in the Svelte directory: Rust npm run dev Congratulations, you have created your first cross-platform application using Rust, Tauri, and Svelte.Here is the GitHub Link.
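The introduction mentioned creating basic commands, so here is a minimal, illustrative sketch of what a command looks like on the Rust side; the greet function, its message, and its registration are placeholder examples rather than part of the sample project above. Rust
// src-tauri/src/main.rs (sketch): a basic command callable from the frontend
#[tauri::command]
fn greet(name: &str) -> String {
    format!("Hello, {name}! Greetings from the Rust core.")
}

fn main() {
    tauri::Builder::default()
        // Register the command so the webview is allowed to invoke it
        .invoke_handler(tauri::generate_handler![greet])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}
On the Svelte side, such a command can then be called with invoke("greet", { name: "World" }) from the @tauri-apps/api package, which is how the two components of a Tauri app communicate.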
Web scraping has become an indispensable tool in today's data-driven world. Python, one of the most popular languages for scraping, has a vast ecosystem of powerful libraries and frameworks. In this article, we will explore the best Python libraries for web scraping, each offering unique features and functionalities to simplify the process of extracting data from websites. This article will also cover the best libraries and best practices to ensure efficient and responsible web scraping. From respecting website policies and handling rate limits to addressing common challenges, we will provide valuable insights to help you navigate the world of web scraping effectively. Scrape-It.Cloud Let's start with the Scrape-It.Cloud library, which provides access to an API for scraping data. This solution has several advantages. For instance, we do it through an intermediary instead of directly scraping data from the target website. This guarantees we won't get blocked when scraping large amounts of data, so we don't need proxies. We don't have to solve captchas because the API handles that. Additionally, we can scrape both static and dynamic pages. Features With Scrape-It.Cloud library, you can easily extract valuable data from any site with a simple API call. It solves the problems of proxy servers, headless browsers, and captcha-solving services. By specifying the right URL, Scrape-It.Cloud quickly returns JSON with the necessary data. This allows you to focus on extracting the right data without worrying about it being blocked. Moreover, this API allows you to extract data from dynamic pages created with React, AngularJS, Ajax, Vue.js, and other popular libraries. Also, if you need to collect data from Google SERPs, you can also use this API key for the serp api python library. Installing To install the library, run the following command: pip install scrapeit-cloud To use the library, you'll also need an API key. You can get it by registering on the website. Besides, you'll get some free credits to make requests and explore the library's features for free. Example of Use A detailed description of all the functions, features, and ways to use a particular library deserves a separate article. For now, we'll just show you how to get the HTML code of any web page, regardless of whether it's accessible to you, whether it requires a captcha solution, and whether the page content is static or dynamic. To do this, just specify your API key and the page URL. Python from scrapeit_cloud import ScrapeitCloudClient import json client = ScrapeitCloudClient(api_key="YOUR-API-KEY") response = client.scrape( params={ "url": "https://example.com/" } ) Since the results come in JSON format, and the content of the page is stored in the attribute ["scrapingResult"]["content"], we will use this to extract the desired data. Python data = json.loads(response.text) print(data["scrapingResult"]["content"]) As a result, the HTML code of the retrieved page will be displayed on the screen. Requests and BeautifulSoup Combination One of the simplest and most popular libraries is BeautifulSoup. However, keep in mind that it is a parsing library and does not have the ability to make requests on its own. Therefore, it is usually used with a simple request library like Requests, http.client, or cUrl. Features This library is designed for beginners and is quite easy to use. Additionally, it has well-documented instructions and an active community. 
The BeautifulSoup library (or BS4) is specifically designed for parsing, which gives it extensive capabilities. You can locate elements with its search methods (such as find_all()) or with CSS selectors; XPath is not supported directly, so if you need it, the LXML library covered below is a better fit. Due to its simplicity and active community, numerous examples of its usage are available online, and if you run into difficulties, it is easy to find help. Installing As mentioned, we will need two libraries to use it. For handling requests, we will use the Requests library. Note that Requests is a third-party package rather than part of the standard library, so if it is not already present in your environment, install it with pip install requests. We also need to install the BeautifulSoup library itself. To do this, simply use the following command: Python pip install beautifulsoup4 Once it's installed, you can start using it right away. Example of Use Let's say we want to retrieve the content of the <h1> tag, which holds the header. To do this, we first need to import the necessary libraries and make a request to get the page's content: Python import requests from bs4 import BeautifulSoup data = requests.get('https://example.com') To process the page, we'll use the BS4 parser: Python soup = BeautifulSoup(data.text, "html.parser") Now, all we have to do is specify the exact data we want to extract from the page: Python text = soup.find_all('h1') Finally, let's display the obtained data on the screen: Python print(text) As we can see, using the library is quite simple. However, it does have its limitations. For instance, it cannot scrape dynamic data since it's a parsing library that works with a basic request library rather than headless browsers. LXML LXML is another popular library for parsing data, and it can't be used for scraping on its own. Since it also requires a library for making requests, we will use the familiar Requests library that we already know. Features Despite its similarity to the previous library, it does offer some additional features. For instance, it is more specialized in working with XML document structures than BS4. While it also supports HTML documents, this library would be a more suitable choice if you have a more complex XML structure. Installing As mentioned earlier, despite needing a request library, we only need to install the LXML library, as Requests is already set up from the previous example. To install LXML, enter the following command in the command prompt: Python pip install lxml Now let's move on to an example of using this library. Example of Use To begin, just like last time, we need to use a library to fetch the HTML code of a webpage. This part of the code will be the same as the previous example: Python import requests from lxml import html data = requests.get('https://example.com') Now we need to pass the result to a parser so that it can process the document's structure: Python tree = html.fromstring(data.content) Finally, all that's left is to specify a CSS selector or XPath for the desired element and print the processed data on the screen. Let's use XPath as an example (a CSS selector version of the same lookup follows after this section); note the text() step, which returns the heading's text rather than the element object: Python data = tree.xpath('//h1/text()') print(data) As a result, we will get the same heading as in the previous example: Python ['Example Domain'] However, although it may not be very noticeable in a simple example, the LXML library is more challenging for beginners than the previous one, and its documentation and community are smaller. Therefore, LXML is recommended when dealing with complex XML structures that are difficult to process using other methods.
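Since the example above notes that lxml accepts either a CSS selector or XPath, here is the same lookup expressed as a CSS selector. This is a minimal sketch and assumes the optional cssselect package, which lxml relies on for CSS support, is installed. Python
# pip install cssselect  (lxml delegates CSS selector parsing to this package)
import requests
from lxml import html

data = requests.get('https://example.com')
tree = html.fromstring(data.content)

# cssselect() returns element objects, so read text_content() to get the heading text
headings = tree.cssselect('h1')
print([h.text_content() for h in headings])  # ['Example Domain']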
Scrapy Unlike previous examples, Scrapy is not just a library but a full-fledged framework for web scraping. It doesn't require additional libraries and is a self-contained solution. However, for beginners, it may seem quite challenging. If this is your first web scraper, it's worth considering another library. Features Despite its shortcomings, this framework is an invaluable solution in certain situations. For example, when you want your project to be easily scalable, or when you need multiple scrapers within the same project that share the same settings, can be run consistently with a single command, and organize all of the collected information into the right format. A single scraper created with Scrapy is called a spider and can either be the only one or one of many spiders in a project. The project has its own configuration file that applies to all scrapers within the project. In addition, each spider has its own settings, which apply independently of the project-wide settings. Installing You can install this framework like any other Python library by entering the installation command in the command line. Python pip install scrapy Now let's move on to an example of using this framework. Example of Use Unlike the library examples, creating a project, just like a spider file, is done with a special command that has to be entered at the command line. To begin, let's create a new project where we'll build our scraper. Use the following command: Python scrapy startproject test_project Instead of test_project you can enter any other project name. Now we can navigate to our project folder or create a new spider right here. Before we move on to creating a spider, let's look at the project's structure: its files are generated automatically when a new project is created. Any settings specified in these files will apply to all spiders within the project. You can define common classes in the "items.py" file, specify what to do when the project is launched in the "pipelines.py" file, and configure general project settings in the "settings.py" file. Now let's go back to the command line and navigate to our project folder: Python cd test_project After that, we'll create a new spider while inside the folder of the desired project: Python scrapy genspider example example.com Next, you can open the spider file and manually edit it. The genspider command creates a skeleton that makes it easier to build your scraper. To retrieve the page's title, go to the spider file and find the following function: Python def parse(self, response): pass Replace pass with the code that performs the necessary functions. In our case, it involves extracting data from the h1 tag (DemoItem here is an item class you would declare in the items.py file mentioned above): Python def parse(self, response): item = DemoItem() item["text"] = response.xpath("//h1").extract() return item Afterward, you can configure the execution of the spiders within the project and obtain the desired data. Selenium Selenium is a highly convenient library that not only allows you to extract data and scrape simple web pages but also enables the use of headless browsers. This makes it suitable for scraping dynamic web pages. So, we can say that Selenium is one of the best libraries for web scraping in Python. Features The Selenium library was originally developed for software testing purposes, meaning it allows you to mimic the behavior of a real user effectively. This feature reduces the risk of blocking during web scraping.
In addition, Selenium allows collecting data and performing necessary actions on web pages, such as authentication or filling out forms. This library uses a web driver that provides access to these functions. You can choose any supported web driver, but the Firefox and Chrome web drivers are the most popular. This article will use the Chrome web driver as an example. Installing Let's start by installing the library: Python pip install selenium Also, as mentioned earlier, we need a web driver to simulate the behavior of a real user. We just need to download it and put it in any folder to use it. We will specify the path to that folder in our code later on. You can download the web driver from the official website. Remember that it is important to use the version of the web driver that corresponds to the version of the browser you have installed. Example of Use To use the Selenium library, create an empty *.py file and import the necessary libraries: Python from selenium import webdriver from selenium.webdriver.common.by import By After that, let's specify the path to the web driver and define that we'll be using it: Python DRIVER_PATH = 'C:\chromedriver.exe' driver = webdriver.Chrome(executable_path=DRIVER_PATH) Here you can additionally specify some parameters, such as the operating mode. The browser can run in active mode, where you will see all your script's actions. Alternatively, you can choose a headless mode, in which the browser window is hidden and is not displayed to the user. The browser window is displayed by default so that we won't change anything. Now that we're done with the setup, let's move on to the landing page: Python driver.get("https://example.com/") At this point, the web driver will start, and your script will automatically go to the desired web page. Now we just have to specify what data we want to retrieve, display the retrieved data, and close the web driver: Python text = driver.find_elements(By.CSS_SELECTOR, "h1") print(text) driver.close() It's important not to forget to close the web driver at the end of the script's execution. Otherwise, it will remain open until the script finishes, which can significantly affect the performance of your PC. Pyppeteer The last library we will discuss in our article is Pyppeteer. It is the Python version of a popular library called Puppeteer, commonly used in NodeJS. Pyppeteer has a vibrant community and detailed documentation, but unfortunately, most of it is focused on NodeJS. So, if you decide to use this library, it's important to keep that in mind. Features As mentioned before, this library was originally developed for NodeJS. It also allows you to use a headless browser, which makes it useful for scraping dynamic web pages. Installing To install the library, go to the command line and enter the command: Python pip install pyppeteer Usually, this library is used together with the asyncio library, which improves script performance and execution speed. So, let's also install it: Python pip install asyncio Other than that, we won't need anything else. Example of Use Let's look at a simple example of using the Pyppeteer library. We'll create a new Python file and import the necessary libraries to do this. Python import asyncio from pyppeteer import launch Now let's do the same as in the previous example: navigate to a page, collect data, display it on the screen, and close the browser. 
Python async def main(): browser = await launch() page = await browser.newPage() await page.goto('https://example.com') element = await page.querySelector("h1") text = await (await element.getProperty("textContent")).jsonValue() print(text) await browser.close() asyncio.get_event_loop().run_until_complete(main()) Note that querySelector() returns an element handle, so the heading text is read through getProperty("textContent") and jsonValue(). Since this library is similar to Puppeteer, beginners might find it somewhat challenging. Best Practices and Considerations To make web scraping more efficient, there are some rules to follow. Adhering to these rules helps make your scraper more effective and ethical and reduces the load on the services you gather information from. Avoiding Excessive Requests During web scraping, avoiding excessive requests is important to prevent being blocked and reduce the load on the target website. That's why gathering data from websites during their least busy hours, such as at night, is recommended. This can help decrease the risk of overwhelming the resource and causing it to malfunction. Dealing with Dynamic Content During the process of gathering dynamic data, there are two approaches. You can do the scraping yourself by using libraries that support headless browsers. Alternatively, you can use a web scraping API that will handle the task of collecting dynamic data for you. If you have good programming skills and a small project, it might be better for you to write your own scraper using libraries. However, a web scraping API would be preferable if you are a beginner or need to gather data from many pages. In such cases, besides collecting dynamic data, the API will also take care of proxies and solving captchas; the Scrape-It.Cloud SERP API is one example. User-Agent Rotation It's also important to consider that your bot will stand out noticeably without a User-Agent. Every browser sends its own User-Agent when visiting a webpage, and you can view it in your browser's developer tools (for example, in the request headers on the Network tab). It's advisable to change the User-Agent value randomly for each request; a short sketch of this appears at the end of the article. Proxy Usage and IP Rotation As we've discussed before, there is a risk of being blocked when it comes to scraping. To reduce this risk, it is advisable to use proxies that hide your real IP address. However, having just one proxy is not sufficient. It is preferable to have rotating proxies, although they come at a higher cost. Conclusion and Takeaways This article discussed the main Python libraries used for web scraping and the rules to follow when using them. To summarize, here is a comparison of the key features of the libraries we covered:
Scrape-It.Cloud: parses HTML, XML, and JavaScript; advanced feature: automatic scraping and pagination; JS rendering: yes; ease of use: easy.
Requests and BeautifulSoup combo: parses HTML and XML; advanced feature: simple integration; JS rendering: no; ease of use: easy.
Requests and LXML combo: parses HTML and XML; advanced feature: XPath and CSS selector support; JS rendering: no; ease of use: moderate.
Scrapy: parses HTML and XML; advanced feature: multiple spiders; JS rendering: no; ease of use: moderate.
Selenium: parses HTML, XML, and JavaScript; advanced feature: dynamic content handling; JS rendering: yes (using web drivers); ease of use: moderate.
Pyppeteer: parses HTML and JavaScript; advanced feature: browser automation with headless Chrome or Chromium; JS rendering: yes; ease of use: moderate.
Overall, Python is a highly useful programming language for data collection. With its wide range of tools and user-friendly nature, it is often used for data mining and analysis. Python enables tasks related to extracting information from websites and processing data to be easily accomplished.
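To make the User-Agent rotation advice above concrete, here is the short sketch promised earlier. The header strings are illustrative placeholders; in practice, you would maintain a list of real, current browser User-Agent values. Python
import random
import requests

# Placeholder User-Agent strings; replace them with real, up-to-date browser values
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]

def fetch(url: str) -> requests.Response:
    # A different User-Agent is chosen for every request
    headers = {"User-Agent": random.choice(USER_AGENTS)}
    return requests.get(url, headers=headers, timeout=10)

print(fetch("https://example.com").status_code)  # 200 if the request succeeded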
What F-Strings Are and How to Use Them Effectively In Python, F-strings are a way to embed expressions inside string literals, using a simple and concise syntax. F-strings start with the letter f and are followed by a string literal that may contain expressions that are evaluated at runtime and replaced with their values. These expressions are enclosed in curly braces: {}. For example, the following code prints the value of the variable name inside a string: Python name = "Alice" print(f"Hello, {name}!") # Output Hello, Alice! Benefits of F-Strings F-strings offer several advantages over other ways of formatting strings in Python. First, they are very easy to read and write, as they allow for a concise and natural syntax that closely resembles the final output. This makes code more readable and easier to maintain. Second, F-strings are very flexible and dynamic, as they allow for the use of arbitrary expressions inside the curly braces. This means that complex expressions, such as function calls or mathematical operations, can be used to build more sophisticated output. Finally, F-strings are very efficient: the embedded expressions are evaluated directly at runtime without an extra parsing or formatting step, which makes them faster than other string formatting methods like % formatting and str.format(). Their readability, flexibility, and speed make them the preferred choice for string formatting in Python. Basic Syntax of F-Strings The basic syntax of F-strings is very simple. It consists of a string literal that may contain expressions enclosed in curly braces, {}. These expressions are evaluated at runtime and replaced with their values. For example, the following code prints the value of a variable x inside a string: Python x = 42 print(f"The answer is {x}") # Output The answer is 42 Expressions inside curly braces can also be more complex, including function calls, mathematical operations, and even other F-strings: Python name = "Alice" age = 30 print(f"{name} is {age} years old. Next year, she will be {age + 1}.") # Output Alice is 30 years old. Next year, she will be 31. Formatting Numbers With F-Strings F-strings can also be used to format numbers in various ways, including rounding, padding, and adding prefixes or suffixes. To format a number using F-strings, simply include the number inside the curly braces, followed by a colon and a format specifier. The format specifier defines how the number should be formatted, including its precision, width, and alignment. The following script prints a floating-point number with only two decimal places: Python x = 3.14159 print(f"Pi is approximately {x:.2f}") # Output Pi is approximately 3.14 Rounding Numbers With F-Strings F-strings can also be used to round numbers to a specific precision, using the round() function. To round a number this way, simply call round() inside the curly braces, passing the value and the number of decimal places to round to. Here, we round a floating-point number to two decimal places: Python x = 3.14159 print(f"Pi is approximately {round(x, 2)}") # Output Pi is approximately 3.14 Formatting Percentages With F-Strings F-strings can also be used to format percentages, using the % format specifier.
To format a number as a percentage using F-strings, simply include the number inside the curly braces, followed by a colon and the % symbol: Python x = 0.75 print(f"{x:.2%} of the time, it works every time.") #Output 75.00% of the time, it works every time. Working With Decimals Using F-Strings F-strings can also be used to format decimal objects, which are built-in data types in Python that provide precise decimal arithmetic. To format a decimal object using F-strings, simply include the object inside the curly braces, followed by a colon and the desired format specifier. Here, we print a decimal object with 4 decimal places: Python from decimal import Decimal x = Decimal('3.14159') print(f"The value of pi is {x:.4f}") # Output The value of pi is 3.1416 Formatting Dates With F-Strings F-strings can also be used to format dates and times, using the built-in datetime module. To format a date or time using F-strings, simply include the object inside the curly braces, followed by a colon and the desired format specifier. For example, the current date and time are displayed in ISO format: Python from datetime import datetime now = datetime.now() print(f"The current date and time is {now:%Y-%m-%d %H:%M:%S}") #Output The current date and time is 2023-08-14 14:30:00 Multiline F-Strings With multiline F-strings, you can now write and format strings that span multiple lines without any hassle. Let's take a look at an example to understand how multiline F-strings work. Imagine you have a long string that you want to split across multiple lines for better readability. Instead of using the traditional concatenation method, you can use the power of F-strings. Here's how it works: Python name = "John" age = 25 address = "123 Street, City" message = f""" Hello {name}, I hope this email finds you well. I wanted to inform you that your age is {age} and your address is {address}. Thank you. """ print(message) As you can see, we have used triple quotes (""") to create a multiline string and then used F-string syntax (f"") to insert variables directly into the string. This way, we don't need to worry about concatenation or formatting issues. F-Strings in Dictionaries Dictionaries are an essential data structure in Python, and being able to incorporate them into our F-strings can be incredibly useful. To use a dictionary in an F-string, we simply need to provide the dictionary name followed by the key inside curly braces: Python person = {"name": "John","age": 25,"address": "123 Street, City" } message = f"Hello {person['name']}, your age is {person['age']} and your address is {person['address']}." print(message) We have a dictionary called person with keys such as "name", "age", and "address". We access the values of these keys inside the F-string using square brackets ([]). This allows us to dynamically incorporate dictionary values into our strings. To sum up, by using F-strings effectively, you can create clear and concise code that is easy to understand and maintain. Whether you are working with simple strings or complex data structures, F-strings can help you achieve your formatting goals with ease. As usual, the best way to ensure you understand f-strings is to practice and apply them in real-life Python projects.
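As a small practice snippet that combines several of the ideas above, the example below also uses two features not covered in the article: format specifiers built from variables and the = debugging specifier, which requires Python 3.8 or later. Python
from datetime import datetime

price = 1234.56789
width, precision = 12, 2

# The width and precision of the format spec can themselves come from variables
print(f"{price:{width}.{precision}f}")  # '     1234.57'

# The '=' specifier (Python 3.8+) prints the expression together with its value
print(f"{price=:.2f}")  # price=1234.57

# A dictionary lookup and a date format combined in a single string
order = {"id": 42, "placed": datetime(2023, 8, 14, 14, 30)}
print(f"Order {order['id']} placed at {order['placed']:%Y-%m-%d %H:%M}")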
One of the components of my OpenTelemetry demo is a Rust application built with the Axum web framework. In its description, axum mentions: axum doesn't have its own middleware system but instead uses tower::Service. This means axum gets timeouts, tracing, compression, authorization, and more, for free. It also enables you to share middleware with applications written using hyper or tonic. — axum README So far, I was happy to let this cryptic explanation lurk in the corner of my mind, but today is the day I want to understand what it means. Like many others, this post aims to explain to me and others how to do this. The tower crate offers the following information: Tower is a library of modular and reusable components for building robust networking clients and servers. Tower provides a simple core abstraction, the Service trait, which represents an asynchronous function taking a request and returning either a response or an error. This abstraction can be used to model both clients and servers. Generic components, like timeouts, rate limiting, and load balancing, can be modeled as Services that wrap some inner service and apply additional behavior before or after the inner service is called. This allows implementing these components in a protocol-agnostic, composable way. Typically, such services are referred to as middleware. — tower crate Tower is designed around Functional Programming and two main abstractions, Service and Layer. In its simplest expression, a Service is a function that reads an input and produces an output. It consists of two methods: One should call poll_ready() to ensure that the service can process requests call() processes the request and returns the response asynchronously Because calls can fail, the return value is wrapped in a Result. Moreover, since Tower deals with asynchronous calls, the Result is wrapped in a Future. Hence, a Service transforms a Self::Request into a Future<Result>, with Request and Response needing to be defined by the developer. The Layer trait allows composing Services together. Here's a slightly more detailed diagram: A typical Service implementation will wrap an underlying component; the component may be a service itself. Hence, you can chain multiple features by composing various functions. The call() function implementation usually executes these steps in order, all of them being optional: Pre-call Call the wrapped component Post-call For example, a logging service could log the parameters before the call, call the logged component, and log the return value after the call. Another example would be a throttling service, which limits the rate of calls of the wrapped service: it would read the current status before the call and, if above a configured limit, would return immediately without calling the wrapped component. It will call the component and increment the status if the status is valid. The role of a layer would be to take one service and wrap it into the other. With this in mind, it's relatively easy to check the axum-tracing-opentelemetry crate and understand what it does. It offers two services with their respective layers: one is to extract the trace and span IDs from an HTTP request, and another is to send the data to the OTEL collector. 
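To make the Service and Layer abstractions concrete, here is a minimal sketch of the logging middleware described above. It only logs before delegating to the wrapped service; logging the response as well would require wrapping the returned future, which is left out for brevity. Rust
use std::task::{Context, Poll};
use tower::{Layer, Service};

// A logging middleware: it wraps an inner service and logs each request before forwarding it
#[derive(Clone)]
struct LogService<S> {
    inner: S,
}

impl<S, Request> Service<Request> for LogService<S>
where
    S: Service<Request>,
    Request: std::fmt::Debug,
{
    type Response = S::Response;
    type Error = S::Error;
    type Future = S::Future;

    // Pre-call: make sure the wrapped service is ready to accept a request
    fn poll_ready(&mut self, cx: &mut Context<'_>) -> Poll<Result<(), Self::Error>> {
        self.inner.poll_ready(cx)
    }

    // Call: log the request, then delegate to the wrapped service
    fn call(&mut self, request: Request) -> Self::Future {
        println!("handling request: {request:?}");
        self.inner.call(request)
    }
}

// The matching Layer simply wraps any service in a LogService
struct LogLayer;

impl<S> Layer<S> for LogLayer {
    type Service = LogService<S>;

    fn layer(&self, inner: S) -> Self::Service {
        LogService { inner }
    }
}
Wiring this into an axum application would then be a matter of adding something like .layer(LogLayer) on the router, which is exactly the kind of composition the quoted README refers to.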
Note that Tower comes with several out-of-the-box services, each available via a Cargo feature flag: balance: load-balance requests; buffer: MPSC buffer; discover: service discovery; filter: conditional dispatch; hedge: retry slow requests; limit: limit requests; load: load measurement; retry: retry failed requests; timeout: timeout requests. Finally, note that Tower comes in three crates: tower is the public crate, while tower-service and tower-layer contain the core traits and are kept significantly more stable. In this post, we have explained what the Tower library is: a Functional Programming library that provides function composition. If you come from the Object-Oriented Programming paradigm, it's similar to the Decorator pattern. It builds upon two abstractions: Service is the function, and Layer composes functions. It's widespread in the Rust ecosystem, and learning it is a good investment. To go further: Axum; Tower documentation; Tower crate; Axum_tracing_opentelemetry documentation
Javin Paul
Lead Developer,
infotech
Reza Rahman
Principal Program Manager, Java on Azure,
Microsoft
Kai Wähner
Technology Evangelist,
Confluent
Alvin Lee
Founder,
Out of the Box Development, LLC