
How To Read a File Line by Line Into a List in Python

In this article, I will discuss how to open a file for reading with the built-in function open() and the use of Pandas library to manipulate data in the file.

By Ankur Ranpariya · May. 25, 23 · Tutorial

Most of the time, we read data from a file so that we can manipulate it in memory. This data can be numeric, text, or a combination of both. In this article, I will discuss how to open a file for reading with the built-in function open() and how to use the Pandas library to manipulate the data in the file. This also includes reading the contents of a file line by line and saving them to a list.

Things To Learn in This Article

  1. Open the file for reading
  2. Read the contents of a file line by line
  3. Store the read lines into a list data type
    • For loop
    • List comprehension
    • Readlines
    • Readline
  4. Read file using pandas

1. Open the File for Reading

Python's built-in function open() can be used to open a file for reading and writing. Its signature, as given in the Python documentation, is shown below.

Python
 
open(file, mode='r', buffering=-1, encoding=None,
     errors=None, newline=None, closefd=True, opener=None)


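These are the supported values for the mode.

  • 'r': open for reading (default)
  • 'w': open for writing, truncating the file first
  • 'x': open for exclusive creation, failing if the file already exists
  • 'a': open for writing, appending to the end of the file if it exists
  • 'b': binary mode
  • 't': text mode (default)
  • '+': open for updating (reading and writing)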
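To make a few of the mode values concrete, here is a minimal sketch that writes a small file with 'w', appends to it with 'a', and reads it back with 'r'. The file name modes_demo.txt and the temporary directory are assumptions made only for this illustration; they are not part of the article's example files.

```python
import os
import tempfile

# Hypothetical scratch file, created in a temporary directory for this demo.
demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file, or truncates it if it already exists.
with open(demo_path, mode='w', encoding='utf-8') as f:
    f.write('Australia\n')

# 'a' appends to the end of the file.
with open(demo_path, mode='a', encoding='utf-8') as f:
    f.write('China\n')

# 'r' (the default) opens the file for reading in text mode.
with open(demo_path, mode='r', encoding='utf-8') as f:
    content = f.read()

print(repr(content))
```

Passing encoding explicitly is optional, but it keeps the result independent of the platform's default encoding.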

Here is an example script called main.py that will open the file countries.txt for reading.

main.py

Python
 
with open('countries.txt', mode='r') as f:
    pass  # placeholder for other stuff that uses f


An Alternative Method of Opening a File

An alternative way of opening a file for reading is the following.

Python
 
f = open('countries.txt', mode='r')
# other stuff
f.close()


With this approach, you have to close the file object explicitly with close(). Use the with statement whenever possible: the context manager handles entering and exiting the block and closes the file object automatically, even if an error occurs, so an explicit call to close() is not needed.
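The difference between the two styles can be checked with the closed attribute of the file object. The sketch below is only an illustration: it creates its own copy of countries.txt in a temporary directory (an assumption, so that it runs anywhere) and uses try/finally to show what the explicit style needs in order to be as safe as the with statement.

```python
import os
import tempfile

# Hypothetical sample file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w') as f:
    f.write('Australia\nChina\nPhilippines\n')

# With the with statement, the file is closed when the block exits.
with open(path, mode='r') as f:
    first_line = f.readline()
closed_after_with = f.closed

# Without it, try/finally gives the same guarantee but is more verbose.
f = open(path, mode='r')
try:
    same_line = f.readline()
finally:
    f.close()
closed_after_finally = f.closed

print(closed_after_with, closed_after_finally)
```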

2. Read the Contents of a File Line by Line

Let us go back to our code in main.py. I will add a block of code that will read the contents of the file line by line and print it.

main.py

Python
 
with open('countries.txt', mode='r') as f:
    for lines in f:
        line = lines.rstrip()  # rstrip() removes trailing whitespace, including the newline character
        print(line)  # print to console


countries.txt

Python
 
Australia
China
Philippines


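Ensure that main.py and countries.txt are in the same directory, and run the script from that directory. The code above opens countries.txt by a relative path, which Python resolves against the current working directory. In my case, both files are in the F:\Project\8thesource path.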
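If the script may be started from another directory, one option is to build the path from the script's own location. The following is a sketch under assumptions: data_file_path is a hypothetical helper, and a temporary directory stands in for the project folder so that the example runs on its own.

```python
from pathlib import Path
import tempfile


def data_file_path(script_path, file_name):
    """Return file_name located in the same directory as script_path."""
    return Path(script_path).resolve().parent / file_name


# A temporary directory stands in for the project folder in this demo.
project_dir = Path(tempfile.mkdtemp())
script = project_dir / 'main.py'  # hypothetical script location
(project_dir / 'countries.txt').write_text('Australia\nChina\nPhilippines\n')

countries_path = data_file_path(script, 'countries.txt')
with open(countries_path, mode='r') as f:
    names = [line.rstrip() for line in f]

print(names)
```

Inside a real main.py, the call would be data_file_path(__file__, 'countries.txt').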

Execute the main.py from the command line.

Python
 
PS F:\Project\8thesource> python main.py


output

Python
 
Australia
China
Philippines


There we have it. We read countries.txt line by line using the open() function and by iterating over the file object. The first line printed was Australia, followed by China, and finally Philippines. This matches the order in which they were written in countries.txt.
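One detail worth knowing about the code above is that rstrip() without arguments removes all trailing whitespace, not only the newline character. The short sketch below uses a made-up string to show the difference from rstrip('\n'), which is the safer choice when trailing spaces or tabs are meaningful.

```python
# A made-up line with trailing spaces, a tab, and a newline.
raw = 'Philippines  \t\n'

stripped_all = raw.rstrip()          # removes all trailing whitespace
stripped_newline = raw.rstrip('\n')  # removes only trailing newline characters

print(repr(stripped_all))
print(repr(stripped_newline))
```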

3. Store the Read Lines Into a List Data Type

Python has a popular data type called list that can store other object types or a combination of object types. A list of integers could be [1, 2, 3]. A list of strings could be ['one', 'two', 'three']. A list mixing integers and strings could be [1, 'city', 45]. A list of lists could be [[1, 2], [4, 6]]. A list of tuples could be [(1, 2), ('a', 'b')]. A list of dictionaries could be [{'fruit': 'mango'}, {'count': 100}].

I will modify the main.py to store the read lines into a list.

a) For Loop

main.py

Python
 
data_list = []  # a list as container for read lines

with open('countries.txt', mode='r') as f:
    for lines in f:
        line = lines.rstrip()  # remove the newline character
        data_list.append(line)  # add the line in the list
print(data_list)


Here countries.txt is the file name, and we open it for reading with mode 'r'. We use a for loop to read each line and save it to a list called data_list. After all the lines have been added to the list via the append method, the list is printed.

After executing the main.py, we got the following output.

output

Python
 
['Australia', 'China', 'Philippines']
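The same loop can be wrapped in a function so that it is reusable for other files. This is a minimal sketch, not part of the original main.py: read_lines is a hypothetical name, the encoding default is an assumption, and the sample file is created in a temporary directory so that the example is self-contained.

```python
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    """Read a text file and return its lines without trailing newlines."""
    lines = []
    with open(path, mode='r', encoding=encoding) as f:
        for raw_line in f:
            lines.append(raw_line.rstrip('\n'))
    return lines


# Create a sample file so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(sample_path, mode='w', encoding='utf-8') as f:
    f.write('Australia\nChina\nPhilippines\n')

print(read_lines(sample_path))
```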


b) List Comprehension

Another option for saving the read lines to a list is a list comprehension. It uses a for loop behind the scenes and is more compact, but it is less beginner-friendly.

Python
 
with open('countries.txt', mode='r') as f:
    data = [item.rstrip() for item in f]
print(data)


output

Python
 
['Australia', 'China', 'Philippines']
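A list comprehension can also filter while it reads. As an illustration, the sketch below skips blank and whitespace-only lines; the sample file with blank lines is an assumption made for this demo and differs from the article's countries.txt.

```python
import os
import tempfile

# Hypothetical input that contains an empty line and a whitespace-only line.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w') as f:
    f.write('Australia\n\nChina\n   \nPhilippines\n')

with open(path, mode='r') as f:
    # Keep a line only if something remains after stripping whitespace.
    data = [line.rstrip() for line in f if line.strip()]

print(data)
```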


c) Readlines

Yet another option to save the read lines in a list is the method readlines().

Python
 
with open('countries.txt', mode='r') as f:
    data = f.readlines()
print(data)


The output still has the newline character \n.

output

Python
 
['Australia\n', 'China\n', 'Philippines']


The newline character can be removed by iterating over the items in that list and stripping each one. Because readlines() reads the whole file into memory at once, it is not an ideal solution if the file is big.
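Here is one way the stripping step could look, together with an alternative that avoids it: reading the whole file with read() and splitting it with splitlines(). This is a sketch on a temporary copy of the sample data (an assumption, so that it runs on its own). Like readlines(), both variants load the entire file into memory.

```python
import os
import tempfile

# Sample file without a trailing newline, matching the output shown above.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w') as f:
    f.write('Australia\nChina\nPhilippines')

# Option 1: strip each item returned by readlines().
with open(path, mode='r') as f:
    stripped = [line.rstrip('\n') for line in f.readlines()]

# Option 2: read everything and split on line boundaries.
with open(path, mode='r') as f:
    split = f.read().splitlines()

print(stripped)
print(split)
```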

d) Readline

Another option is the readline() method, which returns one line per call and an empty string once the end of the file is reached.

Python
 
data_list = []
with open('countries.txt', mode='r') as f:
    while True:
        line = f.readline()
        if line == '':  # an empty string means end of file
            break
        line = line.rstrip()  # remove the newline character \n
        data_list.append(line)

print(data_list)


output

Python
 
['Australia', 'China', 'Philippines']
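On Python 3.8 or later, the readline loop can be written more compactly with an assignment expression. The sketch below assumes a sample file that contains a blank line in the middle, to show that a blank line (which readline() returns as '\n') does not end the loop; only the empty string returned at the end of the file does.

```python
import os
import tempfile

# Hypothetical sample file with a blank line between two entries.
path = os.path.join(tempfile.mkdtemp(), 'countries.txt')
with open(path, mode='w') as f:
    f.write('Australia\n\nChina\nPhilippines\n')

data_list = []
with open(path, mode='r') as f:
    # readline() returns '' only at end of file, which stops the loop.
    while (line := f.readline()):
        data_list.append(line.rstrip('\n'))

print(data_list)
```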


4. Read File Using Pandas

For people aspiring to become data scientists, knowing how to process files is a must. One of the tools worth learning is the Pandas library, which can be used to manipulate data. It can read files in several formats, including the popular CSV (comma-separated values) format.

Here is a sample scenario: we are given a capitals.csv file that contains the name of the country in the first column and the corresponding capital in the second column. Our job is to get a list of country names and a list of capital names.

capitals.csv

Python
 
Country,Capital
Australia,Canberra
China,Beijing
Philippines,Manila
Japan,Tokyo


For this particular job, it is better to use the Pandas library. The expected outputs are the country list [Australia, China, Philippines, Japan] and the capital list [Canberra, Beijing, Manila, Tokyo].

Let us create capitals.py to read the capitals.csv using Pandas.

capitals.py

Python
 
"""
requirements:
    pandas

Install pandas with
    pip install pandas
"""

import pandas as pd

# Build a DataFrame from the CSV file.
df = pd.read_csv('capitals.csv')
print(df)


command line

Python
 
PS F:\Project\8thesource> python capitals.py


output

Python
 
       Country   Capital
0    Australia  Canberra
1        China   Beijing
2  Philippines    Manila
3        Japan     Tokyo
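In the output above, the first row of the file became the column labels, which is the default behavior of read_csv. The following sketch checks this on an in-memory copy of the data; using io.StringIO instead of the file on disk is an assumption made so that the example runs without capitals.csv.

```python
import io

import pandas as pd

# In-memory copy of capitals.csv so the sketch runs without the file on disk.
csv_text = """Country,Capital
Australia,Canberra
China,Beijing
Philippines,Manila
Japan,Tokyo
"""

df = pd.read_csv(io.StringIO(csv_text))

column_names = list(df.columns)     # the header row becomes the column labels
row_count, column_count = df.shape  # (number of data rows, number of columns)

print(column_names)
print(row_count, column_count)
```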


Now we need to get the values in the Country and Capital columns and convert those to a list.

Python
 
import pandas as pd

# Build a DataFrame from the CSV file.
df = pd.read_csv('capitals.csv')
print(df)

# Get the lists of country and capital names.
country_names = df['Country'].to_list()
capital_names = df['Capital'].to_list()


Pandas makes this easy. Selecting a column such as df['Country'] returns a Series, and its to_list() method converts the values to a plain Python list.

Now let us print those lists.

Python
 
import pandas as pd

# Build a DataFrame from the CSV file.
df = pd.read_csv('capitals.csv')
print(df)

# Get the lists of country and capital names.
country_names = df['Country'].to_list()
capital_names = df['Capital'].to_list()

# Print names
print('Country names:')
print(country_names)

print('Capital names:')
print(capital_names)


output

Python
 
       Country   Capital
0    Australia  Canberra
1        China   Beijing
2  Philippines    Manila
3        Japan     Tokyo
Country names:
['Australia', 'China', 'Philippines', 'Japan']
Capital names:
['Canberra', 'Beijing', 'Manila', 'Tokyo']


That is it. We got the country and capital names as lists.
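For comparison, when installing Pandas is not an option, the same two lists can be built with the csv module from the standard library. This is a swapped-in technique, not the approach used in capitals.py, and the sketch writes its own copy of capitals.csv to a temporary directory (an assumption) so that it is self-contained.

```python
import csv
import os
import tempfile

# Create a copy of capitals.csv so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'capitals.csv')
with open(path, mode='w', newline='') as f:
    f.write('Country,Capital\n'
            'Australia,Canberra\n'
            'China,Beijing\n'
            'Philippines,Manila\n'
            'Japan,Tokyo\n')

country_names = []
capital_names = []

# DictReader uses the header row as keys for each data row.
with open(path, mode='r', newline='') as f:
    for row in csv.DictReader(f):
        country_names.append(row['Country'])
        capital_names.append(row['Capital'])

print(country_names)
print(capital_names)
```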

5. Conclusion

We use the built-in function open() to open a file and a for loop to read it line by line, saving each line to a list, a Python data type. Other options for saving the lines to a list include a list comprehension, readlines(), and readline(). Depending on the task and the file given, we can also use the Pandas library to process a CSV file.

For further reading, have a look at Python's built-in function open() and the very useful Pandas Python library.


Published at DZone with permission of Ankur Ranpariya. See the original article here.
