How To Automatically Import And Combine Multiple Files In R

Author:Murphy  |  View: 26159  |  Time: 2025-03-23 12:41:21

In my job as a data scientist, I regularly have to import several files that contain the same type of information, because of export constraints in different software. If you are in a similar situation, below is a clear and simple way to automatically import your files as individual data frames or combine them into a single data frame.

Prepare Your Files

Before we get started with our code, we first must prepare our files. We need to have a way to programmatically choose the files that we want to import into R. While you could choose any way to distinguish these files, here are two of the easiest ways:

  1. Create a unique prefix on all of the files that you want to import at once.
  2. Create a separate folder in your working directory and only include those files in that folder.

For example, suppose I have a set of Excel files named "SA#.xlsx". If no other files in my folder start with SA, then I already have my prefix. If other files in my folder also start with SA, such as "SAT.xlsx", I can create a folder named "SA" and place only the files I want to import into that folder.

Create Your File List

Once we have a programmatic way to identify our files, we need to create a list of all of the file names. We can use the R function list.files() to achieve this.

File list with prefix

If you choose to add a prefix to your file names, we will use the pattern parameter of list.files() to select the specific files that we want.

# Formula (PREFIX is a placeholder for your own prefix)
filelist <- list.files(pattern = "^PREFIX")

# Example
filelist <- list.files(pattern = "^SA")

The pattern takes in a regular expression. Therefore, we can use the "^" symbol to represent the beginning of the string. This ensures that any other file names that include "SA" within the name but not at the beginning will not be included in this set of names. Note: This will only pull files from your working directory. You can change the path to pull files from a different directory.
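To see how the "^" anchor behaves, here is a minimal sketch that creates a few empty files in a temporary directory and lists them with two patterns. The file names are made up for illustration, and the tighter second pattern is an assumption about the naming scheme (SA followed by digits), not part of the method above.

```r
# Work in a throwaway directory so the real working directory is untouched
demo_dir <- tempfile("prefix_demo_")
dir.create(demo_dir)

# Hypothetical file names used only for this illustration
demo_files <- c("SA1.xlsx", "SA2.xlsx", "SAT.xlsx", "USA1.xlsx")
invisible(file.create(file.path(demo_dir, demo_files)))

# "^SA" keeps names that start with SA: USA1.xlsx is dropped, SAT.xlsx is kept
prefix_only <- list.files(path = demo_dir, pattern = "^SA")

# Assumed naming scheme: SA, then digits, then the .xlsx extension
prefix_digits <- list.files(path = demo_dir, pattern = "^SA[0-9]+\\.xlsx$")

print(prefix_only)
print(prefix_digits)
```

If a stricter pattern like the second one fits your file names, it can replace the separate folder as a way to exclude look-alike files.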

File list in a folder

If you instead choose to add your files to a folder, we will use the path parameter to tell R where to pull our files from.

# Formula (FOLDER is a placeholder for your folder name)
filelist <- list.files(path = "./FOLDER")

# Example
filelist <- list.files(path = "./SA")

The "." symbol points to the current working directory. R then looks for a folder named "SA" inside it and returns the names of all the files in that folder.
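As a quick check of what list.files() returns for a folder, the sketch below builds a temporary stand-in for the working directory with an "SA" subfolder. The file names are hypothetical. It also shows the full.names argument of list.files(), which is not used in this article but returns each name with the folder path already attached.

```r
# Temporary stand-in for the working directory, with an "SA" subfolder
base_dir <- tempfile("folder_demo_")
dir.create(file.path(base_dir, "SA"), recursive = TRUE)

# Hypothetical files: two inside the folder, one look-alike left outside
invisible(file.create(file.path(base_dir, "SA", c("SA1.xlsx", "SA2.xlsx"))))
invisible(file.create(file.path(base_dir, "SAT.xlsx")))

# Default: bare file names, which still need the folder path before reading
bare_names <- list.files(path = file.path(base_dir, "SA"))

# full.names = TRUE: each entry already includes the folder path
full_paths <- list.files(path = file.path(base_dir, "SA"), full.names = TRUE)

print(bare_names)
print(basename(full_paths))
```

With the default bare names, the folder path has to be added back when each file is read.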

Importing Your Files

Now that we have a list, we can run a for loop over it to import all of our files into our current environment. If we want each file to be its own variable, we first need to create the variable name, then import the file, and then dynamically assign the imported data to that variable name. While this process is similar for both prefix files and folder-imported files, there is a small difference when importing the files.

Import files

Let's first cover the difference in the reading of the file into the R environment as this will be the only change needed for the two different methods above.

If you used the prefix method, the files exist in your working directory. Therefore, you do not need to specify the path of the file. However, if you added them to a folder, they are no longer in your working directory. Therefore, we need to dynamically construct the file path for these files.

To create this formula, I am going to use the variable "file" to represent the file name in our filelist variable. This will allow me to directly use this code in the for loop below.

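# For prefix files (IMPORT_FUNCTION is a placeholder for your reader function)
df <- IMPORT_FUNCTION(file)

# Prefix example with an Excel file (read_excel() comes from the readxl package)
df <- read_excel(file)

# For folder files (FOLDER is a placeholder for your folder name)
df <- IMPORT_FUNCTION(paste("./FOLDER/", file, sep = ""))

# Folder example with an Excel file
df <- read_excel(paste("./SA/", file, sep = ""))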

For files within a folder, we have to add the folder path to the file name every time we import a file. Luckily, R has the paste function, which lets us add the folder path dynamically to each file name. Since paste separates its arguments with a space by default, we have to override the separator (sep) with an empty string, which we do by passing a pair of quotation marks with nothing between them.
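The effect of the separator is easy to verify on a single file name. This short sketch compares paste() with and without sep = "", and shows two base R alternatives, paste0() and file.path(), that build the same path. The file name is only a placeholder.

```r
file <- "SA1.xlsx"  # placeholder file name, as in the text

with_default_sep <- paste("./SA/", file)            # default sep = " " leaves a space in the path
with_empty_sep   <- paste("./SA/", file, sep = "")  # the approach used in this article
with_paste0      <- paste0("./SA/", file)           # paste0() never inserts a separator
with_file_path   <- file.path(".", "SA", file)      # file.path() inserts the "/" itself

print(with_default_sep)
print(with_empty_sep)
```

Any of the last three forms can be passed to the import function; which one to use is a matter of taste.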

Automatically Importing Files Individually

From now on, I will just use the functions assuming the files exist in the working directory for simplicity. Now, we need to create a for loop that will allow the files to automatically be imported and set to a dynamic variable.

Creating the variable name

First, we need to create our variable dynamically. We will use the file name to create the variable; however, we will want to remove the file extension from the file name. Again, I am using "file" as a placeholder for the file names. You can use this same idea if you want to remove the prefix from the file name as well.

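# Formula (EXTENSION is a placeholder for your file extension)
# The pattern is a regular expression: "\\." matches a literal dot and "$" anchors it to the end
name <- gsub("\\.EXTENSION$", "", file)

# Example for an Excel file
name <- gsub("\\.xlsx$", "", file)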
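Because the pattern argument of gsub() is a regular expression, an unescaped "." matches any character, not just a dot. The sketch below compares an unescaped pattern, an escaped and anchored pattern, and the base R helper tools::file_path_sans_ext() on a deliberately awkward, hypothetical file name.

```r
file <- "SA_xlsx_1.xlsx"  # hypothetical name chosen to expose the difference

# Unescaped ".": removes any character followed by "xlsx", wherever it appears
loose <- gsub(".xlsx", "", file)

# Escaped dot plus "$": removes only a literal ".xlsx" at the end of the name
anchored <- gsub("\\.xlsx$", "", file)

# Helper from the tools package (shipped with R) that strips the final extension
helper <- tools::file_path_sans_ext(file)

print(c(loose = loose, anchored = anchored, helper = helper))
```

For ordinary names such as "SA1.xlsx" all three give the same result; the difference only appears when "xlsx" also occurs earlier in the name.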

Assigning the variable

Since we have already created our import code above, let's create the code to assign the variable. While it is easier to assign a variable with a "=" or "<-" symbol, we cannot use this with dynamic variable names. Instead, we will use the assign function in R.

assign(name, df)
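Here is a small sketch of assign() in isolation, using a made-up data frame in place of an imported file. It also shows get(), the counterpart that looks a variable up by name, and a named list as an alternative way to hold the imported data. The list approach is not part of this article's method; it is included only for comparison.

```r
name <- "SA1"                                       # variable name built from a file name
df <- data.frame(id = 1:3, score = c(10, 20, 30))   # stand-in for an imported file

# assign() creates a variable called SA1 in the current environment
assign(name, df)

# get() retrieves a variable by its name
same_df <- get(name)

# Alternative (not from the article): keep the data frames in one named list
imported <- list()
imported[[name]] <- df

print(exists("SA1"))
print(names(imported))
```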

Creating the for loop

We finally have all the components to create a for loop to automatically import all of our files. All we need to do is add the above code within a for loop that will cycle through our filelist variable.

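# Formula (IMPORT_FUNCTION and EXTENSION are placeholders)
for (file in filelist) {
  name <- gsub("\\.EXTENSION$", "", file)
  df <- IMPORT_FUNCTION(file)
  assign(name, df)
}

# Example for Excel files
library(readxl)  # provides read_excel()

for (file in filelist) {
  name <- gsub("\\.xlsx$", "", file)  # "SA1.xlsx" becomes "SA1"
  df <- read_excel(file)
  assign(name, df)
}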
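To try the loop without any Excel files on hand, the following sketch runs the same steps on two small CSV files written to a temporary folder. The CSV files, their contents, and read.csv() are stand-ins chosen so the example needs only base R; with real Excel files the reader would be read_excel().

```r
# Create two small CSV files in a temporary folder (stand-ins for the Excel exports)
demo_dir <- tempfile("import_demo_")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(demo_dir, "SA1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(demo_dir, "SA2.csv"), row.names = FALSE)

filelist <- list.files(path = demo_dir, pattern = "^SA")

for (file in filelist) {
  name <- gsub("\\.csv$", "", file)           # "SA1.csv" becomes "SA1"
  df <- read.csv(file.path(demo_dir, file))   # with Excel files: read_excel()
  assign(name, df)                            # creates SA1, then SA2
}

print(ls(pattern = "^SA[0-9]+$"))
```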

Now, you can add your files easily, but what if you wanted to simply add all these files into one combined file?

Automatically Import And Combine Files

There are many times when you may have multiple files that are simply segments of the same information, and you really want to combine them and work with a single data frame. Therefore, we will now create a for loop that produces a single data frame with all the rows combined. Instead of assigning each file to its own variable, we will use the rbind() function to bind the rows of each file onto one data frame.

First, we need to create an empty data frame that the new files will be added to. Then, we can use the for loop to import and bind them to this data frame.

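# Formula (COMBINED and IMPORT_FUNCTION are placeholders)
COMBINED <- data.frame()

for (file in filelist) {
  df <- IMPORT_FUNCTION(file)
  COMBINED <- rbind(COMBINED, df)
}

# Example with SA data frame and Excel files (read_excel() comes from the readxl package)
SA <- data.frame()

for (file in filelist) {
  df <- read_excel(file)
  SA <- rbind(SA, df)
}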

Running the example code would generate a single data frame named "SA" with all the data from the files within the filelist.
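The sketch below runs the combining loop on two small CSV files in a temporary folder, so it works with base R alone. The files, their contents, and read.csv() are stand-ins for the Excel exports and read_excel(). It also shows one common alternative, reading all files into a list and binding once with do.call(), which is not part of this article's method. Both versions assume every file has the same column names.

```r
# Two small CSV files with identical columns (stand-ins for the Excel exports)
demo_dir <- tempfile("combine_demo_")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(demo_dir, "SA1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(demo_dir, "SA2.csv"), row.names = FALSE)

# full.names = TRUE so each entry can be passed straight to the reader
filelist <- list.files(path = demo_dir, pattern = "^SA", full.names = TRUE)

# Loop version, as in the article
SA <- data.frame()
for (file in filelist) {
  df <- read.csv(file)
  SA <- rbind(SA, df)
}

# Alternative: read every file into a list, then bind the rows once
SA_alt <- do.call(rbind, lapply(filelist, read.csv))

print(nrow(SA))
```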

Creating a file column

There may be times when the specific file the data comes from contains important information. For example, someone may give you data that has a certain date range or specific meeting information that is important to be able to analyze the data. If you just combine the data inside each file, you will not know which file the data came from.

Therefore, we want to create a dynamic column that will add the file information for each file before we bind the data. We will use the same name variable we created before to capture the file name information and add it as a column in the dataset.

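# Formula (COMBINED, IMPORT_FUNCTION, EXTENSION, and COLUMN are placeholders)
COMBINED <- data.frame()

for (file in filelist) {
  name <- gsub("\\.EXTENSION$", "", file)
  df <- IMPORT_FUNCTION(file)
  df$COLUMN <- name
  COMBINED <- rbind(COMBINED, df)
}

# Example with SA data frame and Excel files (read_excel() comes from the readxl package)
SA <- data.frame()

for (file in filelist) {
  name <- gsub("\\.xlsx$", "", file)  # "SA1.xlsx" becomes "SA1"
  df <- read_excel(file)
  df$filename <- name                 # every row records its source file
  SA <- rbind(SA, df)
}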
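As a runnable illustration of the file column, this sketch repeats the loop on two small CSV files in a temporary folder. The files and read.csv() are stand-ins for the Excel exports and read_excel(), and tools::file_path_sans_ext() is used here in place of gsub() to strip the extension. It then counts the rows that came from each file.

```r
# Two small CSV files with identical columns (stand-ins for the Excel exports)
demo_dir <- tempfile("filecol_demo_")
dir.create(demo_dir)
write.csv(data.frame(id = 1:2, score = c(10, 20)),
          file.path(demo_dir, "SA1.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, score = c(30, 40)),
          file.path(demo_dir, "SA2.csv"), row.names = FALSE)

filelist <- list.files(path = demo_dir, pattern = "^SA")

SA <- data.frame()
for (file in filelist) {
  name <- tools::file_path_sans_ext(file)     # "SA1.csv" becomes "SA1"
  df <- read.csv(file.path(demo_dir, file))
  df$filename <- name                         # tag every row with its source file
  SA <- rbind(SA, df)
}

# Number of rows contributed by each source file
rows_per_file <- table(SA$filename)
print(rows_per_file)
```

The filename column can then be used to filter or group the combined data by its source.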

Now, you can use the for loop above to easily import and combine multiple files without losing the information for the file that they came from.

Overall, automatically importing files saves you a large amount of coding time: instead of writing an import for each file by hand, you let the computer spend computational time managing multiple files.

