It has been some time since we last posted code, so we decided to build a tool that can search any folder for Office documents and return specific information about them. That information could be anything from the author of a document to its total page count. This is useful for anyone who wants to see how many documents are actually theirs as opposed to someone else's. We could also use it to see how many pages of paper we save by using a computer. Whatever the case, this program can be modified in a number of ways to recursively navigate files and folders.

To start, we need to download a DLL that is crucial to this program: DSOfile.dll, which lets us read and edit the properties of Office documents without having Office installed. This is useful for anyone running the code on a machine without Office, and it saves a little extra cash in your wallet if you use OpenOffice but still accept Microsoft Office documents. As stated on the Microsoft website, this DLL gives us access to the following documents:

  • Microsoft Excel workbooks
  • Microsoft PowerPoint presentations
  • Microsoft Word documents
  • Microsoft Project projects
  • Microsoft Visio drawings
  • Other files that are saved in the OLE Structured Storage format

First, we download it from the Microsoft website, as seen in the screenshot below.

Next, we install the DLL onto our system. To do this, find the file we downloaded and run it. The installer asks whether we are sure we want to install the sample on our computer, which we do. It then asks us to agree to a license agreement, to which we can answer Yes. Finally it asks for the installation path, which we left at the default, “C:\DsoFile”.

Once we get the notification that installation is complete, we can add a reference to the DLL in a Visual Studio C# console project, which we named “FileGrep”. After the project is created, navigate to the Solution Explorer pane and right click on References. Choose Add Reference, which brings up options for adding DLLs to the working project. Select the Browse tab, navigate to where we installed DSOFile, and double click the DLL to add it to the project. Once the DLL is added, your References folder should look like the one on the right. With this file added we now have a whole new set of abilities we can tap into using C#.

We can now navigate to Program.cs and double click it to view the code. It is pretty barren right now, so we should add some code that will allow our program to recurse through files and folders. First we update our using statements to include DSOFile. Then we add variables for the author to search for, the output file name and the number of files counted, along with some prompts to populate these variables. After the using statements, variables and prompts are added, your code should look like this:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DSOFile;
using System.Threading;

namespace FileGrep
{
    class Program
    {

        // Suggested Method
        // http://theinfiniteloopblog.com/gen/recursing-through-a-directoryfilesystem-using-c/
        static string filename = "";    // name of the output file, without the .csv extension
        static int count = 0;           // files written to the output file
        static int fileCount = 0;       // files scanned since the last pause
        static int totalFileCount = 0;  // files scanned in total
        static string publisher = "";   // author to look for
        static void Main(string[] args)
        {
            string startPath = "";
            // Keep prompting until the user enters "e" as the path
            while (startPath != "e")
            {
                Console.Write("Please enter the path to start at: ");
                startPath = Console.ReadLine();
                if (startPath == "e")
                    break;
                Console.Write("Please enter the file to save in: ");
                filename = Console.ReadLine();
                Console.Write("Please enter the author to look for: ");
                publisher = Console.ReadLine();
                //Thread t = new Thread(checkKeys);
                //t.Start();
                // Reset the counters for this run
                count = 0;
                totalFileCount = 0;
                fileCount = 0;

With the prompts answered, we can check whether the path we were given actually exists. If it does, we create a DirectoryInfo object and set it to our start path, which gives us a starting point. We then check whether the file we are about to create already exists and, if it does, delete it so that old and new entries are not mixed. Next, a using statement creates a FileStream with the FileMode set to Append and the FileAccess set to Write. Inside the using statement we create a StreamWriter on the FileStream and write the column headers. We can then close the StreamWriter and call a method we will discuss below, viewDirectories. If the directory does not exist, we tell the user so and wait for Enter. Finally we output the number of files scanned and the number of files added to our .csv file, and the loop prompts for another path. Below is the code we discussed above:

                if (Directory.Exists(startPath))
                {
                    DirectoryInfo dir = new DirectoryInfo(startPath);
                    // Start from a fresh file so old and new entries are not mixed
                    if (File.Exists(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv"))
                        File.Delete(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        // Write the column headers
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Directory,Author,Company,Category,Subject,Title,Page Count,Slide Count,Version,Application Name,File Name");
                        sr.Close();
                        // Directory,Author,Company,Category,Subject,Title,Created Date,Character Count With Spaces,Paragraph Count,
                    }
                    viewDirectories(dir);
                }
                else
                {
                    Console.WriteLine("Directory does not exist. Press Enter to continue");
                    Console.ReadLine(); // wait for Enter
                }
                Console.WriteLine("Number Of Files Added To " + filename + ".csv: " + count.ToString());
                Console.WriteLine("Number Of Files Total: " + totalFileCount.ToString());

                Console.ReadLine(); // wait for Enter; the loop then asks for another path ("e" quits)
            }
        }
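
The code above builds the output path by concatenating "C:\Users\" with the user name, which assumes the profile lives on drive C: under that folder. As a minimal sketch of an alternative, the helper below asks the framework for the profile folder instead. The class name OutputPaths and the method name CsvPathFor are hypothetical names of our own and are not part of the original program.

```csharp
using System;
using System.IO;

static class OutputPaths
{
    // Hypothetical helper: returns "<user profile folder>\<name>.csv".
    // Environment.GetFolderPath looks the profile folder up instead of
    // assuming it is C:\Users\<user name>.
    public static string CsvPathFor(string name)
    {
        string profile = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
        return Path.Combine(profile, name + ".csv");
    }
}
```

Calling OutputPaths.CsvPathFor(filename) once and storing the result in a variable would also avoid repeating the same long expression in File.Exists, File.Delete and the FileStream constructor.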

Next comes the viewDirectories method, into which we pass the DirectoryInfo object from above. We create an OleDocumentProperties object called dso, which is available because we referenced DSOfile.dll. We then loop through every file in the directory, using a try statement to catch any errors we may receive. Inside the try block we increment fileCount and totalFileCount. We also use dso to open the file as read only, since we want to grab information from the file, not write to it. We then check the Author, Company, Category, Subject and Title fields and pass the file by if all of them are null. Otherwise we compare the Author, Company and Category against the author we entered above.

        static void viewDirectories(DirectoryInfo dir)
        {
            OleDocumentProperties dso = new OleDocumentProperties();

            foreach (FileInfo file in dir.GetFiles())
            {
                try
                {
                    ++fileCount;
                    ++totalFileCount;

                    // Open read only: we only read the document properties
                    dso.Open(file.FullName, true, dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
                    // Skip files that have none of these properties set
                    if (dso.SummaryProperties.Author != null || dso.SummaryProperties.Company != null || dso.SummaryProperties.Category != null || dso.SummaryProperties.Subject != null || dso.SummaryProperties.Title != null)
                    {
                        // See if the author we are looking for matches
                        if (dso.SummaryProperties.Author == publisher || dso.SummaryProperties.Company == publisher || dso.SummaryProperties.Category == publisher)
                        {

When a file matches, we let the user know that we have found an entry. We then make sure that we do not run past a file count of 650000, so that we do not tax the system for too long. Once that count is reached we let the user know, so they can leave and press Enter to resume when ready. After the process resumes, a using statement opens our CSV file with the FileMode set to Append. We create another StreamWriter to write all of the information to the CSV file, then close the StreamWriter and increment our count. Finally we close the OleDocumentProperties object and add the catch statements and outputs for the typical errors we may encounter.

                            Console.WriteLine("Found One!");

                            // Pause after a long run; >= so the pause is not missed
                            // when the 650000th file is not itself a match
                            if (fileCount >= 650000)
                            {
                                Console.WriteLine("Stopped for the day, When you return press Enter");
                                Console.ReadLine();
                                fileCount = 0;
                            }
                            using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv",
                                FileMode.Append, FileAccess.Write))
                            {
                                StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                                sr.WriteLine(file.DirectoryName + "," + dso.SummaryProperties.Author + "," + dso.SummaryProperties.Company + "," + dso.SummaryProperties.Category + "," + dso.SummaryProperties.Subject + "," + dso.SummaryProperties.Title + "," + dso.SummaryProperties.PageCount + "," + dso.SummaryProperties.SlideCount + "," + dso.SummaryProperties.Version + "," + dso.SummaryProperties.ApplicationName + "," + file.Name);
                                sr.Close();
                                ++count;
                            }
                            // No break here: a break would leave the foreach over the
                            // files and skip the rest of this directory.
                        }

                    }
                    dso.Close(false);

                }
                catch (UnauthorizedAccessException e)
                {
                    Console.WriteLine("Can't Access File");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Can't Access File - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();

                    }
                }
                catch (NotImplementedException e)
                {
                    Console.WriteLine("Can't Access File");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Can't Access File - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();

                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine("***** HRESULT - " + e.Message + " *****");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("***** HRESULT - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();

                    }
                }
            }
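
The WriteLine call above joins the property values with bare commas, so a title or company name that itself contains a comma would shift the columns of that row. As a rough sketch of one way around this, the helper below quotes any field that contains a comma, a double quote or a line break, which is the usual CSV convention. The class name Csv and the methods Escape and Row are hypothetical names of our own and are not part of the original program or of DSOFile.

```csharp
using System;
using System.Linq;

static class Csv
{
    // Hypothetical helper: quotes a field only when it needs quoting,
    // and doubles any double quotes inside it.
    public static string Escape(object value)
    {
        string text = value == null ? "" : value.ToString();
        bool needsQuotes = text.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
            return text;
        return "\"" + text.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-escaped fields into one CSV row.
    public static string Row(params object[] fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}
```

With this in place, the row could be written as sr.WriteLine(Csv.Row(file.DirectoryName, dso.SummaryProperties.Author, ...)) with the remaining properties passed in the same order as the headers.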

Below we have a foreach statement that allows us to navigate into the folders and into the subfolders of those folders. For each one we call the viewDirectories method coded above to search all files in the new folder, which is what makes the method recursive. We wrap each call in a try/catch statement so that we can present any errors we encounter to the user.


            foreach (DirectoryInfo subDir in dir.GetDirectories())
            {
                try
                {
                    viewDirectories(subDir); //recurse
                }
                catch (UnauthorizedAccessException)
                {
                    Console.WriteLine("Can't Access Directory");
                }
                catch (NotImplementedException)
                {
                    Console.WriteLine("Can't Access File");
                }
                catch (Exception e)
                {
                    Console.WriteLine("***** HRESULT - " + e.Message + " *****");
                }

            }
        }
    }
}
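
The recursion above mixes walking the folders with processing each file. As a minimal sketch of the same recursive idea on its own, the helper below collects every file under a root folder and reports folders it cannot read through a callback instead of stopping. The class name DirectoryWalker and the method names Walk and Visit are hypothetical names of our own, and the choice to catch only UnauthorizedAccessException and IOException is an assumption about which errors are worth skipping.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class DirectoryWalker
{
    // Hypothetical helper: returns every file under root, including subfolders.
    // Folders that cannot be read are reported through onError and skipped.
    public static List<FileInfo> Walk(DirectoryInfo root, Action<DirectoryInfo, Exception> onError)
    {
        var found = new List<FileInfo>();
        Visit(root, found, onError);
        return found;
    }

    private static void Visit(DirectoryInfo dir, List<FileInfo> found, Action<DirectoryInfo, Exception> onError)
    {
        try
        {
            found.AddRange(dir.GetFiles());
            foreach (DirectoryInfo sub in dir.GetDirectories())
                Visit(sub, found, onError); // recurse into each subfolder
        }
        catch (UnauthorizedAccessException e) { onError(dir, e); }
        catch (IOException e) { onError(dir, e); }
    }
}
```

Because each call to Visit has its own try/catch, an unreadable subfolder is reported once and the walk carries on with its siblings. The returned list could then be fed to the property check one file at a time.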

This quick and dirty way to recursively search through files and folders for the information that is important to us is far from complete or optimized. We challenge our developer readers to improve upon it and optimize it. Share your customizations with us in the comments below. We also enjoy giving credit where it is due, as well as not coding something that has already been done. You will notice the link in the code at the top, which is where we got the technique for recursing through files and folders instead of coding it from scratch. Below you can find the complete Program.cs instead of it broken into pieces as seen above! Until the next coding update, Happy Hacking!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DSOFile;
using System.Threading;

namespace FileGrep
{
    class Program
    {

        // Suggested Method
        // http://theinfiniteloopblog.com/gen/recursing-through-a-directoryfilesystem-using-c/
        static string filename = "";
        static int count = 0;
        static int fileCount = 0;
        static int totalFileCount = 0;
        static string publisher = "";
        static void Main(string[] args)
        {
            string startPath = "";
            while (startPath != "e")
            {
                Console.Write("Please enter the path to start at: ");
                startPath = Console.ReadLine();
                if (startPath == "e")
                    break;
                Console.Write("Please enter the file to save in: ");
                filename = Console.ReadLine();
                Console.Write("Please enter the author to look for: ");
                publisher = Console.ReadLine();
                //Thread t = new Thread(checkKeys);
                //t.Start();
                count = 0;
                totalFileCount = 0;
                fileCount = 0;
                if (Directory.Exists(startPath))
                {
                    DirectoryInfo dir = new DirectoryInfo(startPath);
                    if (File.Exists(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv"))
                        File.Delete(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Directory,Author,Company,Category,Subject,Title,Page Count,Slide Count,Version,Application Name,File Name");
                        sr.Close();
                        // Directory,Author,Company,Category,Subject,Title,Created Date,Character Count With Spaces,Paragraph Count,
                    }
                    viewDirectories(dir);
                }
                else
                {
                    Console.WriteLine("Directory does not exist. Press Enter to continue");
                    Console.ReadLine(); // wait for Enter
                }
                Console.WriteLine("Number Of Files Added To " + filename + ".csv: " + count.ToString());
                Console.WriteLine("Number Of Files Total: " + totalFileCount.ToString());

                Console.ReadLine(); // wait for Enter; the loop then asks for another path ("e" quits)
            }
        }

        static void viewDirectories(DirectoryInfo dir)
        {
            OleDocumentProperties dso = new OleDocumentProperties();

            foreach (FileInfo file in dir.GetFiles())
            {
                try
                {
                    ++fileCount;
                    ++totalFileCount;

                    dso.Open(file.FullName, true, dsoFileOpenOptions.dsoOptionOpenReadOnlyIfNoWriteAccess);
                    if (dso.SummaryProperties.Author != null || dso.SummaryProperties.Company != null || dso.SummaryProperties.Category != null || dso.SummaryProperties.Subject != null || dso.SummaryProperties.Title != null)
                    {
                        // Find all documents in Root Folder & Check against publishers
                        for (int i = 0; i < publisher.Length; ++i)
                        {
                            // Loop Through publishers and see if match
                            if (dso.SummaryProperties.Author == publisher || dso.SummaryProperties.Company == publisher || dso.SummaryProperties.Category == publisher)
                            {
                                Console.WriteLine("Found One!");
                                if (fileCount >= 650000)
                                {
                                    Console.WriteLine("Stopped for the day, When you return press Enter");
                                    Console.ReadLine();
                                    fileCount = 0;
                                }
                                using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\" + filename + ".csv",
                                    FileMode.Append, FileAccess.Write))
                                {
                                    StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                                    sr.WriteLine(file.DirectoryName + "," + dso.SummaryProperties.Author + "," + dso.SummaryProperties.Company + "," + dso.SummaryProperties.Category + "," + dso.SummaryProperties.Subject + "," + dso.SummaryProperties.Title + "," + dso.SummaryProperties.PageCount + "," + dso.SummaryProperties.SlideCount + "," + dso.SummaryProperties.Version + "," + dso.SummaryProperties.ApplicationName + "," + file.Name);
                                    sr.Close();
                                    ++count;
                                }
                                dso.Close(false);
                                break;
                            }
                        }
                    }
                    dso.Close(false);
                }
                catch (UnauthorizedAccessException e)
                {
                    Console.WriteLine("Can't Access File");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Can't Access File - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();
                    }
                }
                catch (NotImplementedException e)
                {
                    Console.WriteLine("Can't Access File");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("Can't Access File - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();
                    }
                }
                catch (Exception e)
                {
                    Console.WriteLine("***** HRESULT - " + e.Message + " *****");
                    using (FileStream fileStream = new FileStream(@"C:\Users\" + System.Environment.UserName.ToString() + "\\Errors.csv",
                        FileMode.Append, FileAccess.Write))
                    {
                        StreamWriter sr = new StreamWriter(fileStream, Encoding.UTF8);
                        sr.WriteLine("***** HRESULT - " + e.Message + "\nFile: " + file.Name + "\nPath: " + file.DirectoryName);
                        sr.Close();
                    }
                }
            }
            foreach (DirectoryInfo subDir in dir.GetDirectories())
            {
                try
                {
                    viewDirectories(subDir); //recurse
                }
                catch (UnauthorizedAccessException e)
                {
                    Console.WriteLine("Can't Access Directory");
                }
                catch (NotImplementedException e)
                {
                    Console.WriteLine("Can't Access File");
                }
                catch (Exception e)
                {
                    Console.WriteLine("***** HRESULT - " + e.Message + " *****");
                }
            }
        }
    }
}
