Will this work?

Dec 8, 2016 at 2:44 PM
I have to upload 1million documents, to many document libraries in many site collections within one web app. I have a .csv which references each document by name and the full url of the document library which the document needs to be uploaded into.

Would this software allow me to load the CSV and then run through every row(document) and upload to the specified document library with metadata?
Coordinator
Dec 8, 2016 at 4:22 PM
Edited Dec 8, 2016 at 4:25 PM
It will do what you ask no problem.

The only caveat is that each import is defined by a configuration file that includes the url of the destination document library so this is NOT taken from the CSV file by default.

If you want to work around this you can either;
  1. Split your CSV and have one configuration file per document.
or
  1. Use PowerShell to automate the import.
A PowerShell script is installed to the examples directly when you install DIFS.

The important bit (you will need the fully script which also shows how to load assemblies) is below.
# We create a new import settings object using an existing settings file
$importsettings = [SPImportHelper.Settings.Utility]::LoadSettings($pathtosettings);

# We tell the user what source is being imported from
Write-Host "Source : Connection String : " $importsettings.Source.OleDbSourceDataSetSettings.ConnectionString

# We Create a new import object
$import = New-Object SPImportHelper.Import.DataSetImportToSharePoint_Console($importsettings);

# Write tell the user that we are starting the import
Write-Host "Importing";

# We start the import
$import.StartImport();

# Write tell the user that we have finished the import
Write-Host "Finished";
The important bit of the XML config files for destination;
    <DestinationListSettings>
      <DestinationWebUrlRelative>/sites/SPImportHelper</DestinationWebUrlRelative>
      <DestinationFolderUrlRelative>/sites/SPImportHelper/Docs</DestinationFolderUrlRelative>
      <DestinationServerUrl>http://productdev</DestinationServerUrl>
      <DestinationListName>Docs</DestinationListName>
    </DestinationListSettings>
And for source;
  <Source>
    <SourceDataSetType>OLEDbSelect</SourceDataSetType>
    <OleDbSourceDataSetSettings>
      <ConnectionString>Provider=Microsoft.ACE.OLEDB.12.0;Data Source=\\vmware-host\Shared Folders\Projects\ProductDevelopment\ImportForSharePoint\WorkingArea\Examples\StaffDocuments.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=0";</ConnectionString>
    </OleDbSourceDataSetSettings>
    <OleDbSelectSourceDataSetSettings>
      <SelectStatement>select * from [files$]</SelectStatement>
    </OleDbSelectSourceDataSetSettings>
  </Source>
So you should be able to add to the bottom of the script thing like ;
$importsettings.DestinationListSettings.DestinationFolderUrlRelative = ">/sites/SPImportHelper/Docs02"
$importsettings.DestinationListSettings.DestinationListName = "Docs02"
$importsettings.OleDbSelectSourceDataSetSettings.SelectStatement = "select * from [files$] where DestLibrary = 'Docs02'"
$import = New-Object SPImportHelper.Import.DataSetImportToSharePoint_Console($importsettings);
$import.StartImport();

$importsettings.DestinationListSettings.DestinationFolderUrlRelative = ">/sites/SPImportHelper/Docs03"
$importsettings.DestinationListSettings.DestinationListName = "Docs03"
$importsettings.OleDbSelectSourceDataSetSettings.SelectStatement = "select * from [files$] where DestLibrary = 'Docs03'"
$import = New-Object SPImportHelper.Import.DataSetImportToSharePoint_Console($importsettings);
$import.StartImport();
The above assumes you have converted your CSV to Excel and there is a column called Docs03.

Tagged onto the bottom of the script this would;
  • First Process the import as per the configuration XML file.
  • Second Do the same but for documents going into a library called DOCS02
  • third Do the same but for documents going into a library called DOCS03
I have not tested the exact code above but have done similar in the past and it has worked perfectly.
Coordinator
Dec 8, 2016 at 4:41 PM
Just noting the number of documents you are importing.

You might want to check out source type OLEDbTable in the documentation and consider using a SQL instance and SQL connectionstring. This source type will write any errors back to the table so you can keep track of any documents that did not import more easily.