When working with Excel files, such as EBM cubes and pivot tables, ensuring optimal performance is essential. Microsoft Excel is a powerful tool, but its inherent file structures and memory usage can sometimes lead to errors, slow performance, or even broken files if best practices aren't followed.
As files grow in size and complexity, Excel users encounter common challenges such as slow performance, error messages, and excessive memory usage. The following sections address frequently asked questions and provide solutions to optimize Excel files effectively.
Understanding Excel File Sizes
Excel files (.xlsx or .xlsm) are essentially compressed ZIP archives containing smaller XML files. To see this, change the file extension to ".zip" and explore its contents. Each worksheet is listed within the "xl/worksheets" folder, and the overall file size is driven by the size of these worksheets.
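Because a workbook is ordinary ZIP data under the hood, this can be sketched with Python's standard zipfile module. The archive built below is a minimal mock that only mimics the relevant .xlsx layout (it is not a valid workbook); the standard "xl/worksheets/..." paths are the same ones you would see after renaming a real file to ".zip".

```python
import io
import zipfile

# Build a tiny in-memory archive that mimics the relevant .xlsx layout.
# A real workbook has many more parts; "xl/worksheets/..." holds one
# XML file per sheet.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")
    zf.writestr("xl/worksheets/sheet1.xml",
                "<worksheet>" + "<row/>" * 1000 + "</worksheet>")

# Inspect it exactly as you would a workbook renamed to ".zip".
with zipfile.ZipFile(buf) as zf:
    sheets = {i.filename: (i.compress_size, i.file_size)
              for i in zf.infolist()
              if i.filename.startswith("xl/worksheets/")}

for name, (packed, raw) in sheets.items():
    print(f"{name}: {packed} bytes compressed, {raw} bytes uncompressed")
```

The gap between the compressed and uncompressed sizes is the same effect described above: a small file on disk expands considerably once Excel unpacks it to work with it.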
When opening an Excel file, it loads:
- The basic Excel software and any installed tools or extensions.
- An uncompressed version of your file, which Excel needs in order to function.
- Additional data generated during your work.
Even a small 189KB file can expand to about 1.09MB and consume approximately 154.8MB of memory when open in Excel. Memory usage depends on computer activity, available memory, Excel version, and setup. Larger files, especially those over 50MB, can significantly slow down Excel, leading to crashes.
To prevent issues, keep files under 50MB. Once files approach 100MB, Excel becomes unstable, causing problems during use.
Frequently Asked Questions
What are Excel file size thresholds?
- 50MB: You may experience Excel instability, crashes, or file corruption caused by those crashes.
- 100MB: You will very likely, at some point, experience Excel instability, crashes, or file corruption caused by those crashes.
Why is my Excel file slow and becoming unusable?
Your Excel file may be experiencing performance issues due to factors such as an excessive number of tabs, cube connections, external links, and excessive references. These issues can lead to processing demands that Excel may struggle to handle.
How can I identify performance bottlenecks in my Excel file?
Check the number of tabs, cube connections, and cube references in your file. To do this, go to Data > Queries and Connections. Right-click on a cube connection, choose Properties, and toggle to "Used In" to understand their impact.
What are the recommended limits for tabs, cube connections, and references?
Aim for a manageable number of tabs (recommended: less than 30), minimize cube connections (recommended: one per cube type), and limit cube references to a reasonable count (recommended: a few hundred at most).
Best Practices and Recommendations in Excel
These recommendations are designed to help you avoid common pitfalls and ensure your files load correctly without breaking or generating errors. Failure to implement these best practices can directly impact the stability and usability of your files. It's important to note that many of these issues are tied to Microsoft Excel's inherent limitations, not EBM’s systems. Following these steps will minimize potential challenges and ensure a smoother experience.
1. Limit the Number of Field List Items in Rows and Columns
- Why: Adding too many items to rows and columns makes pivot tables bulky, leading to slower performance or errors.
- Best Practice: Place as many fields as possible into the Filters area.
- Example: Instead of adding "Region," "Product," "Salesperson," and "Quarter" to rows and columns, add "Region" and "Quarter" to Filters to reduce processing load.
2. Minimize Calculation Complexity
- Why: Complex or nested formulas increase processing time and can cause refresh failures.
- Best Practice: Simplify formulas or offload calculations to the data source.
- Example: Replace nested IF formulas like =IF(A1>100, IF(B1<50, A1+B1, A1-B1), 0) with pre-calculated values in helper columns or external systems.
3. Expand Hierarchies Instead of Adding Individual Fields
- Why: Adding individual fields from multiple hierarchies into rows can lead to overlapping data structures, slowing down performance and increasing file complexity. Expanding the drillable fields within a hierarchy ensures better organization and faster processing.
- Best Practice: Use the drillable fields in the appropriate hierarchies rather than manually adding individual fields from different hierarchies.
- Example:
  - What NOT to Do: Add "Region," "Salesperson," and "Quarter" as separate row fields from different hierarchies.
  - What to Do Instead: Expand the "Region" hierarchy to drill down into its detailed fields (e.g., "Region > State > City") within the same hierarchy.
  - Result: The file remains detailed but loads and operates much faster.
4. Build Out Dimensions First, but Add an Amount Field Sooner Rather than Later
- Why: When working with tabular cubes, pulling in dimensions without an amount field allows flexibility in building out your analysis. However, if you notice significant lag or an excessive number of records being returned, it's likely time to include an amount field to optimize performance. MDX cubes inherently filter by amount, but tabular cubes do not, which can lead to slower performance.
- Best Practice: Build out as much as you can using dimensions first, but if performance starts to degrade, include an amount field to limit the data being retrieved.
- Example:
  - What to Do Initially: Build your pivot table with dimensions like "Customer," "Region," and "Product" to structure your analysis.
  - When to Adjust: If the file slows down significantly, add an amount field such as "Revenue" or "Profit" to filter out records with no associated values.
  - Result: You retain flexibility during setup but improve performance when needed by reducing excess records.
5. Reduce the Number of Active Pivot Tables
- Why: Each active pivot table creates additional load during refreshes, potentially causing Excel to crash.
- Best Practice: Consolidate pivot tables or refresh only the ones you need.
- Example: Use slicers to consolidate "Sales by Region," "Sales by Product," and "Sales by Quarter" into a single pivot table.
6. Avoid Excessive Data Refresh Requests
- Why: Repeated refreshes overload the data connection, leading to errors or timeouts.
- Best Practice: Make all adjustments first and refresh only once.
- Example: Adjust filters, fields, and views before clicking Refresh All.
7. Optimize Conditional Formatting and Charts
- Why: Excessive formatting or complex charts linked to pivot tables can degrade performance and cause crashes.
- Best Practice: Apply formatting sparingly and use charts on summarized data.
- Example: Instead of applying conditional formatting to an entire table, limit it to the top 10 rows or critical metrics.
8. Eliminate Unused Rows and Columns
- Why: Empty rows and columns unnecessarily increase file size and can cause errors during saves.
- Best Practice: Delete unused rows and columns, and use filters instead of hiding data.
- Example:
  - Unhide all rows and columns.
  - Highlight empty areas and delete them (Ctrl + -).
  - Save, close, and reopen the file to observe the size reduction.
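The effect of deleting blank rows can be illustrated with a short Python sketch. The grid and its values are hypothetical stand-ins for worksheet cells; the function simply drops any row whose cells are all empty, which is what the manual Ctrl + - cleanup accomplishes.

```python
def trim_blank_rows(grid):
    """Drop rows containing only empty cells from a 2-D grid of values.

    A toy stand-in for the manual cleanup: unhide everything, select
    the empty rows, and delete them (Ctrl + -).
    """
    return [row for row in grid
            if any(cell not in ("", None) for cell in row)]

data = [
    ["Region", "Sales"],
    ["East", 100],
    ["", None],   # blank row bloating the file
    [None, ""],   # another one
    ["West", 250],
]
print(trim_blank_rows(data))
```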
9. Reduce the Number of Tabs by Breaking the File into Subsets
- Why: Large files over 50MB can be unstable and prone to errors.
- Best Practice: Break large workbooks into smaller, logically grouped files.
- Example: Split a file into "Business Summary," "Customer Data," and "Service Metrics" files. Maintain cube connections by copying tabs into new workbooks using the overwrite connection option.
10. Optimize Cube Connections
- Why: Unused cube connections create unnecessary overhead and can slow down refreshes.
- Best Practice: Delete unused connections and consolidate similar ones.
- Example: Go to Data > Queries and Connections, identify duplicates, and right-click to delete unused connections.
11. Manage Cube References
- Why: Unused references in pivot tables or formulas can create errors during refresh.
- Best Practice: Remove references to unused tabs or fields.
- Example: Right-click the cube connection, select Properties > Used In, and delete unnecessary references.
12. Break or Delete External Workbook Links
- Why: External links often break when files are shared, causing errors.
- Best Practice: Remove or break all unnecessary links.
- Example: Navigate to Data > Edit Links and break all links unless they are essential and controlled.
13. Turn Off AutoFit for Column Widths
- Why: AutoFit recalculates column widths during every refresh, slowing performance.
- Best Practice: Manually set column widths and disable AutoFit.
- Example: Go to PivotTable Options > Layout & Format and uncheck "Autofit column widths on update."
14. Monitor and Limit File Size
- Why: Large file sizes increase load and save times, sometimes leading to crashes.
- Best Practice: Remove unused sheets, clear unnecessary formatting, and compress images.
- Example: Use the Inspect Document tool under File > Info to identify and remove hidden data.
15. Use Table References Instead of Cell Ranges
- Why: Dynamic ranges or poorly defined cell references slow performance.
- Best Practice: Convert ranges to tables for dynamic referencing.
- Example: Instead of referencing A1:A1000, convert the range into a table (Ctrl + T).
Poorly Optimized File Examples
Example #1 Of a Poorly Optimized File
Consider a file with the following characteristics:
- Number of Tabs: 58
- Cube Connections: 9
- Cube References:
  - Cube Connection 1: References 1,000+ tabs, pivot tables, or formulas
  - Cube Connection 2: References 0 tabs, pivot tables, or formulas
  - Cube Connection 3: References 2 tabs, pivot tables, or formulas
  - Cube Connection 4: References 400+ tabs, pivot tables, or formulas
  - Cube Connection 5: References 1,000+ tabs, pivot tables, or formulas
  - Cube Connection 6: References 0 tabs, pivot tables, or formulas
  - Cube Connection 7: References 0 tabs, pivot tables, or formulas
  - Cube Connection 8: References 2 tabs, pivot tables, or formulas
  - Cube Connection 9: References 0 tabs, pivot tables, or formulas
In this example, the file is complex and exceeds the recommended limits. Cube connections 1, 4, and 5 reference an extensive number of tabs, pivot tables, or formulas, contributing to the file's sluggish performance. Additionally, cube connections 2, 6, 7, and 9 are unused, adding unnecessary overhead.
Example #2 Of a Poorly Optimized File
Consider another file with the following characteristics:
- Number of Tabs: 54
- Cube Connections: 6
- Total Impact: 75 distinct connections
- Cube References:
  - Cube 1: 32 pivot tables or tabs
  - Cube 2: 3 pivot tables or tabs
  - Cube 3: 2 pivot tables or tabs
  - Cube 4: 15 pivot tables or tabs
  - Cube 5: 1 pivot table or tab
  - Cube 6: 22 pivot tables or tabs
- Contributing Tabs:
  - Tab 2: 10K rows (too many unnecessary rows; filter the data down)
  - Tab 26: Thousands of blank rows (use Ctrl + Down Arrow, then Ctrl + -, to remove blank rows)
  - Tab 27: Upwards of 73 hidden columns (filter the data set down rather than hiding columns)
  - Tab 28: 29K rows (too many unnecessary rows; filter the data down)
In this example, the file's complexity and size contribute to its poor performance. Cube connections are spread across numerous tabs, and excessive rows and hidden columns add to the processing demands. Following the optimization tips provided can help improve the file's usability and performance.
For more troubleshooting advice on Excel best practices and managing corrupt files, see our recommendations: Catalyst Best Practices and How do I fix a stale, corrupt, or broken file?
Following these best practices will ensure your EBM Excel files are optimized for performance and stability. Keep in mind, failure to adhere to these recommendations can result in errors or degraded performance, and these limitations are due to Microsoft Excel's inherent design. By implementing these guidelines, you’ll reduce the risk of issues and maximize the efficiency of your workbooks.